
(Almost) All Image Generation Model Comparison

Open-Source Image Generation Model Comparison – Personal Benchmarks

TL;DR: PDFs attached! One shows only closed-source models, one only open-source, and the last one throws everyone into the same mix.

I set out to compare as many image generation models as I could reasonably get my hands on. Initially, I wanted to focus only on open-source models, but eventually expanded the experiment to include closed-source ones as well for broader context.

All open-source models were run locally on my machine, which doesn’t have high-end specs:

  • CPU: Intel 12th Gen i7

  • GPU: RTX 3060 6GB

  • RAM: 32GB

I logged render times and used a variety of samplers and settings, though I didn’t have time to tune the settings for each model extensively. Most are base (foundation) models, not community fine-tunes. So treat this as an overview of practical results on consumer-grade hardware, not a scientific benchmark.
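The post doesn't say how the render times were recorded. As a minimal sketch of one way to log them in the same m:ss format used below, a small timing helper would do. The names `timed`, `format_mmss`, and `render_log` are hypothetical, and `time.sleep` stands in for the actual generation call.

```python
import time
from contextlib import contextmanager


def format_mmss(seconds: float) -> str:
    """Format a duration in seconds as m:ss, e.g. 174 -> '2:54'."""
    total = int(round(seconds))
    return f"{total // 60}:{total % 60:02d}"


@contextmanager
def timed(label: str, log: list):
    """Time the enclosed block and append (label, 'm:ss') to log."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.append((label, format_mmss(time.perf_counter() - start)))


render_log = []
with timed("demo / euler / 10 steps", render_log):
    time.sleep(0.01)  # placeholder for the real image generation call

print(render_log)
```

Wrapping the whole generation call this way measures wall-clock time, so it includes any model loading or decoding that happens inside the block.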

All output images (where applicable) were generated at 896 × 1152 resolution.


Images were semi-cherry-picked: the best of 4 generations was selected.


Prompts Used Across All Tests

Prompt 1 – Modern Architecture at Dusk

“A modern architectural marvel featuring a cantilevered structure extending over the ocean, captured at dusk. The building comprises two main horizontal levels with floor-to-ceiling glass walls, warmly lit from within by orange ambient lighting that contrasts vividly with its dark exterior and the misty atmosphere. Its rectangular form appears to float above the turbulent waters, securely anchored into dark granite cliffs that vanish into dense fog. Shot from a high vantage point with a wide-angle lens, the composition highlights the building’s bold geometric design and its seamless integration with the rugged coastal landscape—steep cliffs and crashing waves below. Dramatic dusk lighting and sharp architectural detail complete the scene.”

Prompt 2 – Editorial Nuclear Explosion

“Portrait, masterpiece, 4K, 8K RAW editorial photo, ray tracing, hyper-realistic, highly detailed. A nuclear explosion contained within a large transparent bottle, captured with cinematic and dramatic lighting. The scene features a shallow depth of field, rich bokeh effects, and a high-budget visual style. Presented in cinemascope with subtle film grain, emphasizing intricate details and a surreal, high-impact composition.”

Prompt 3 – Mythic Eye and Human Figure

“A striking, surreal scene blending nature and fantasy. Dominating the frame is an enormous, hyper-detailed reptilian eye—its vertical, slit-like pupil hinting at a powerful mythical creature, such as a dragon. The surrounding skin is rugged and scaly, evoking an ancient, weathered presence. In stark contrast, a solitary human figure stands in the lower portion of the image, cloaked in a flowing black robe. Dwarfed by the colossal eye, the figure emphasizes the immense scale and mystique of the creature. The ground beneath appears to be mist or reflective water, casting back the eerie, otherworldly light that bathes the scene. A misty, dreamlike atmosphere enhances the sense of awe and fantasy. Visually captivating, the composition fuses dramatic lighting, fantasy worldbuilding, and a powerful contrast between human and beast.”

Prompt 4 – Cyberpunk Samurai Silhouette

“A cyberpunk cyborg samurai woman stands in a wide, powerful stance, silhouetted against a red sun and a futuristic city skyline. Her figure is rendered in a highly detailed, chaotic "Mayhem" splatter style, with a minimalist yet richly textured aesthetic. The background depicts a bizarre techno-ghetto, pulsing with distorted transmissions and patterned illumination. The scene evokes epic cinematic action, with a mix of conceptual art and stylized Japanese influence. Rendered with intentional bad VHS tape artifacts, adding gritty visual texture. Award-winning artwork feel, with surreal detail, atmospheric lighting, and a sense of dynamic movement.”

Prompt 5 – Ethereal Angelic Portrait

“A surreal, ethereal close-up portrait of a hauntingly beautiful woman with an angelic, almost divine presence. Her pale, porcelain skin glows softly and blends seamlessly into a pure white, overexposed background, giving the illusion she is emerging from light. Her eyes are extremely large and piercing blue, glowing slightly with crystalline clarity, conveying calm and otherworldly wisdom. Bold, dramatic lashes frame her gaze, creating stark contrast against her luminous skin. Her facial features are delicately refined—high cheekbones, a straight nose, and soft full lips in a subtle, serene expression. Her platinum-white hair flows weightlessly upward, light and wispy, fading into the glowing white surroundings. Soft white ash gently falls around her. The lighting is high-key and diffused, with no visible shadows, emphasizing minimalism and purity. The overall aesthetic is timeless and dreamlike, evoking an angelic, introspective atmosphere of quiet mystique and celestial beauty.”

Prompt 6 – Galactic Floral Train

“A highly detailed train adorned with blooming flowers travels beneath a vast sky filled with swirling galaxies and cosmic colors—rich purples, blues, and pinks. Ethereal cloud animals with shimmering outlines drift overhead, adding a surreal and mystical touch. Passengers inside the train gaze out in awe, their faces lit by dramatic, celestial light. The atmosphere is otherworldly and enchanting, blending hyperrealism with fantasy to create a vivid, cosmic dreamscape.”

Prompt 7 – 1930s Explorer and Pyramid

“Nistyle, intricate linework with expressive contrasts. A detailed illustration of a 1930s female explorer with blonde hair, wearing a safari hat and jacket, standing in a dusty desert. She looks up at a towering Egyptian pyramid, silhouetted against a glowing sunset. The scene is rendered with soft lighting and dynamic highlights, capturing a sense of quiet awe and adventure. Sparse clouds drift across the sky, enhancing the mood. The composition combines vintage charm with a mystical atmosphere, brought to life through refined textures and dramatic contrasts.”


Open-Source Model Summaries

Auraflow

Model Used: Auraflow V0.3

License: Apache 2.0
Best setting tested: Uni_pc / Normal @ 25 steps (2:54)
Settings used:

  • Uni_pc / Normal @ 20 steps – 2:21 (looks great)

  • Euler / Normal @ 10 steps – 1:10 (soft, usable)

  • DPM_adaptive / Normal @ 39 steps – 4:39 (slow but stunning)

Auraflow shines in atmospheric and moody renders. It’s a bit slow and unoptimized, but the quality—especially in architectural and landscape prompts—is worth the wait. Even low step counts produced usable outputs.

Strengths: Dreamy lighting, soft realism
Weaknesses: Speed


Hunyuan DiT

Model Used: Hunyuan DiT v1.2

License: tencent-hunyuan-community
Best setting tested: Euler / Simple @ 30 steps (0:49)
Settings used:

  • Euler / Simple @ 10 steps – 0:16 (very fast, decent results)

  • Heun / Sgm_uniform @ 20 steps – 1:04 (slightly slower, similar output)

Surprisingly fast and occasionally brilliant. Hunyuan’s strength lies in its burst potential—when a prompt clicks, it delivers stunning output. However, it’s inconsistent, sometimes generating messy results.

Strengths: Speed, dynamic range
Weaknesses: Unpredictability


Lumina

Model Used: Lumina Image 2.0

License: Apache 2.0
Best setting tested: res_multistep / Simple @ 36 steps (2:09)
Settings used:

  • res_multistep / Simple @ 25 steps – 1:29

  • dpm++2M / Simple @ 15 steps – 0:52 (visibly better than Euler)

  • euler / Simple @ 10 steps – 0:36

One of the best performers overall. Lumina handles a wide range of visual styles with precision. Even at lower steps, the quality remains high.

Strengths: Balanced realism and style, sharp render
Weaknesses: Needs sampler tuning


FLUX

Model Used: Flux 1D fp16

License: Non-commercial
Best setting tested: Euler / Simple @ 20 steps (2:23)
Settings used:

  • Euler / Simple @ 15 steps – 1:44

  • Euler / Simple @ 30 steps – 3:39

FLUX delivered consistent, stable renders. While not stylistically bold, it was dependable across all prompts—great for baseline comparisons and less stylized concepts.

Strengths: Stability
Weaknesses: No stylistic edge


Stable Diffusion 3.5

Model Used: SD3.5 Medium fp8

License: Stability AI Community
Best setting tested: Euler / Simple @ 20 steps (0:49)
Settings used:

  • Euler / Simple @ 15 steps – 0:31

  • Heun / Simple @ 20 steps – 2:15 (slower but consistently good)

  • Euler / Simple @ 30 steps – 1:10

SD3.5 can produce fantastic results—especially for fantasy and surreal prompts—but it's fussy. Prompt structure and the right sampler matter.

Strengths: High ceiling for quality
Weaknesses: Inconsistent


PixArt

Model Used: PixArt Sigma w Photon Refiner

License: OpenRAIL++
Best setting tested: DPM++2MSDE / Beta @ 14 steps (0:34)
Settings used:

  • DPM++2MSDE / Beta @ 20 steps – 0:46

  • Euler / Beta @ 10 steps – 0:19

  • DPM++2M / Beta @ 14 steps – 0:29

PixArt Sigma is blazing fast and produces detailed, well-lit renders. The refiner adds a nice finishing touch. Even at just 14 steps, the output was visually solid.

Strengths: Speed, refinement, polish
Weaknesses: Needs prompt tuning at times


Kwai Kolors

Model Used: KWAI Kolors 1.0 fp16

License: Apache 2.0
Best setting tested: DPM++2MSDE / Karras @ 40 steps (0:56)
Settings used:

  • Uni_pc / Karras @ 20 steps – 0:26

  • DPM++2M / Karras @ 25 steps – 0:31

  • Euler / Karras @ 35 steps – 0:49

Kwai Kolors performs with speed and style. It excelled in colorful, surreal prompts and proved adaptable across many samplers. One of the most consistently great performers. The Apache 2.0 license came as a big surprise.

Strengths: Speed, clarity, vivid style, license
Weaknesses: Needs prompt tuning at times


HiDream

Model Used: HiDream I1 Fast_Q4_0

License: MIT
Best setting tested: Euler / Beta @ 16 steps (2:51)
Settings used:

  • Uni_pc / Normal @ 16 steps – 2:34

  • DPM++2M / Normal @ 16 steps – 2:37

  • Heun / Normal @ 16 steps – 5:07 (slow and underwhelming)

HiDream is slower than most, but its output—especially for ethereal or clean portraits—was surprisingly strong. Great fallback when you want smooth tones and mood over speed.

Strengths: Soft, atmospheric detail
Weaknesses: Slower, locked CFG
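
Because the open-source models above were timed at different step counts, the raw render times are hard to compare directly. One rough way to put them on the same footing is seconds per step. The sketch below uses only the "Best setting tested" values from the summaries; the helper names are made up for illustration. Keep in mind that the result still folds in whatever fixed overhead the logged times include (the PixArt refiner pass, for example), and that samplers differ in how much work one step costs.

```python
# (model, steps, render time as m:ss) copied from the
# "Best setting tested" lines in the summaries above.
BEST_SETTINGS = [
    ("Auraflow", 25, "2:54"),
    ("Hunyuan DiT", 30, "0:49"),
    ("Lumina", 36, "2:09"),
    ("FLUX", 20, "2:23"),
    ("Stable Diffusion 3.5", 20, "0:49"),
    ("PixArt", 14, "0:34"),
    ("Kwai Kolors", 40, "0:56"),
    ("HiDream", 16, "2:51"),
]


def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def seconds_per_step(steps: int, mmss: str) -> float:
    """Average wall-clock seconds spent per sampling step."""
    return to_seconds(mmss) / steps


# Sort from fastest to slowest per step.
ranking = sorted(
    ((name, seconds_per_step(steps, t)) for name, steps, t in BEST_SETTINGS),
    key=lambda row: row[1],
)

for name, sps in ranking:
    print(f"{name:<22} {sps:5.2f} s/step")
```

On these numbers Kwai Kolors comes out lowest at 1.40 s/step and HiDream highest at about 10.69 s/step, which matches the impression from the raw times.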

Closed-Source Models Used:

  • Adobe – Firefly V4

  • Canva – Magic Media

  • DeepAI – DeepCore

  • DeviantArt – DreamUp

  • Google – Imagen 3

  • LeonardoAI – Phoenix V1.0

  • Meta – Emu3

  • MiniMax AI – Image-01

  • OpenAI – DALL·E 3

  • OpenAI – GPT-4o

  • ReCraft – V3 RAW

  • Reve AI – Reve Image 1.0

  • X – Grok

Models not covered:

OmniGen: Took forever and mostly ran out of memory (OOM)

CogView: OOM

Janus Pro: 1B sucked, 7B OOM

Google Gemini Flash: No time (will include at later stage)

 

PS: If anyone wants workflows etc, lemme know and I will upload :)
