Image Generation¶

Diffusion models, image edits, controllable synthesis. The default open pick flipped from SDXL to Flux in 2024. Commercial keeps its lead on prompt adherence.

Generation is a distinct world from perception — different libraries, different compute profiles, different failure modes. Most CV engineers don't need it early; it becomes relevant for content tooling, synthetic data, editing, or generative UX.

Recommended picks¶

Use case	Pick	When to use
Commercial flagship	Midjourney v7, Flux 2 Pro, or DALL-E 3	Marketing / hero images; prompt adherence matters; quality > control
Open default (photorealism)	Flux 2 (Black Forest Labs, Nov 2025)	Up to 4MP output, best open photorealism, strong prompt adherence
Open default (text-in-image)	Qwen-Image (Alibaba, Apache 2.0)	Best-in-class multilingual text rendering within images. Commercial-friendly license.
Stable / mature open	SDXL 1.0 + LoRAs	Massive ecosystem of fine-tunes and community tools; prefer over Flux only if you need an existing LoRA
Controllable generation	Flux + ControlNet or SDXL + ControlNet	Pose-controlled, edge-controlled, depth-controlled generation
Editing (img2img / inpainting)	Flux Fill, SDXL Inpainting, or a commercial "edit" API	Targeted regions of an existing image
Fast / real-time	SDXL Turbo, SDXL Lightning, or Flux Schnell	Live preview, sketch-to-image, interactive apps

[!WARNING] Open image-gen weights have messy licenses; check before shipping.
- Flux.1 [dev]: FLUX.1 [dev] Non-Commercial License. Research/personal only, no commercial use of the weights.
- Flux.1 [schnell]: Apache-2.0. Commercial-safe.
- Flux 2 Pro / Pro 1.1: commercial API-only (Black Forest Labs hosted). The open Flux 2 [dev] weights inherit the same non-commercial terms as Flux.1 [dev].
- Stable Diffusion 3 / 3.5: Stability Community License. Free below $1M annual revenue or for non-commercial use; a paid Enterprise License is required above that threshold.
- SDXL 1.0: CreativeML OpenRAIL++-M. Commercial-safe with use restrictions (no illegal / harmful content, attribution).
- SD 1.5: OpenRAIL-M. Similar terms to SDXL.
- Qwen-Image: Apache-2.0. Commercial-safe.

For commercial deployment with no license friction: Flux.1 [schnell] (Apache-2.0), SDXL 1.0 (OpenRAIL++-M), or Qwen-Image (Apache-2.0). For commercial flagship quality with a clean license: pay for Flux 2 Pro / Midjourney / DALL-E / Imagen API.
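The license notes above can be encoded as a small lookup so that a review or build script flags non-commercial weights early. This is a minimal sketch: the dictionary keys, the `LicenseInfo` type, and `can_ship_commercially` are hypothetical names, and the entries only restate the list above. It is not legal advice; re-check the current license texts before relying on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseInfo:
    license: str
    commercial: bool                        # may the weights be used commercially at all?
    revenue_cap_usd: Optional[int] = None   # free-tier ceiling, if the license has one


# Entries restate the warning list above (hypothetical keys).
LICENSES = {
    "flux.1-dev": LicenseInfo("FLUX.1 [dev] Non-Commercial License", False),
    "flux.1-schnell": LicenseInfo("Apache-2.0", True),
    "sd3.5": LicenseInfo("Stability Community License", True, 1_000_000),
    "sdxl-1.0": LicenseInfo("CreativeML OpenRAIL++-M", True),
    "qwen-image": LicenseInfo("Apache-2.0", True),
}


def can_ship_commercially(model: str, annual_revenue_usd: int) -> bool:
    """Return True if the listed license allows commercial use without a paid tier."""
    info = LICENSES[model]
    if not info.commercial:
        return False
    if info.revenue_cap_usd is not None and annual_revenue_usd >= info.revenue_cap_usd:
        return False  # above the cap a paid Enterprise License is required
    return True
```

For example, `can_ship_commercially("sd3.5", 2_000_000)` returns `False`, while the same call for `"flux.1-schnell"` returns `True`. Use restrictions such as those in OpenRAIL++-M are not modeled here.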

Why Flux 2 is the current open leader¶

Flux 2 (Black Forest Labs, November 2025) is the biggest open-model step forward since SDXL: superior photorealism, better prompt comprehension, and up to 4MP output. It is the best single choice for portrait / product / hero image quality.

Flux family as of 2026:
- Flux 2 Pro: API-only commercial, max quality.
- Flux.1 [dev]: 12B params, FLUX.1 [dev] Non-Commercial License. Still widely used in research and non-commercial tooling because of the maturing ecosystem. Don't ship this in a commercial product.
- Flux.1 [schnell]: Apache-2.0 license, 1-step or 4-step generation. Use this for self-hosted commercial.

For self-hosted commercial: Flux.1 [schnell] (Apache-2.0) or Qwen-Image (Apache-2.0). If a non-commercial license is acceptable, the highest-quality self-hosted options are the Flux 2 [dev] and Flux.1 [dev] weights.

Why Qwen-Image for text-in-image¶

Qwen-Image (Alibaba, Apache 2.0) integrates language and layout reasoning directly into the architecture. Its multilingual text rendering within images is best-in-class: font consistency, spatial alignment, and handling of complex backgrounds. If your use case involves typography in the generated image, Qwen-Image beats Flux and SDXL on that axis. It is also Apache-2.0 licensed, which matters for commercial use.

SDXL — still the mature ecosystem¶

SDXL 1.0 shipped mid-2023 and spawned the largest ecosystem in image generation: hundreds of thousands of LoRAs on Civitai, dozens of ControlNet variants, strong community tooling (ComfyUI, Automatic1111). It is sometimes the right pick because of that ecosystem, despite its age:

You need a specific style LoRA → probably on SDXL, not Flux.
You need IP-Adapter for face consistency → mature on SDXL.
You're using an existing ComfyUI graph → it's SDXL-shaped.
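Loading a community style LoRA onto SDXL with Diffusers can be sketched as follows. The base model ID is the public SDXL 1.0 checkpoint on the Hugging Face Hub; the LoRA file path, the `split_lora_path` helper, and the 0.8 fusion scale are illustrative assumptions. The heavy imports sit inside the loader so the helper can be used without a GPU stack installed.

```python
from pathlib import Path


def split_lora_path(lora_file: str) -> tuple:
    """Split a local LoRA file path into (directory, weight file name)."""
    path = Path(lora_file)
    if path.suffix != ".safetensors":
        raise ValueError("this sketch only accepts .safetensors LoRA files")
    return str(path.parent), path.name


def load_sdxl_with_lora(lora_file: str, lora_scale: float = 0.8, device: str = "cuda"):
    """Load SDXL 1.0 and fuse one LoRA into it (needs torch and diffusers)."""
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    )
    directory, weight_name = split_lora_path(lora_file)
    pipe.load_lora_weights(directory, weight_name=weight_name)
    pipe.fuse_lora(lora_scale=lora_scale)  # bake the LoRA in at the chosen strength
    return pipe.to(device)
```

A call such as `load_sdxl_with_lora("loras/watercolor.safetensors")` (hypothetical file) returns a pipeline that can be invoked with a prompt as usual. Check the LoRA's model card for its license and any trigger words before using it commercially.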

Controllable generation¶

Plain text → image is often not enough for production use. You usually want to constrain output — pose a character, preserve layout, edit one region. That's ControlNet territory.

ControlNet (Zhang et al., 2023) — the control-by-condition framework. Variants for pose (OpenPose / DWPose input), depth, edges (Canny / SoftEdge), segmentation, scribble, lineart.
IP-Adapter — image prompting; condition the output on a reference image's style or identity.
T2I-Adapter — lighter-weight alternative to ControlNet.

Both Flux and SDXL have ControlNet support. SDXL's ecosystem is broader; Flux is catching up.
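As an illustration of edge-controlled generation, the sketch below pairs SDXL with a Canny ControlNet through Diffusers. It assumes the `diffusers/controlnet-canny-sdxl-1.0` checkpoint from the Hugging Face Hub plus OpenCV, NumPy, and Pillow; the function names, the Canny thresholds, the default conditioning scale of 0.5, and the 0 to 2 range check are illustrative choices, not recommendations from the model authors.

```python
def control_call_kwargs(prompt, control_image, conditioning_scale=0.5, steps=30):
    """Build keyword arguments for a ControlNet pipeline call."""
    if not 0.0 <= conditioning_scale <= 2.0:
        raise ValueError("conditioning_scale is outside the range this sketch allows")
    return {
        "prompt": prompt,
        "image": control_image,  # the conditioning image, not an init image
        "controlnet_conditioning_scale": conditioning_scale,
        "num_inference_steps": steps,
    }


def canny_control_image(image_path, low=100, high=200):
    """Turn a reference photo into a 3-channel Canny edge map."""
    import cv2
    import numpy as np
    from PIL import Image

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    edges = cv2.Canny(gray, low, high)
    return Image.fromarray(np.stack([edges] * 3, axis=-1))


def load_sdxl_canny_pipeline(device="cuda"):
    """Load SDXL 1.0 with a Canny ControlNet attached (needs torch and diffusers)."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    return pipe.to(device)
```

Usage would look like `pipe(**control_call_kwargs("a walnut chair", canny_control_image("ref.jpg"))).images[0]`, where `ref.jpg` is a hypothetical reference photo. A lower conditioning scale gives the prompt more freedom; a higher one follows the edges more strictly.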

When to pick something else¶

Consistent characters across many images → specialized pipelines (LoRA training on the character, or InstantID / PhotoMaker).
Text rendering in the image (signs, logos) → Qwen-Image is the open winner (see above). DALL-E 3, Midjourney, and Ideogram are also strong. SDXL is weak here.
Video generation → different world. See the video generation section below; it is a moving target.
3D-aware generation → NeRF / Gaussian Splatting / 3D-native models. Out of scope for this page.

Running Flux / SDXL locally¶

ComfyUI — node-graph UI. The power-user default. Every ControlNet, LoRA, adapter works here first.
Automatic1111 WebUI — classic interface. SDXL-focused, less current on Flux.
Forge / Fooocus / SwarmUI — alternative UIs. Fooocus in particular is "simple mode" for SDXL.
Diffusers (Hugging Face) — the Python library. Script-first.
InvokeAI — another UI, artist-friendly.
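For the script-first route, a minimal Diffusers sketch for Flux.1 [schnell] might look like the following. It assumes the `black-forest-labs/FLUX.1-schnell` weights from the Hugging Face Hub, a recent `diffusers` release with `FluxPipeline`, and `accelerate` installed for CPU offload; `schnell_call_kwargs` and `generate` are hypothetical helper names.

```python
def schnell_call_kwargs(prompt: str, steps: int = 4) -> dict:
    """Call arguments suited to the timestep-distilled schnell weights."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,   # schnell targets 1 to 4 steps
        "guidance_scale": 0.0,          # the distilled model runs without guidance
        "max_sequence_length": 256,     # prompt token limit for schnell
    }


def generate(prompt: str, out_path: str = "out.png", steps: int = 4, seed: int = 0) -> str:
    """Generate one image and save it (needs torch, diffusers, accelerate)."""
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use
    generator = torch.Generator("cpu").manual_seed(seed)  # fixed seed for repeatability
    image = pipe(generator=generator, **schnell_call_kwargs(prompt, steps)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a red bicycle leaning on a brick wall")` would write `out.png` to the working directory. The same structure applies to SDXL by swapping in `StableDiffusionXLPipeline` and its usual step count and guidance settings.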

Commercial APIs¶

Midjourney — Discord/web interface. Highest aesthetic quality. No API until recently (now limited beta).
DALL-E 3 (via OpenAI API) — strong prompt adherence, tight content policy.
Imagen 3 (via Google AI Studio / Vertex) — fast, strong.
Stable Diffusion 3 Large (via Stability AI API) — the commercial variant of SD3.
Flux Pro (via Black Forest Labs / Replicate) — hosted.
Ideogram — specialized in text-in-image.
Leonardo / Stability / Replicate / fal.ai — aggregator/hosting platforms.
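Calling a hosted model usually takes only a few lines. The sketch below uses the OpenAI Python SDK's image endpoint with DALL-E 3 as one example; it assumes the `openai` package is installed and `OPENAI_API_KEY` is set, and the helper names are hypothetical. Model availability and parameters change over time, so verify against the provider's current documentation.

```python
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}


def dalle3_request(prompt: str, size: str = "1024x1024", quality: str = "standard") -> dict:
    """Build the request arguments for one DALL-E 3 image."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size for DALL-E 3: {size}")
    if quality not in {"standard", "hd"}:
        raise ValueError(f"unsupported quality: {quality}")
    # DALL-E 3 generates one image per request, so n is fixed at 1.
    return {"model": "dall-e-3", "prompt": prompt, "size": size, "quality": quality, "n": 1}


def generate_image_url(prompt: str) -> str:
    """Send the request and return the URL of the generated image."""
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(**dalle3_request(prompt))
    return response.data[0].url
```

Keeping the request builder separate from the network call makes it easy to validate parameters in tests and to swap in another provider later.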

Video generation (fast-moving, verify before relying on)¶

Leaderboard as of April 2026 (these rotate every 1–2 months):

Commercial¶

Kling 3.0 (Kuaishou, Feb 2026) — multi-shot sequences 3–15 seconds, 4K native, subject consistency across shots. Cost-effective (~$0.50/clip).
Veo 3.1 (Google, Jan 2026) — 4K native, best-in-class lip sync for talking heads.
Seedance 2.0 (ByteDance, Feb 2026) — first with unified audio-video joint generation, multi-shot storytelling from a single prompt, phoneme-level lip sync in 8+ languages.
Runway Gen-4 — commercial, broad creative tooling.
Luma Dream Machine — commercial.

Open¶

Wan 2.6 (Alibaba) — open-weight, strong, full control over the generation process.
HunyuanVideo (Tencent) — open weights.
LTX-Video — open, fast.
CogVideoX — open.
AnimateDiff — motion LoRA on top of SD/SDXL. Lower quality than dedicated video models.

Graveyard (video)¶

Sora (OpenAI) — shut down March 24, 2026, six months after launch. Do not recommend.
Sora 2 — preceded the shutdown; coherent prompts and camera control were praised.

Video generation is moving faster than image generation, and the recommendations here will age within 3 months. Verify before citing.

The Dump¶

Diffusion model families¶

Stable Diffusion 1.5 (SD 1.5) — the classic. Still running in many pipelines for backwards compat. License: CreativeML OpenRAIL-M (commercial with use restrictions).
SDXL 1.0 (Stability AI, 2023) — mature open default. License: CreativeML OpenRAIL++-M (commercial with use restrictions).
SD 3 / SD 3.5 (Stability AI, 2024) — mixed reception. License: Stability Community License — free below $1M annual revenue; paid Enterprise License above.
Flux.1 dev / schnell (Black Forest Labs, 2024) — the open leader until Flux 2 (Nov 2025). License: Flux.1 [dev] non-commercial; Flux.1 [schnell] Apache-2.0.
PixArt-α / PixArt-Σ — alternative open model families. AGPL-3.0.
Kolors (Kuaishou) — strong Chinese-text generation. Apache-2.0.
Playground v2.5 — aesthetic-tuned SDXL variant. Playground v2.5 Community License (non-commercial for weights).
Juggernaut / DreamShaper / RealVis — popular SDXL fine-tunes on Civitai. Varies per author — always check the Civitai model card before commercial use.

Tools¶

ComfyUI — node-based UI, most flexible. License: GPL-3.0; if you distribute a product derived from it, you must release the derivative's source under the same license.
Automatic1111 — classic WebUI.
Fooocus / Forge / InvokeAI — alternative UIs.
Diffusers — Python library.
OneTrainer / Kohya — LoRA training.

Control / edit¶

ControlNet variants — Canny, Depth, Pose, Scribble, etc.
IP-Adapter — image-as-prompt.
InstantID / PhotoMaker — identity preservation.
LCM / Hyper-SD / Turbo / Lightning — few-step sampling.

Commercial¶

Midjourney, DALL-E, Imagen, Flux Pro, Ideogram — see above.

Graveyard¶

SD 1.5 as a default for new work — retired; use SDXL or Flux unless ecosystem-locked.
DALL-E 2 — retired by OpenAI.
Disco Diffusion / VQ-VAE-2 — historical.
Original ControlNet 1.0 weights — succeeded by 1.1 and SDXL-native variants.

Last reviewed¶

2026-04-22. Generation is the second-fastest-moving page after VLM.