Image Generation
Diffusion models, image edits, controllable synthesis. The default open pick flipped from SDXL to Flux in 2024. Commercial keeps its lead on prompt adherence.
Generation is a distinct world from perception — different libraries, different compute profiles, different failure modes. Most CV engineers don't need it early; it becomes relevant for content tooling, synthetic data, editing, or generative UX.
Recommended picks
| Use case | Pick | When to use |
|---|---|---|
| Commercial flagship | Midjourney v7, Flux 2 Pro, or DALL-E 3 | Marketing / hero images; prompt adherence matters; quality > control |
| Open default (photorealism) | Flux 2 (Black Forest Labs, Nov 2025) | Up to 4MP output, best open photorealism, strong prompt adherence |
| Open default (text-in-image) | Qwen-Image (Alibaba, Apache 2.0) | Best-in-class multilingual text rendering within images. Commercial-friendly license. |
| Stable / mature open | SDXL 1.0 + LoRAs | Massive ecosystem of fine-tunes and community tools; prefer over Flux only if you need an existing LoRA |
| Controllable generation | Flux + ControlNet or SDXL + ControlNet | Pose-controlled, edge-controlled, depth-controlled generation |
| Editing (img2img / inpainting) | Flux Fill, SDXL Inpainting, or a commercial "edit" API | Targeted regions of an existing image |
| Fast / real-time | SDXL Turbo, SDXL-Lightning, or Flux Schnell | Live preview, sketch-to-image, interactive apps |
[!WARNING] Open image-gen weights have messy licenses. Check before shipping.
- Flux.1 [dev]: FLUX.1 [dev] Non-Commercial License. Research/personal only; no commercial use of the weights.
- Flux.1 [schnell]: Apache-2.0. Commercial-safe.
- Flux 2 Pro / Pro 1.1: commercial API-only (Black Forest Labs hosted). The open Flux 2 [dev] weights inherit the same non-commercial terms as Flux.1 [dev].
- Stable Diffusion 3 / 3.5: Stability Community License. Free below $1M annual revenue or for non-commercial use; a paid Enterprise License is required above that threshold.
- SDXL 1.0: CreativeML OpenRAIL++-M. Commercial-safe with use restrictions (no illegal / harmful content, attribution).
- SD 1.5: OpenRAIL-M. Similar terms to SDXL.
- Qwen-Image: Apache-2.0. Commercial-safe.
For commercial deployment with no license friction: Flux.1 [schnell] (Apache-2.0), SDXL 1.0 (OpenRAIL++-M), or Qwen-Image (Apache-2.0). For commercial flagship quality with a clean license: pay for Flux 2 Pro / Midjourney / DALL-E / Imagen API.
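The license notes above can be encoded as a small lookup so a build or review script can flag a risky model choice early. This is a minimal sketch: the `WeightLicense` class, the `can_ship` helper, and the three-way `commercial` flag are illustrative names, and the table simply mirrors the list above (including treating "commercial-safe with use restrictions" as a plain yes). It does not replace reading the license text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightLicense:
    model: str
    license: str
    commercial: str           # "yes", "conditional", or "no"
    revenue_cap_usd: int = 0  # only used when commercial == "conditional"


# Mirrors the warning list above; use restrictions (OpenRAIL) are not modeled.
LICENSES = {
    w.model: w
    for w in [
        WeightLicense("Flux.1 [dev]", "FLUX.1 [dev] Non-Commercial License", "no"),
        WeightLicense("Flux.1 [schnell]", "Apache-2.0", "yes"),
        WeightLicense("Flux 2 [dev]", "Non-commercial (same terms as Flux.1 [dev])", "no"),
        WeightLicense("Stable Diffusion 3.5", "Stability Community License",
                      "conditional", revenue_cap_usd=1_000_000),
        WeightLicense("SDXL 1.0", "CreativeML OpenRAIL++-M", "yes"),
        WeightLicense("SD 1.5", "OpenRAIL-M", "yes"),
        WeightLicense("Qwen-Image", "Apache-2.0", "yes"),
    ]
}


def can_ship(model: str, annual_revenue_usd: int) -> bool:
    """Return True if the weights may be used commercially at this revenue."""
    w = LICENSES[model]
    if w.commercial == "yes":
        return True
    if w.commercial == "conditional":
        return annual_revenue_usd < w.revenue_cap_usd
    return False
```

For example, `can_ship("Stable Diffusion 3.5", 500_000)` passes while the same call with `2_000_000` does not, which matches the $1M threshold described above.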
Why Flux 2 is the current open leader
Flux 2 (Black Forest Labs, November 2025) is the biggest open-model step forward since SDXL. Superior photorealism, better prompt comprehension, up to 4MP output. Best single choice for portrait / product / hero image quality.
Flux family as of 2026:
- Flux 2 Pro: API-only commercial, max quality.
- Flux.1 [dev]: 12B params, FLUX.1 [dev] Non-Commercial License. Still widely used in research and non-commercial tooling because of the maturing ecosystem. Don't ship this in a commercial product.
- Flux.1 [schnell]: Apache-2.0 license, 1-step or 4-step generation. Use this for self-hosted commercial.
For self-hosted commercial use: Flux.1 [schnell] or Qwen-Image (both Apache-2.0). For the highest self-hosted quality where a non-commercial license is acceptable: Flux 2 [dev] or Flux.1 [dev] weights.
Why Qwen-Image for text-in-image
Qwen-Image (Alibaba, Apache 2.0) integrates language and layout reasoning directly into the architecture. Best-in-class multilingual text rendering within images — font consistency, spatial alignment, complex backgrounds. If your use case involves typography in the generated image, Qwen-Image beats Flux and SDXL on that axis. And it's Apache 2, which matters for commercial use.
SDXL: still the mature ecosystem
SDXL 1.0 shipped mid-2023 and spawned the largest ecosystem in image generation: hundreds of thousands of LoRAs on Civitai, dozens of ControlNet variants, strong community tooling (ComfyUI, Automatic1111). It's sometimes the right pick because of the ecosystem, not despite age:
- You need a specific style LoRA → probably on SDXL, not Flux.
- You need IP-Adapter for face consistency → mature on SDXL.
- You're using an existing ComfyUI graph → it's SDXL-shaped.
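As a rough illustration of the "existing style LoRA" case, the sketch below loads SDXL 1.0 in Diffusers and applies a local LoRA file. The file path, the `split_lora_path` helper, and the 0.8 scale are hypothetical; it assumes `torch` and `diffusers` are installed, a CUDA GPU is available, and the LoRA was trained for SDXL.

```python
from pathlib import Path


def split_lora_path(lora_file: str) -> tuple[str, str]:
    """Split a local LoRA file path into (directory, weight_name)."""
    p = Path(lora_file)
    if p.suffix != ".safetensors":
        raise ValueError("expected a .safetensors LoRA file")
    return str(p.parent), p.name


def generate_with_lora(prompt: str, lora_file: str, lora_scale: float = 0.8):
    # Heavy imports are kept local so this module imports without a GPU stack.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")

    directory, weight_name = split_lora_path(lora_file)
    pipe.load_lora_weights(directory, weight_name=weight_name)

    # The "scale" entry controls how strongly the LoRA influences the output.
    return pipe(prompt, cross_attention_kwargs={"scale": lora_scale}).images[0]
```

A call such as `generate_with_lora("a watercolor fox", "loras/style.safetensors")` would return a PIL image. Check the LoRA's own model card for its license and any trigger words.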
Controllable generation
Plain text → image is often not enough for production use. You usually want to constrain output — pose a character, preserve layout, edit one region. That's ControlNet territory.
- ControlNet (Zhang et al., 2023) — the control-by-condition framework. Variants for pose (OpenPose / DWPose input), depth, edges (Canny / SoftEdge), segmentation, scribble, lineart.
- IP-Adapter — image prompting; condition the output on a reference image's style or identity.
- T2I-Adapter — lighter-weight alternative to ControlNet.
Both Flux and SDXL have ControlNet support. SDXL's ecosystem is broader; Flux is catching up.
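A minimal sketch of edge- or depth-conditioned SDXL generation with Diffusers follows. It assumes `torch` and `diffusers` are installed, a CUDA GPU is available, and the caller already has a control image (for example a Canny edge map as a PIL image). The `SDXL_CONTROLNETS` table and both function names are illustrative, and the 0.5 conditioning scale is only a starting point.

```python
# Hub repos for two SDXL ControlNet variants; extend as needed.
SDXL_CONTROLNETS = {
    "canny": "diffusers/controlnet-canny-sdxl-1.0",
    "depth": "diffusers/controlnet-depth-sdxl-1.0",
}


def controlnet_repo(kind: str) -> str:
    """Look up the ControlNet checkpoint for a given control type."""
    try:
        return SDXL_CONTROLNETS[kind]
    except KeyError:
        raise ValueError(f"no SDXL ControlNet registered for {kind!r}") from None


def generate_with_control(prompt: str, control_image, kind: str = "canny",
                          scale: float = 0.5):
    # Heavy imports are kept local so this module imports without a GPU stack.
    import torch
    from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(
        controlnet_repo(kind), torch_dtype=torch.float16
    )
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # Higher scale follows the control image more strictly; lower gives the
    # text prompt more freedom.
    return pipe(
        prompt, image=control_image, controlnet_conditioning_scale=scale
    ).images[0]
```

The control image must match the chosen variant: an edge map for `"canny"`, a depth map for `"depth"`. Producing that map (OpenCV, a depth estimator, a pose detector) is a separate preprocessing step not shown here.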
When to pick something else
- Consistent characters across many images → specialized pipelines (LoRA training on the character, or InstantID / PhotoMaker).
- Text rendering in the image (signs, logos) → Qwen-Image is the open winner (see the text-in-image section above). Ideogram, DALL-E 3, and Midjourney are also strong. SDXL is weak here.
- Video generation → different world. See the Video generation section below. Moving target.
- 3D-aware generation → NeRF / Gaussian Splatting / 3D-native models. Out of scope for this page.
Running Flux / SDXL locally
- ComfyUI — node-graph UI. The power-user default. Every ControlNet, LoRA, adapter works here first.
- Automatic1111 WebUI — classic interface. SDXL-focused, less current on Flux.
- Forge / Fooocus / SwarmUI — alternative UIs. Fooocus in particular is "simple mode" for SDXL.
- Diffusers (Hugging Face) — the Python library. Script-first.
- InvokeAI — another UI, artist-friendly.
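For the script-first route, a minimal Diffusers sketch for Flux.1 [schnell] might look like the following. It assumes `torch`, `diffusers`, and `accelerate` are installed and that the weights can be downloaded from the Hugging Face Hub. The `schnell_settings` and `generate` names are illustrative; the sampler values follow the settings published on the model card.

```python
def schnell_settings(steps: int = 4) -> dict:
    """Sampler settings for FLUX.1 [schnell], a few-step distilled model."""
    if not 1 <= steps <= 4:
        raise ValueError("schnell is intended for 1 to 4 inference steps")
    return {
        "num_inference_steps": steps,
        "guidance_scale": 0.0,       # schnell runs without classifier-free guidance
        "max_sequence_length": 256,  # prompt token limit used for schnell
    }


def generate(prompt: str, out_path: str = "out.png", seed: int = 0,
             steps: int = 4) -> str:
    # Heavy imports are kept local so this module imports without a GPU stack.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    # Offloads idle submodules to CPU: slower, but fits in less VRAM.
    pipe.enable_model_cpu_offload()

    generator = torch.Generator("cpu").manual_seed(seed)  # reproducible output
    image = pipe(prompt, generator=generator, **schnell_settings(steps)).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("a red bicycle leaning on a brick wall")` writes `out.png` in the working directory. A fixed seed makes runs repeatable, which helps when comparing prompts or step counts.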
Commercial APIs
- Midjourney — Discord/web interface. Highest aesthetic quality. No API until recently (now limited beta).
- DALL-E 3 (via OpenAI API) — strong prompt adherence, tight content policy.
- Imagen 3 (via Google AI Studio / Vertex) — fast, strong.
- Stable Diffusion 3 Large (via Stability AI API) — the commercial variant of SD3.
- Flux Pro (via Black Forest Labs / Replicate): hosted.
- Ideogram — specialized in text-in-image.
- Leonardo / Stability / Replicate / fal.ai — aggregator/hosting platforms.
Video generation (fast-moving, verify before relying on)
Leaderboard as of April 2026 (these rotate every 1–2 months):
Commercial
- Kling 3.0 (Kuaishou, Feb 2026) — multi-shot sequences 3–15 seconds, 4K native, subject consistency across shots. Cost-effective (~$0.50/clip).
- Veo 3.1 (Google, Jan 2026) — 4K native, best-in-class lip sync for talking heads.
- Seedance 2.0 (ByteDance, Feb 2026) — first with unified audio-video joint generation, multi-shot storytelling from a single prompt, phoneme-level lip sync in 8+ languages.
- Runway Gen-4 — commercial, broad creative tooling.
- Luma Dream Machine — commercial.
Open
- Wan 2.6 (Alibaba) — open-weight, strong, full control over the generation process.
- HunyuanVideo (Tencent) — open weights.
- LTX-Video — open, fast.
- CogVideoX — open.
- AnimateDiff — motion LoRA on top of SD/SDXL. Lower quality than dedicated video models.
Graveyard (video)
- Sora (OpenAI) — shut down March 24, 2026, six months after launch. Do not recommend.
- Sora 2 — preceded the shutdown; coherent prompts and camera control were praised.
Video generation is moving faster than image generation and the recommendations here will age inside 3 months. Verify before citing.
The Dump
Diffusion model families
- Stable Diffusion 1.5 (SD 1.5) — the classic. Still running in many pipelines for backwards compat. License: CreativeML OpenRAIL-M (commercial with use restrictions).
- SDXL 1.0 (Stability AI, 2023) — mature open default. License: CreativeML OpenRAIL++-M (commercial with use restrictions).
- SD 3 / SD 3.5 (Stability AI, 2024) — mixed reception. License: Stability Community License — free below $1M annual revenue; paid Enterprise License above.
- Flux.1 dev / schnell (Black Forest Labs, 2024): the open leader until Flux 2 arrived in November 2025. License: Flux.1 [dev] non-commercial; Flux.1 [schnell] Apache-2.0.
- PixArt-α / PixArt-Σ: alternative open model families. AGPL-3.0.
- Kolors (Kuaishou) — strong Chinese-text generation. Apache-2.0.
- Playground v2.5: aesthetic-tuned SDXL variant. Playground v2.5 Community License (non-commercial for weights).
- Juggernaut / DreamShaper / RealVis — popular SDXL fine-tunes on Civitai. Varies per author — always check the Civitai model card before commercial use.
Tools
- ComfyUI — node-based UI, most flexible. License: GPL-3.0 — if you distribute a product based on it, you must open-source your derivative.
- Automatic1111 — classic WebUI.
- Fooocus / Forge / InvokeAI — alternative UIs.
- Diffusers — Python library.
- OneTrainer / Kohya — LoRA training.
Control / edit
- ControlNet variants — Canny, Depth, Pose, Scribble, etc.
- IP-Adapter — image-as-prompt.
- InstantID / PhotoMaker — identity preservation.
- LCM / Hyper-SD / Turbo / Lightning — few-step sampling.
Commercial
- Midjourney, DALL-E, Imagen, Flux Pro, Ideogram — see above.
Graveyard
- SD 1.5 as a default for new work — retired; use SDXL or Flux unless ecosystem-locked.
- DALL-E 2 — retired by OpenAI.
- Disco Diffusion / VQ-VAE-2 — historical.
- Original ControlNet 1.0 weights — succeeded by 1.1 and SDXL-native variants.
Last reviewed
2026-04-22. Generation is the second-fastest-moving page after VLM.