Release · June 3, 2026

Ideogram 4.0 releases as open-weight image model with JSON layout control

Ideogram 4.0 shipped as an open-weight image model with JSON prompting, bounding boxes, stronger text rendering, and native 2048px output. The release targets layout-heavy creative work, and teams can test early fal and Leonardo integrations in production flows.

TL;DR

Ideogram 4.0 shipped as an open-weight 9.3B text-to-image model, and the launch repost points to downloadable weights, local inference, and fine-tuning on your own data.
The big workflow change is structured prompting: according to the GitHub repo, Ideogram 4 was trained on JSON captions with optional bounding boxes and color palettes, which gives layout-heavy jobs more deterministic controls than plain-text prompting alone.
Ideogram is leaning hard into design claims, and the Design Arena repost says the model opened at #1 among open-weight image models with a 1285 Elo, while the official technical post adds internal and third-party typography wins.
The release landed straight into production surfaces, with the fal repost announcing day-one availability on fal and the Leonardo integration post calling out 2K output, cleaner text, and stronger composition control inside Leonardo.
One early caveat: the weights are open but gated and non-commercial on Hugging Face, while the repo keeps the code under Apache-2.0. Separately, the Stable Diffusion workflow post shows local users already probing how much of the safety stack lives outside the core model.

You can read the official technical post, browse the GitHub repo, and check the gated Hugging Face model cards. There is also day-zero ComfyUI support, plus early rollout posts for fal and Leonardo.

JSON prompts

The repo makes the real pitch clear: plain-text prompts still work, but the model was trained on structured JSON captions, not free-form prose. According to the prompting guide in the repo, a single prompt object can specify subject attributes, style, lighting, typography, color palettes, and optional bounding boxes.

For creative teams, that matters because layout is finally first-class. Bounding-box coordinates and hex color conditioning are built into the prompt format, so posters, ads, packaging comps, and social graphics do not have to rely on prompt poetry to place text and objects.
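
To make the idea concrete, here is a minimal sketch of what a layout prompt of this kind might look like, built and sanity-checked in Python. Every field name (`subject`, `style`, `palette`, `elements`, `bbox`) and the normalized [x0, y0, x1, y1] box convention are assumptions made for illustration; the real schema is the one documented in the repo's prompting guide and may differ.

```python
import json
import re

# Six-digit hex colors such as "#1A2B3C".
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")


def validate_prompt(prompt: dict) -> list[str]:
    """Return a list of problems found in a hypothetical layout prompt."""
    problems = []
    for color in prompt.get("palette", []):
        if not HEX_COLOR.fullmatch(color):
            problems.append(f"bad hex color: {color}")
    for element in prompt.get("elements", []):
        x0, y0, x1, y1 = element["bbox"]
        # Assumed convention: normalized coordinates, top-left to bottom-right.
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            problems.append(f"bad bbox for {element['type']}: {element['bbox']}")
    return problems


# Hypothetical poster prompt; none of these keys are confirmed by the repo.
poster_prompt = {
    "subject": "espresso cup on a marble counter",
    "style": "minimal product photography",
    "palette": ["#1A1A1A", "#F4EDE4", "#C8553D"],
    "elements": [
        {"type": "text", "content": "MORNING BLEND", "bbox": [0.10, 0.05, 0.90, 0.20]},
        {"type": "object", "content": "espresso cup", "bbox": [0.30, 0.35, 0.70, 0.85]},
    ],
}

print(json.dumps(poster_prompt, indent=2))
print(validate_prompt(poster_prompt))
```

Checking colors and boxes before sending a prompt is cheap, and it catches the kind of mistake (a swapped coordinate, a malformed hex code) that plain-text prompting would otherwise hide.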

Architecture

According to the official technical post and GitHub repo, Ideogram 4 is a 9.3B model trained from scratch, not a fine-tune or distilled checkpoint.

The architecture choices are unusually specific:

A fully single-stream 34-layer DiT processes text and image tokens in one shared sequence.
The text encoder is Qwen3-VL-8B-Instruct, not a text-only encoder.
The DiT consumes hidden states from 13 intermediate Qwen layers, not just one final embedding.
Native output runs from 256 up to 2048 pixels, in multiples of 16, with aspect ratios up to 6:1.
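
The resolution limits in the last item are easy to check before a request is sent. The sketch below assumes that the 256 to 2048 range applies to each side independently and that aspect ratio means long side divided by short side; the source does not spell out either detail, and the function names are hypothetical.

```python
MIN_SIDE = 256
MAX_SIDE = 2048
STEP = 16
MAX_ASPECT = 6.0


def check_size(width: int, height: int) -> list[str]:
    """Return the constraints a requested size violates (empty list if none)."""
    problems = []
    for name, value in (("width", width), ("height", height)):
        if not MIN_SIDE <= value <= MAX_SIDE:
            problems.append(f"{name} {value} is outside {MIN_SIDE}-{MAX_SIDE}")
        if value % STEP != 0:
            problems.append(f"{name} {value} is not a multiple of {STEP}")
    # Only compute the ratio when both sides are positive.
    if min(width, height) > 0:
        ratio = max(width, height) / min(width, height)
        if ratio > MAX_ASPECT:
            problems.append(f"aspect ratio {ratio:.2f}:1 exceeds {MAX_ASPECT:.0f}:1")
    return problems


def snap_side(value: int) -> int:
    """Round a side down to a multiple of 16, then clamp it into range."""
    snapped = (value // STEP) * STEP
    return max(MIN_SIDE, min(MAX_SIDE, snapped))


print(check_size(2048, 1152))  # a 16:9 request at the top of the range
print(check_size(2048, 256))   # an 8:1 banner, wider than the stated limit
print(snap_side(1000))
```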

The model cards add an immediate deployment split. The nf4 model card supports CUDA and Diffusers, while the fp8 card is broader on hardware but does not have Diffusers support yet.
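
As a rough guide to why the quantized variants matter, the arithmetic below estimates weight storage for 9.3B parameters at different precisions. It is a weights-only lower bound under stated assumptions: it ignores quantization metadata, activations, and any other components such as the Qwen3-VL text encoder, and it assumes every parameter is stored at the nominal bit width, which the model cards may not do.

```python
PARAMS = 9.3e9  # parameter count quoted for Ideogram 4

# Nominal bytes per parameter for each format (assumed uniform across all weights).
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "nf4": 0.5,
}


def weight_gigabytes(params: float, bytes_per_param: float) -> float:
    """Weights-only size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for fmt, width in BYTES_PER_PARAM.items():
    print(f"{fmt}: about {weight_gigabytes(PARAMS, width):.2f} GB of weights")
```

Real memory use will be higher than these figures once the text encoder and runtime buffers are loaded, so treat the numbers as a floor, not a hardware requirement.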

Benchmarks

Ideogram's own materials stack three kinds of proof, all linked from the technical post, and a public leaderboard adds a fourth:

Design Arena: Ideogram says 4.0 is the top-ranked open-weight model, and the Design Arena repost gives the opening score as 1285 Elo with 68.7 seconds average generation time.
ContraLabs typography eval: The GitHub README says ten professional designers picked Ideogram 4 first 47.9% of the time, ahead of Gemini 3.1 Flash Image Preview at 30.0%.
Client-work usability: The same ContraLabs panel rated it 3.55 out of 5 on whether they would use it in real client work, above Grok Imagine 1.0 and FLUX.2 [max], per the README benchmark section.
General image arena: the live LMArena text-to-image leaderboard lists Ideogram Open Model at 1204 ±10, ahead of several open peers.

That benchmark mix is unusually on-message for designers. The company is not just claiming prettier samples; it is claiming better typography, better layout control, and higher odds that a designer would actually ship the output.
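
For readers unsure how to interpret the Elo figures quoted above, the sketch below uses the standard logistic Elo formula on a 400-point scale. Whether Design Arena or LMArena computes ratings exactly this way is an assumption, and the 1200-rated opponent is hypothetical. Ratings from different arenas sit on different scales, so the 1285 and 1204 figures should not be compared with each other.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected head-to-head win rate of A over B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Design Arena opening score from the article vs. a hypothetical 1200-rated peer.
p = expected_score(1285, 1200)
print(f"Expected preference rate vs. a 1200-rated model: {p:.1%}")

# Equal ratings always give an even split.
print(expected_score(1200, 1200))
```

Under that formula, an 85-point gap means the higher-rated model is expected to be preferred in a clear majority of pairwise votes, but far from all of them.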

Day-one surfaces

The release did not stay inside Ideogram's own app for long.

The fal repost announced Ideogram 4.0 live on fal, framing it around realism, text rendering, and artistic generation.
The Leonardo integration post says Leonardo users got 2K images, cleaner text, and stronger composition control on day one.
The ComfyUI announcement says native support landed immediately, with JSON prompting and local workflows as the headline feature.
A beta examples post explicitly called out fine-tuning on your own data as the interesting part of the open-weight release.

That mix is the story for working creators. Ideogram shipped a model, but it also shipped into API, hosted app, node graph, and third-party image platforms in the same news cycle.

Safety pipeline

The repo buries one important implementation detail in the quick-start docs: prompt and output safety screening runs through Hive, and the default plain-text CLI flow rewrites prompts through a hosted "magic prompt" API before generation, according to the README quick-start section.

That makes the Stable Diffusion workflow post more interesting than a routine subreddit experiment. The user swapped in a local Gemma-4-31B prompt-expansion model, reused Ideogram's open-sourced magic-prompt pattern, and reported zero rejections on ordinary prompts, arguing that some refusal behavior may be coming from the prompt layer rather than the image model itself.
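
The user's argument is easier to follow with the stages laid out as code. The sketch below is a toy model of such a pipeline, not Ideogram's implementation: the stage order, the function names, and the stub behaviors are all assumptions, and no real Hive or Ideogram API is called. It only shows how recording the stage that rejected a request lets you tell a prompt-layer refusal apart from one raised later.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    status: str  # "ok" or "rejected"
    stage: str   # the stage that produced the status
    prompt: str  # the prompt as it stood at that stage


def run_pipeline(
    user_prompt: str,
    expand: Callable[[str], Optional[str]],  # returns None if it refuses
    screen_prompt: Callable[[str], bool],    # True means the prompt passes
    generate: Callable[[str], bytes],
    screen_output: Callable[[bytes], bool],  # True means the image passes
) -> Result:
    expanded = expand(user_prompt)
    if expanded is None:
        return Result("rejected", "prompt_expansion", user_prompt)
    if not screen_prompt(expanded):
        return Result("rejected", "prompt_screen", expanded)
    image = generate(expanded)
    if not screen_output(image):
        return Result("rejected", "output_screen", expanded)
    return Result("ok", "generate", expanded)


# Stand-in stubs: a cautious hosted expander and a permissive local one.
def hosted_expand(prompt: str) -> Optional[str]:
    return None if "battle" in prompt else f"detailed: {prompt}"


def local_expand(prompt: str) -> Optional[str]:
    return f"detailed: {prompt}"


def fake_generate(prompt: str) -> bytes:
    return b"image-bytes"


def allow_all(_: object) -> bool:
    return True


prompt = "a chess battle on a poster"
print(run_pipeline(prompt, hosted_expand, allow_all, fake_generate, allow_all))
print(run_pipeline(prompt, local_expand, allow_all, fake_generate, allow_all))
```

With identical screening and generation stubs, the two runs differ only in the expander, which is the kind of isolation the subreddit experiment relied on.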

There is also a licensing wrinkle here. The code repo is Apache-2.0, but the model weights on Hugging Face are gated under an Ideogram 4 Non-Commercial license, per the model card. Open-weight, yes, but not open-season.