AI Primer
release

OpenAI launches GPT Image 2 with thinking, 2K outputs, and text rendering gains

OpenAI released GPT Image 2 in ChatGPT, Codex, and the API with thinking mode and 2K outputs. Early tests and Arena scores suggest it is usable for slides, UI mockups, and dense infographic layouts.


TL;DR

You can read the official launch post, skim the API image-generation guide, check the Arena leaderboard, and watch the launch livestream. The docs already have a gpt-image-2 gallery, the launch thread includes a dedicated researcher demo set, and the rollout hit third-party surfaces fast, from fal to Replicate.

What shipped

OpenAI's launch materials split the product into two layers. ChatGPT Images 2.0 is the user-facing name, and gpt-image-2 is the API model name.

Day-one details from the launch thread and dev posts:

The naming is already a little messy. As Simon Willison noted, OpenAI's own materials alternate between ChatGPT Images 2.0, image gen 2, and gpt-image-2.
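Whatever the product is called, the API surface keys off the model name. As a rough sketch of what a request might look like, here is a hypothetical helper that assembles Images API parameters; `gpt-image-2` comes from the launch materials, while the `2048x2048` size string is only an assumption based on the advertised 2K outputs, and the exact accepted values are not specified here:

```python
def build_image_request(prompt: str, size: str = "2048x2048") -> dict:
    """Assemble parameters for an image-generation call.

    Hypothetical helper: "gpt-image-2" is the API model name from the
    launch materials; the default size string is an assumption based on
    the advertised 2K outputs.
    """
    return {
        "model": "gpt-image-2",  # API name, vs. "ChatGPT Images 2.0" in the product
        "prompt": prompt,
        "size": size,
        "n": 1,  # the thinking path can also produce multiple candidates per prompt
    }

params = build_image_request("A dense one-page infographic with readable fine print")
# These parameters would then be passed to whatever client library you use.
```

The point of the split is that product copy can keep renaming the feature while integrations pin the stable `gpt-image-2` identifier.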

Thinking mode

The most important change is not photorealism. It is that OpenAI is now treating image generation as a reasoning workflow.

According to OpenAI's thinking-mode post, the reasoning path can:

  • search the web for real-time information
  • generate multiple distinct images from one prompt
  • double-check its own outputs
  • produce functional QR codes

That matches how early testers described it. One early tester said the reasoning model searched the web, used tools, and produced a one-page compliance guide from Texas utilization review law, while the follow-up post claimed the generated legal summary was accurate. Sam Goodside's doily example showed the model spending nearly nine minutes and several iterations to hit a specific 11-fold symmetry target.

OpenAI also appears to be exposing more of the image-generation pipeline than earlier models did. nptacek's screenshot thread surfaced a generated contact sheet and a visible file path for contact_sheet.png, suggesting the system may build reference context internally before finalizing an image.

Text rendering

This launch is really about crossing from image toy to layout engine.

The official claims cluster around the same failure modes older models used to blow up on:

  • small text
  • iconography
  • UI elements
  • dense compositions
  • multilingual copy
  • non-standard aspect ratios

Community tests immediately hit those seams. An early grid test and ProperPrompter's pixel-art inventory both pushed 10 by 10 labeled layouts. Ethan Mollick's first thread said the model had crossed a threshold where it could generate slides, academic papers, and readable fine print in one shot, while his follow-up examples showed multi-page fake books with legible page structure and jokes that survived zooming in.

OpenAI's own demos leaned into the same workloads. The slides and infographics demo, OpenAIDevs on comics and charts, and The Rundown's fake news screenshot test all emphasize that the model can keep layout, typography, and scene logic together in the same frame.

Benchmarks and caveats

The third-party headline number came from Arena, whose leaderboard post put GPT-Image-2 at 1512 on Text-to-Image, +242 points over second-place Nano Banana 2; the same thread reported gains of +125 points on Single-Image Edit and +90 points on Multi-Image Edit.

Arena also broke out category deltas against GPT-Image-1.5. In the category drill-down, text rendering led at +316 points, with big jumps in portraits, cartoon and fantasy, product design, and photoreal imagery too.
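For intuition on what a +242-point gap implies head to head, the standard Elo expected-score formula converts a rating difference into a win probability. This is a back-of-envelope sketch, assuming Arena's scores behave like conventional Elo ratings (Arena-style leaderboards are typically fit with a closely related Bradley-Terry model):

```python
def elo_win_probability(rating_diff: float) -> float:
    """Expected win rate for the higher-rated model, per the Elo formula."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))

# Rough head-to-head expectations implied by the reported gaps:
p_t2i = elo_win_probability(242)   # Text-to-Image gap, roughly 0.80
p_sie = elo_win_probability(125)   # Single-Image Edit gap
p_mie = elo_win_probability(90)    # Multi-Image Edit gap
```

Under that reading, a +242 gap means Arena voters would be expected to prefer GPT-Image-2 over the runner-up in roughly four of five Text-to-Image matchups, with smaller but still solid margins on the editing tracks.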

The harder question is where it still breaks. Three limits showed up fast:

  • Editing drift: Mollick's editing note said preserve-and-change edits slow down after a round or two, and restarting in a fresh chat helps.
  • Precision misses: fofrAI's London map comparison found Nano Banana Pro more geographically accurate on a satellite-style Westminster prompt.
  • Mode confusion: Adam G's reply said OpenAI tested many models internally and that advanced image generation requires selecting the thinking model, which helps explain why some launch-day comparisons looked inconsistent.

There are also a few hints that OpenAI may be tuning the public experience aggressively. One pre-launch Sudoku claim alleged a capability drop after release, but OpenAI has not publicly corroborated that.

Day-one rollout

The ecosystem rollout was fast enough to be part of the story.

Within hours, GPT Image 2 showed up across a bunch of developer and product surfaces, from fal to Replicate.

That rollout pattern may be the real tell. Once an image model becomes a reliable way to generate slides, diagrams, UI comps, charts, and reference designs, it stops behaving like a creative sidecar and starts looking like infrastructure for coding agents and productivity tools.
