AI Primer
release

OpenAI launches GPT Image 2 with thinking, 2K outputs, and text rendering gains

OpenAI released GPT Image 2 in ChatGPT, Codex, and the API with thinking mode and 2K outputs. Early tests and Arena scores suggest it is usable for slides, UI mockups, and dense infographic layouts.


TL;DR

You can read the official launch post, skim the API image-generation guide, check the Arena leaderboard, and watch the launch livestream. The docs already have a gpt-image-2 gallery, the launch thread includes a dedicated researcher demo set, and the rollout hit third-party surfaces fast, from fal to Replicate.

What shipped

OpenAI's launch materials split the product into two layers. ChatGPT Images 2.0 is the user-facing name, and gpt-image-2 is the API model name.

Day-one details from the launch thread and dev posts:

The naming is already a little messy. As Simon Willison noted, OpenAI's own materials alternate between ChatGPT Images 2.0, image gen 2, and gpt-image-2.
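Whatever the product is called, the API surface keys off the model name. As a rough sketch of what a request might look like, here is a hypothetical helper that assembles Images API parameters; `gpt-image-2` comes from the launch materials, while the `2048x2048` size string is only an assumption based on the advertised 2K outputs, and the exact accepted values are not specified here:

```python
def build_image_request(prompt: str, size: str = "2048x2048") -> dict:
    """Assemble parameters for an image-generation call.

    Hypothetical helper: "gpt-image-2" is the API model name from the
    launch materials; the default size string is an assumption based on
    the advertised 2K outputs.
    """
    return {
        "model": "gpt-image-2",  # API name, vs. "ChatGPT Images 2.0" in the product
        "prompt": prompt,
        "size": size,
        "n": 1,  # the thinking path can also produce multiple candidates per prompt
    }

params = build_image_request("A dense one-page infographic with readable fine print")
# These parameters would then be passed to whatever client library you use.
```

The point of the split is that product copy can keep renaming the feature while integrations pin the stable `gpt-image-2` identifier.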

Thinking mode

The most important change is not photorealism. It is that OpenAI is now treating image generation as a reasoning workflow.

According to OpenAI's thinking-mode post, the reasoning path can:

  • search the web for real-time information
  • generate multiple distinct images from one prompt
  • double-check its own outputs
  • produce functional QR codes

That matches how early testers described it. One early tester said the reasoning model searched the web, used tools, and produced a one-page compliance guide from Texas utilization review law, while the follow-up post claimed the generated legal summary was accurate. Sam Goodside's doily example showed the model spending nearly nine minutes and several iterations to hit a specific 11-fold symmetry target.

OpenAI also appears to be exposing more of the image-generation pipeline than earlier models did. nptacek's screenshot thread surfaced a generated contact sheet and a visible file path for contact_sheet.png, suggesting the system may build reference context internally before finalizing an image.

Text rendering

This launch is really about crossing from image toy to layout engine.

The official claims cluster around the same failure modes older models used to blow up on:

  • small text
  • iconography
  • UI elements
  • dense compositions
  • multilingual copy
  • non-standard aspect ratios

Community tests immediately hit those seams. An early grid test and ProperPrompter's pixel-art inventory both pushed 10 by 10 labeled layouts. Ethan Mollick's first thread said the model had crossed a threshold where it could generate slides, academic papers, and readable fine print in one shot, while his follow-up examples showed multi-page fake books with legible page structure and jokes that survived zooming in.

OpenAI's own demos leaned into the same workloads. The slides and infographics demo, OpenAIDevs on comics and charts, and The Rundown's fake news screenshot test all emphasize that the model can keep layout, typography, and scene logic together in the same frame.

Benchmarks and caveats

The third-party headline number came from Arena, whose leaderboard post put GPT-Image-2 at 1512 on Text-to-Image, +242 points over second-place Nano Banana 2; the same thread reported gains of +125 points on Single-Image Edit and +90 points on Multi-Image Edit.

Arena also broke out category deltas against GPT-Image-1.5. In the category drill-down, text rendering led at +316 points, with big jumps in portraits, cartoon and fantasy, product design, and photoreal imagery too.
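For intuition on what a +242-point gap implies head to head, the standard Elo expected-score formula converts a rating difference into a win probability. This is a back-of-envelope sketch, assuming Arena's scores behave like conventional Elo ratings (Arena-style leaderboards are typically fit with a closely related Bradley-Terry model):

```python
def elo_win_probability(rating_diff: float) -> float:
    """Expected win rate for the higher-rated model, per the Elo formula."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))

# Rough head-to-head expectations implied by the reported gaps:
p_t2i = elo_win_probability(242)   # Text-to-Image gap, roughly 0.80
p_sie = elo_win_probability(125)   # Single-Image Edit gap
p_mie = elo_win_probability(90)    # Multi-Image Edit gap
```

Under that reading, a +242 gap means Arena voters would be expected to prefer GPT-Image-2 over the runner-up in roughly four of five Text-to-Image matchups, with smaller but still solid margins on the editing tracks.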

The harder question is where it still breaks. Three limits showed up fast:

  • Editing drift: Mollick's editing note said preserve-and-change edits slow down after a round or two, and restarting in a fresh chat helps.
  • Precision misses: fofrAI's London map comparison found Nano Banana Pro more geographically accurate on a satellite-style Westminster prompt.
  • Mode confusion: Adam G's reply said OpenAI tested many models internally and that advanced image generation requires selecting the thinking model, which helps explain why some launch-day comparisons looked inconsistent.

There are also a few hints that OpenAI may be tuning the public experience aggressively. One pre-launch Sudoku claim alleged a capability drop after release, but OpenAI has not publicly corroborated that.

Day-one rollout

The ecosystem rollout was fast enough to be part of the story.

Within hours, GPT Image 2 showed up across a bunch of developer and product surfaces, from fal to Replicate.

That rollout pattern may be the real tell. Once an image model becomes a reliable way to generate slides, diagrams, UI comps, charts, and reference designs, it stops behaving like a creative sidecar and starts looking like infrastructure for coding agents and productivity tools.
