AI Primer

ChatGPT Images 2.0 launches gpt-image-2 with API access and 2K output

OpenAI rolled out ChatGPT Images 2.0 across ChatGPT, API, and Codex with stronger instruction following, text rendering, and 2K output; 4K stays API beta only. Try it on layout-heavy prompts and multi-panel continuity, where HN tests saw the clearest gains.


TL;DR

  • OpenAI rolled out ChatGPT Images 2.0 across ChatGPT, the API, and Codex, with gpt-image-2 positioned as the new production image model, according to the launch summary and the developer announcement.
  • The headline upgrades are stronger instruction following, better text rendering, more aspect ratios, and support for structured outputs like diagrams, infographics, comics, and multilingual text, per the launch summary and OpenAI's prompting guide.
  • OpenAI's public product post says ChatGPT can use reasoning, live web data, and self-checks for multi-step image tasks, while the API materials frame gpt-image-2 as the default model for new image builds, per the launch summary.
  • The reliable ceiling looks lower than the launch hype suggests: OpenAI advertised up to 2K broadly, while its prompting guide calls anything above 2560x1440 experimental, and the HN thread separately notes 4K was beta-only in the API framing.
  • Early HN testing was mixed. The discussion summary said prompt adherence looked competitive with Google's model, macro-photography stress tests lagged Nano Banana, and multi-panel comic continuity looked unusually strong.

You can read the official launch post, check the API model page, and skim OpenAI's prompting guide for the size constraints the announcement mostly skips. The HN thread is also worth a pass because the best comments quickly converged on the real eval axes: layout fidelity, text rendering, hard prompt adherence, and continuity across panels.

What shipped

Introducing ChatGPT Images 2.0

OpenAI has launched ChatGPT Images 2.0 (powered by the gpt-image-2 model), a significant update designed to transition image generation from purely creative experimentation to a production-ready visual workflow platform. The model introduces reasoning capabilities, allowing it to integrate live web data, perform self-checks, and handle complex, multi-step tasks such as generating infographics, slides, diagrams, and multilingual text. Key technical improvements include stronger instruction following, enhanced text rendering, flexible aspect ratios, and resolutions up to 2K (or 4K for API beta users). The system is available now across ChatGPT, the API, and Codex, with advanced reasoning modes accessible to Plus, Pro, and Business subscribers.

OpenAI framed ChatGPT Images 2.0 as a shift from toy image generation toward production visual workflows. The official post says ChatGPT can plan multi-step image requests, pull live web data, and run self-checks before returning outputs like slides, diagrams, and infographics.

The developer-facing rollout is more concrete in the community announcement:

  • gpt-image-2 is available now in the API and Codex.
  • OpenAI markets it for stronger editing, layout control, and instruction following.
  • Structured visual work is a first-class use case: diagrams, charts, posters, comics, and multilingual text.
  • Standard output goes up to 2K across more aspect ratios.

API and output limits

ChatGPT Images 2.0

For AI engineers, the key signal is that OpenAI is positioning `gpt-image-2` as a production-capable image model with an API, documented pricing, and stronger instruction-following for structured visual outputs. The discussion includes practical comparisons and stress tests that hint at real evaluation criteria: layout fidelity, text rendering, complex scene consistency, and whether the model is actually better than alternatives on hard prompts.

The API model page positions gpt-image-2 as OpenAI's state-of-the-art image generator and exposes a dated snapshot, gpt-image-2-2026-04-21, for version pinning. The developer announcement also publishes token pricing per 1M tokens: $8 for image input, $2 for cached image input, and $30 for image output.
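To make those per-token prices concrete, here is a minimal cost sketch. It only uses the three image-token rates quoted above; the function name and the token counts in the usage example are hypothetical, text-token pricing is not covered because the announcement summary does not quote it, and how many tokens a given image actually consumes is something you would read from the API's usage data rather than from this snippet.

```python
# Rough cost estimator built on the three rates quoted in the article.
# Prices are USD per 1M tokens; everything else here is illustrative.
PRICE_PER_MILLION_USD = {
    "image_input": 8.00,
    "cached_image_input": 2.00,
    "image_output": 30.00,
}


def estimate_image_cost(
    image_input_tokens: int = 0,
    cached_image_input_tokens: int = 0,
    image_output_tokens: int = 0,
) -> float:
    """Return the estimated USD cost for the given image token counts."""
    counts = {
        "image_input": image_input_tokens,
        "cached_image_input": cached_image_input_tokens,
        "image_output": image_output_tokens,
    }
    for name, value in counts.items():
        if value < 0:
            raise ValueError(f"{name} token count must be non-negative")
    # Each bucket is billed at its own per-million rate.
    return sum(
        counts[name] * PRICE_PER_MILLION_USD[name] / 1_000_000
        for name in counts
    )


if __name__ == "__main__":
    # Hypothetical request: 1,000 image input tokens, 4,000 image output tokens.
    cost = estimate_image_cost(image_input_tokens=1_000, image_output_tokens=4_000)
    print(f"${cost:.4f}")
```

Keeping the rates in one dictionary means a price change only needs a single edit.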

OpenAI's prompting guide adds the caveats buried beneath the launch copy:

  • outputQuality supports low, medium, and high.
  • Max edge is under 3840 px.
  • Both edges must be multiples of 16.
  • Aspect ratio cannot exceed 3:1.
  • Total pixels must stay between 655,360 and 8,294,400.
  • Output above 2560x1440 is experimental and more variable.

That makes the 2K story straightforward, but the 4K story much squishier than the headline summary implied.
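The size rules listed above are easy to check before sending a request. The following is a sketch of such a pre-flight check, not OpenAI's own validation logic. The function names are hypothetical, the constants are copied from the bullets, and two readings are assumptions: "under 3840 px" is treated as a strict bound, and "above 2560x1440" is interpreted as a total pixel count larger than 2560 × 1440.

```python
# Pre-flight size check based on the constraints listed in the article.
# Constants come from the article's bullets; the interpretation of edge
# cases (strict vs. inclusive bounds) is an assumption.
MAX_EDGE_PX = 3840          # article: "under 3840 px" (treated as strict)
EDGE_MULTIPLE = 16
MAX_ASPECT_RATIO = 3        # long edge : short edge may not exceed 3:1
MIN_TOTAL_PIXELS = 655_360
MAX_TOTAL_PIXELS = 8_294_400
EXPERIMENTAL_PIXELS = 2560 * 1440  # assumption: compare by pixel count


def check_size(width: int, height: int) -> list[str]:
    """Return a list of rule violations; an empty list means the size passes."""
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]
    problems = []
    long_edge, short_edge = max(width, height), min(width, height)
    if long_edge >= MAX_EDGE_PX:
        # Change to `>` if the real limit turns out to be inclusive.
        problems.append(f"longest edge must be under {MAX_EDGE_PX} px")
    if width % EDGE_MULTIPLE or height % EDGE_MULTIPLE:
        problems.append(f"both edges must be multiples of {EDGE_MULTIPLE}")
    if long_edge > MAX_ASPECT_RATIO * short_edge:
        problems.append(f"aspect ratio must not exceed {MAX_ASPECT_RATIO}:1")
    total = width * height
    if not MIN_TOTAL_PIXELS <= total <= MAX_TOTAL_PIXELS:
        problems.append(
            f"total pixels must be between {MIN_TOTAL_PIXELS} and {MAX_TOTAL_PIXELS}"
        )
    return problems


def is_experimental(width: int, height: int) -> bool:
    """True when the size exceeds the 2560x1440 pixel budget."""
    return width * height > EXPERIMENTAL_PIXELS


if __name__ == "__main__":
    for size in [(2560, 1440), (3072, 1728), (3200, 1024), (512, 512)]:
        print(size, check_size(*size), is_experimental(*size))
```

Returning a list of problems rather than a boolean keeps the check useful in a UI or a log line, since one size can break several rules at once.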

HN stress tests

Discussion around ChatGPT Images 2.0

Thread discussion highlights:

  • minimaxir on API docs and pricing: Points to the model card and pricing docs, and notes that the submitted page is sparse while the livestream suggests feature parity with Gemini-style image tools.
  • vunderba on comparative evals: Compares OpenAI’s image model with Google’s on a prompt-adherence benchmark, saying they’re neck-and-neck while Gemini has had better visual fidelity.
  • neom on hard prompt testing: Shares a detailed macro-photography prompt used to stress image models and says the results on both the website and API were not as good as Nano Banana.

According to the discussion summary, the most useful early comparisons were not beauty shots. They were prompts that break image models in specific ways.

  • One HN commenter said prompt-adherence results were roughly neck-and-neck with Google's model, though Gemini still looked stronger on raw visual fidelity.
  • Another used a detailed macro-photography prompt as a stress test and reported weaker results than Nano Banana on both the website and API.
  • A separate HN test found unusually good multi-panel comic generation, especially continuity across panels, which is exactly the sort of structured output OpenAI is pushing in its launch materials.

That last result is probably the clearest sign of what OpenAI optimized for here: not prettier single images, but images that obey fussy instructions without falling apart halfway through.
