AI Primer
release

ChatGPT Images 2.0 launches gpt-image-2 with 4K API beta and live web data

OpenAI says ChatGPT Images 2.0 adds live web data, self-checks, stronger text rendering, flexible aspect ratios, and up to 4K in API beta. Testers on Hacker News report strong comic continuity but weaker macro product shots than Nano Banana, so teams should compare it against their own visual workload.

TL;DR

  • OpenAI says ChatGPT Images 2.0 turns image generation into a broader production workflow, with live web data, self-checks, stronger text rendering, flexible aspect ratios, and up to 4K output for API beta users, according to the official announcement.
  • The new model is gpt-image-2, and the API model docs position it as OpenAI's state-of-the-art image generator for both generation and editing, available across ChatGPT, the API, and Codex according to the OpenAI launch summary.
  • Early user reaction is split by workload: the HN discussion highlights describe prompt adherence as roughly neck-and-neck with Google's Nano Banana line, while the main HN thread also surfaces reports of unusually strong multi-panel comic continuity.
  • The same HN thread includes a harder caveat: one commenter's macro-photography stress test came out worse than Nano Banana on both the website and the API.
  • OpenAI's own image generation guide adds concrete production constraints that matter more than the headline, including a 3:1 maximum aspect ratio, no transparent-background support on gpt-image-2, and 4K outputs that the docs label experimental above 2560x1440.

You can read the launch post, skim the model page, and dig into the image generation guide. The Hacker News thread is more useful than most launch-day reaction because it quickly narrows in on comics, prompt-following, and the exact kinds of product-shot failures that break real creative workflows.

Production surfaces

Introducing ChatGPT Images 2.0

OpenAI has launched ChatGPT Images 2.0 (powered by the gpt-image-2 model), a significant update designed to transition image generation from purely creative experimentation to a production-ready visual workflow platform. The model introduces reasoning capabilities, allowing it to integrate live web data, perform self-checks, and handle complex, multi-step tasks such as generating infographics, slides, diagrams, and multilingual text. Key technical improvements include stronger instruction following, enhanced text rendering, flexible aspect ratios, and resolutions up to 2K (or 4K for API beta users). The system is available now across ChatGPT, the API, and Codex, with advanced reasoning modes accessible to Plus, Pro, and Business subscribers.

OpenAI's pitch is straightforward: ChatGPT Images 2.0 is meant for diagrams, infographics, slides, comics, and other text-heavy assets, not just one-off art prompts, according to the OpenAI launch summary and the official post.

The rollout spans three surfaces at once:

  • ChatGPT
  • The API
  • Codex

OpenAI also says advanced reasoning modes are gated to Plus, Pro, and Business users, while the API side exposes the new gpt-image-2 model directly in the model docs.

Web-aware image generation

Discussion around ChatGPT Images 2.0

Thread discussion highlights:

  • minimaxir on API docs and pricing: Points to the model card and pricing docs, and notes that the submitted page is sparse while the livestream suggests feature parity with Gemini-style image tools.
  • vunderba on comparative evals: Compares OpenAI’s image model with Google’s on a prompt-adherence benchmark, saying they’re neck-and-neck while Gemini has had better visual fidelity.
  • neom on hard prompt testing: Shares a detailed macro-photography prompt used to stress image models and says the results on both the website and API were not as good as Nano Banana.

The launch's most interesting feature is not 4K. It is that OpenAI says the model can pull in live web data and run self-checks before producing an image, as described in the OpenAI launch summary.

That shifts the tool toward assets that usually fall apart in older image models:

  • Diagrams that need current facts
  • Slides and infographics with dense text
  • Multilingual visuals
  • Multi-step prompt chains where the model has to verify pieces before rendering

OpenAI has not published a detailed public breakdown of how those checks work internally, but the product framing in the official announcement is much closer to a reasoning system with an image output than to a plain text-to-image endpoint.

Comics and continuity

ChatGPT Images 2.0

For AI creatives, the update is that OpenAI is pushing image generation beyond one-off art prompts toward usable creative production: comics, diagrams, infographics, slides, and text-heavy visuals. The discussion focuses on whether the model can reliably follow intricate prompts, preserve continuity across panels, and produce polished results without obvious AI artifacts.

One of the clearest hands-on wins from the main HN thread is comic generation. A top comment linked from the thread reported that gpt-image-2 handled unusual multi-panel prompts well and preserved continuity across panels. Losing that continuity is the failure creatives usually hit first with storyboard-style prompting.

That matters because continuity is a harder bar than a single pretty frame. The HN discussion repeatedly centers on whether the model can keep characters, layout logic, and readable text coherent across a sequence, not just produce a strong hero image.

Prompt adherence versus visual fidelity

The practitioner read from HN is sharper than the leaderboard talk:

  • One commenter said OpenAI's gpt-image-1.5 and Google's NB2 had been roughly tied on a prompt-adherence benchmark, with both around a 70 percent success rate.
  • The same commenter said Gemini had the edge in visual fidelity.
  • Another commenter said a detailed macro-photography prompt performed worse on OpenAI's website and API than Nano Banana.

Those reports all come via the discussion highlights and the linked Hacker News thread. The useful takeaway from that thread is that OpenAI seems to be pushing hardest on instruction following and structured outputs, while some users still prefer Google's line for photoreal polish under stress.

API limits and 4K caveats

The API docs add the details the launch post mostly glides past. In the image generation guide, OpenAI says gpt-image-2 supports common sizes from 1024 square up to 3840x2160 landscape and 2160x3840 portrait, with both edges required to be multiples of 16 and aspect ratios capped at 3:1.
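Those size rules are simple enough to check before sending a request. The following is a minimal sketch, not OpenAI's own validation logic: the helper name `check_size`, the constants, and the reading of "above 2560x1440" as a total-pixel threshold are all assumptions made for illustration, using only the figures quoted in this article.

```python
from dataclasses import dataclass

# Limits as quoted in the article; verify against the current guide before relying on them.
EDGE_MULTIPLE = 16
MAX_ASPECT_RATIO = 3          # 3:1 cap, long edge to short edge
MAX_LONG_EDGE = 3840          # assumption: from the 3840x2160 / 2160x3840 upper bound
MAX_SHORT_EDGE = 2160         # assumption: same source
EXPERIMENTAL_PIXELS = 2560 * 1440  # assumption: "above 2560x1440" means more total pixels


@dataclass(frozen=True)
class SizeCheck:
    ok: bool             # True when no rule quoted above is violated
    experimental: bool   # True when the size is above the assumed pixel threshold
    problems: tuple      # human-readable descriptions of each violated rule


def check_size(width: int, height: int) -> SizeCheck:
    """Hypothetical pre-flight check for a requested output size."""
    if width <= 0 or height <= 0:
        return SizeCheck(False, False, ("width and height must be positive",))

    problems = []
    if width % EDGE_MULTIPLE or height % EDGE_MULTIPLE:
        problems.append("both edges must be multiples of 16")

    long_edge, short_edge = max(width, height), min(width, height)
    if long_edge > MAX_ASPECT_RATIO * short_edge:
        problems.append("aspect ratio exceeds 3:1")
    if long_edge > MAX_LONG_EDGE or short_edge > MAX_SHORT_EDGE:
        problems.append("size exceeds 3840x2160 (or 2160x3840)")

    experimental = width * height > EXPERIMENTAL_PIXELS
    return SizeCheck(not problems, experimental, tuple(problems))
```

For example, `check_size(3840, 2160)` passes every rule but is flagged as experimental, while `check_size(1000, 1000)` fails because neither edge is a multiple of 16.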

The same guide says:

  • Transparent backgrounds are not supported on gpt-image-2
  • Quality modes low, medium, high, and auto are available
  • Square images generate fastest
  • Outputs above 2560x1440 are considered experimental
  • The model can edit existing images, use reference images, and replace masked regions

That last point is the sleeper feature. Between reference-image generation, masked editing, and high-resolution outputs, gpt-image-2 looks less like a novelty prompt box and more like a general-purpose visual production endpoint with a few very specific, very relevant constraints.
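The quality modes and the transparent-background restriction from the guide can also be enforced locally before a request goes out. This is a hedged sketch rather than the official client: `build_image_request` is a hypothetical helper, and the parameter names `size`, `quality`, and `background` are assumed to match the image generation API's request fields.

```python
# Quality modes as listed in the guide, per the article.
ALLOWED_QUALITIES = {"low", "medium", "high", "auto"}


def build_image_request(prompt, size="1024x1024", quality="auto", background=None):
    """Hypothetical helper: return request parameters after local checks.

    It only builds a dict and makes no network call.
    """
    if not prompt or not prompt.strip():
        raise ValueError("prompt must not be empty")
    if quality not in ALLOWED_QUALITIES:
        raise ValueError(f"quality must be one of {sorted(ALLOWED_QUALITIES)}")
    # The guide says transparent backgrounds are not supported on gpt-image-2.
    if background == "transparent":
        raise ValueError("transparent backgrounds are not supported on gpt-image-2")

    params = {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,        # assumed field name
        "quality": quality,  # assumed field name
    }
    if background is not None:
        params["background"] = background  # assumed field name
    return params
```

The returned dict is meant to be passed as keyword arguments to whatever client call a team already uses, for instance `client.images.generate(**params)` in the OpenAI Python SDK, assuming that method accepts these fields for this model.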
