Release: June 7, 2026

ChatGPT Images 2.0 adds multilingual text and floor-plan output in creator tests

Hacker News testers used ChatGPT Images 2.0 on comics, infographics, floor plans, and prompt-heavy benchmark grids, while commenters said low-end API pricing is still inexpensive. Try it for layout-heavy production work such as e-commerce assets and structured visuals.



TL;DR

OpenAI says ChatGPT Images 2.0 is built for composition-heavy jobs such as comics, infographics, and floor plans, and both OpenAI's launch summary and the main HN thread back up that framing.
The biggest practical upgrade is text and layout control: OpenAI's launch summary says the model renders non-Latin scripts and keeps characters or objects consistent across multiple images, while the system card adds that thinking mode can use web search and generate multiple images from one prompt.
Developers got the model on day one as gpt-image-2, and the HN discussion roundup zeroed in on pricing: low-quality renders stayed cheap while high-quality renders got pricier, at least against one commenter's GPT Image 1 figures.
Early testers used it on prompt-heavy edge cases, including Simon Willison's raccoon-in-a-crowd prompt that the HN discussion roundup surfaced as a quick stress test for dense compositions and object placement.

OpenAI's own materials are worth skimming here. The launch post frames comics, infographics, and floor plans as the headline use cases, the developer announcement pitches gpt-image-2 as a production model for accurate, localized, on-brand assets, and the prompting guide adds the practical detail that low quality is the latency play while medium and high are for fidelity. Meanwhile, Simon Willison's raccoon test and the main HN thread show where people pushed it first: crowded scenes, readable details, and prompts that used to fall apart.

Comics, infographics, floor plans

ChatGPT Images 2.0

For creatives, the thread is about a noticeably more capable image generator for making comics, infographics, floor plans, and other composition-heavy assets. Commenters focused on visual quality, text rendering, and whether the outputs are convincing enough to raise watermarking and provenance concerns.

OpenAI's launch framing is unusually specific for an image model. Instead of generic "better visuals" language, the company pointed at multi-paneled comics, infographic layouts, and floor plans in both the launch post and the developer announcement.

That lines up with what made the HN thread interesting. As the main thread summary puts it, commenters treated the release as a jump in composition-heavy output, not just prettier single images.

Multilingual text and multi-image consistency

Introducing ChatGPT Images 2.0

OpenAI has launched ChatGPT Images 2.0, an updated image generation model featuring advanced "thinking capabilities" that allow for web searching, improved instruction following, and the creation of complex assets like multi-paneled comic strips, infographics, and floor plans. Key technical enhancements include significantly improved text rendering—including non-Latin scripts such as Hindi, Japanese, and Korean—and the ability to maintain character and object continuity across multiple generated images. The model is available to all ChatGPT and Codex users, with advanced "thinking" features and additional capabilities provided for paid tiers. Developers also have access to the model via the gpt-image-2 API.

The most creator-relevant claim is not raw realism. It is readable text inside the image.

According to OpenAI's launch summary, ChatGPT Images 2.0 improves text rendering across scripts including Hindi, Japanese, and Korean, and it can keep characters or objects consistent across multiple generated images. The system card adds the mechanism behind that pitch: thinking mode can reason through the prompt, call web search, and generate multiple images from a single prompt.

The developer announcement is even blunter about the intended workload. It calls out assets that need to be accurate, readable, localized, formatted for the destination surface, and usable without heavy cleanup.

Pricing and production settings

Discussion around ChatGPT Images 2.0

Thread discussion highlights:
- minimaxir on API model card and pricing: Model card for the API endpoint gpt-image-2... API Pricing is mostly unchanged from gpt-image-1.5... The submitted page is annoyingly uninformative, but from the livestream it proports the same exact features as Gemini's Nano Bana…
- ea016 on price comparison: GPT Image 2 ... Low: 1024×1024 $0.006 ... High: 1024×1024 $0.211 ... GPT Image 1 ... Low: 1024×1024 $0.011 ... High: 1024×1024 $0.167
- simonw on developer prompt testing: I've been trying out the new model like this: ... `uv run ... openai_image.py -m gpt-image-2` ... 'Do a where's Waldo style image but it's where is the raccoon holding a ham radio'

The pricing chatter was more nuanced than "same price" or "price hike." In the HN discussion roundup, one commenter said API pricing looked mostly unchanged from gpt-image-1.5, while another broke out per-image estimates at 1024×1024 against GPT Image 1: low quality dropped from about $0.011 to $0.006, and high quality rose from about $0.167 to $0.211. The two comments used different baselines, which is part of why they read so differently.
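To make the split concrete, here is a small back-of-the-envelope calculation. It uses only the per-image estimates one HN commenter (ea016) posted for 1024×1024 renders, not official OpenAI prices, and the helper names are illustrative.

```python
# Per-image estimates at 1024x1024 as quoted by one HN commenter (ea016).
# These are thread figures, not official OpenAI prices; check the pricing page before budgeting.
PRICES_USD = {
    "gpt-image-1": {"low": 0.011, "high": 0.167},
    "gpt-image-2": {"low": 0.006, "high": 0.211},
}


def batch_cost(model: str, quality: str, n_images: int) -> float:
    """Total cost in USD for n_images renders at a single quality tier."""
    return PRICES_USD[model][quality] * n_images


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means cheaper)."""
    return (new - old) / old * 100


if __name__ == "__main__":
    for tier in ("low", "high"):
        old = PRICES_USD["gpt-image-1"][tier]
        new = PRICES_USD["gpt-image-2"][tier]
        print(
            f"{tier}: {pct_change(old, new):+.1f}% per image; "
            f"1,000 images: ${batch_cost('gpt-image-1', tier, 1000):.2f} -> "
            f"${batch_cost('gpt-image-2', tier, 1000):.2f}"
        )
```

With these figures, a batch of 1,000 low-quality images goes from $11 to $6 (roughly 45% cheaper), while 1,000 high-quality images go from $167 to $211 (roughly 26% more).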

The official docs frame cost control around settings rather than a flat image fee. The prompting guide says gpt-image-2 is the default recommendation for new builds, with low, medium, and high quality tiers, and notes that low is especially strong for latency-sensitive use cases.

That split matters because OpenAI is pitching the model for production surfaces, not just showcase prompts. Cheap low-end renders and more expensive high-fidelity passes fit that story better than a single headline number.
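In practice, that tiering becomes a per-call setting. The sketch below is a minimal example built on the OpenAI Python SDK's `client.images.generate` call. It assumes gpt-image-2 accepts the same `size` and `quality` values as gpt-image-1 and, like gpt-image-1, returns base64 image data. The `choose_quality` policy and the other helper names are hypothetical, loosely following the prompting guide's "low for latency, medium or high for fidelity" advice.

```python
import base64
from pathlib import Path


def choose_quality(latency_sensitive: bool, final_render: bool) -> str:
    """Hypothetical policy: 'low' for latency-sensitive calls, 'high' for
    final fidelity passes, 'medium' otherwise. Latency wins if both are set."""
    if latency_sensitive:
        return "low"
    return "high" if final_render else "medium"


def build_image_request(prompt: str, latency_sensitive: bool = False,
                        final_render: bool = False,
                        size: str = "1024x1024") -> dict:
    """Keyword arguments for client.images.generate.
    Assumes gpt-image-2 takes the same size/quality values as gpt-image-1."""
    return {
        "model": "gpt-image-2",
        "prompt": prompt,
        "size": size,
        "quality": choose_quality(latency_sensitive, final_render),
    }


def generate_to_file(request: dict, out_path: str) -> Path:
    """Send the request and write the first returned image to out_path."""
    # Imported here so the helpers above work without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    result = client.images.generate(**request)
    path = Path(out_path)
    # gpt-image-1 returns base64 data rather than a URL; assumed the same for gpt-image-2.
    path.write_bytes(base64.b64decode(result.data[0].b64_json))
    return path
```

A quick draft pass on Simon Willison's raccoon prompt might look like `generate_to_file(build_image_request("Do a where's Waldo style image but it's where is the raccoon holding a ham radio", latency_sensitive=True), "raccoon.png")`, with `final_render=True` reserved for the version that ships.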

The first production-use reactions

Ask HN: What was your "oh shit" moment with GenAI?

For creatives, the most relevant signal is that image models are crossing from novelty into usable production output, especially for e-commerce/product photography and infographic-style visuals. Several commenters frame their “oh shit” moment as the first time AI produced assets good enough to replace a photographer or designer for a specific use case.

Discussion around Ask HN: What was your "oh shit" moment with GenAI?

Thread discussion highlights:
- dang on debugging and workflow leverage: Watching it do log file analysis in seconds that would have taken me hours... Helping me with optimizations... Tracking down bugs in code... Finding information that I had been unable to find using Google searches.
- bluejay2387 on coding agents and self-built tooling: I had a locally hosted model write its own semantic search system that indexed 250,000 documentation and code files... and then write a fully functioning mod... This freaked me out enough that I then had it write a CLI based activity and TODO tracker and then integrate that tool into its coding process.
- rerdavies on domain-specific reasoning: I provided a reference to a The Spice Manual 2nd ed. a page number and an equation number, and asked Claude to implement it... It proceeded to implement not only the equation, but the calculation of the Langrangian...

A separate HN thread from June pushed the story past launch-week novelty. As the Ask HN summary notes, commenters described image models crossing into replace-a-real-step territory, especially for e-commerce product photography and infographic-style visuals.

The interesting part is how narrow those claims were. People were not claiming image models had solved all design work. They were naming specific asset classes where the output was suddenly usable enough to change the workflow.

Prompt torture tests


The fastest way people checked the upgrade was by throwing structured prompts at it. The HN discussion roundup captured Simon Willison's "where's the raccoon holding a ham radio" test, a crowded-scene prompt that forces the model to manage object placement and detail density at the same time.

That same roundup also pointed to minimaxir's benchmark prompt, an 8 by 8 grid of Pokémon whose National Pokédex numbers match the first 64 prime numbers. That is a good example of where the release stands out: the conversation around this model kept drifting toward grids, labels, panels, and layout rules, which is exactly the kind of work older image demos usually avoided.
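Part of what makes that prompt a useful benchmark is that the right answer is fully checkable: the 64 target numbers run from 2 up to 311, and every cell can be verified against a list. The sketch below computes that list, lays it out as an 8 by 8 grid, and scores a grid of numbers read off a generated image. The prompt wording and all helper names are hypothetical; minimaxir's exact prompt is not reproduced here.

```python
def first_primes(n: int) -> list[int]:
    """First n primes, by trial division against smaller primes up to sqrt(candidate)."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p != 0 for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def grid_rows(numbers: list[int], width: int = 8) -> list[list[int]]:
    """Split a flat list into rows of `width` cells."""
    return [numbers[i:i + width] for i in range(0, len(numbers), width)]


def build_grid_prompt(width: int = 8) -> str:
    """Hypothetical prompt wording for a width x width prime-numbered Pokémon grid."""
    numbers = first_primes(width * width)
    lines = [
        f"Create a {width}x{width} grid of Pokémon.",
        "Each cell shows the Pokémon with the listed National Pokédex number, labeled with that number.",
    ]
    for row_index, row in enumerate(grid_rows(numbers, width), start=1):
        lines.append(f"Row {row_index}: " + ", ".join(str(x) for x in row))
    return "\n".join(lines)


def score_grid(expected: list[int], observed: list[int]) -> float:
    """Fraction of expected cells whose number, as read off the image, matches.
    Missing cells in `observed` count as misses."""
    matches = sum(1 for e, o in zip(expected, observed) if e == o)
    return matches / len(expected)
```

Reading the numbers off a generated grid is still a manual (or separate OCR) step. The point of the sketch is that this kind of prompt has one correct layout, so a miss is countable rather than a matter of taste.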