Ask HN reports image models can replace product photos and infographic work
An Ask HN thread on GenAI ‘oh shit’ moments includes creators saying recent image models are the first tools good enough for e-commerce visuals and other layout-heavy outputs. The examples are anecdotal, but they match current tests around comics, floor plans, text rendering, and structured image prompts.

TL;DR
- In a high-signal Ask HN thread, creators and builders described recent image models as the first ones good enough for specific production jobs, especially e-commerce product visuals and infographic-style assets.
- That anecdotal shift lines up with what the ChatGPT Images 2.0 discussion and OpenAI's launch post emphasized: comics, floor plans, dense layouts, multilingual text, and more reliable instruction following.
- Simon Willison's early prompt test added a useful sanity check, because he used gpt-image-2 on a crowded "Where's Waldo"-style brief instead of a polished demo prompt.
- OpenAI pitched gpt-image-2 as a model for "production workflows" in the developer announcement, with API access on day one and token-based pricing instead of a flat per-image menu.
You can read the full Ask HN thread, browse OpenAI's consumer launch post, and check the developer announcement that framed the model around readable, on-brand, localized assets. Simon Willison's raccoon test is a nice counterweight to launch marketing, because it asks for a cluttered scene with one specific object to find.
Production threshold
Ask HN: What was your "oh shit" moment with GenAI?
For creatives, the most relevant signal is that image models are crossing from novelty into usable production output, especially for e-commerce/product photography and infographic-style visuals. Several commenters frame their “oh shit” moment as the first time AI produced assets good enough to replace a photographer or designer for a specific use case.
The interesting part of the Ask HN thread is not that people think image models are better. It is that several commenters described a threshold crossing: the first time outputs looked usable enough to replace a paid step in a workflow for a narrow job.
That maps cleanly to OpenAI's own framing in its launch post, which pushed the model toward deliverables instead of moodboards: comics, infographics, floor plans, product photos, and multi-image consistency.
Structured layouts
ChatGPT Images 2.0
For creatives, the thread is about a noticeably more capable image generator for making comics, infographics, floor plans, and other composition-heavy assets. Commenters focused on visual quality, text rendering, and whether the outputs are convincing enough to raise watermarking and provenance concerns.
According to the HN discussion around ChatGPT Images 2.0, commenters fixated on three practical changes:
- text rendering that stays legible enough for menus, labels, and infographic blocks
- composition-heavy outputs such as comics, maps, floor plans, and slides
- prompt adherence on dense briefs, where older models usually dropped constraints
OpenAI made the same bet in the developer announcement, calling out better layouts, stronger editing, improved text rendering, and assets formatted for apps, ads, presentations, docs, and product flows.
Crowded prompt tests
Simon Willison's write-up matters because it uses a deliberately annoying prompt, "Do a where's Waldo style image but it's where is the raccoon holding a ham radio," rather than a vendor-chosen showcase. In the same HN thread, commenters also pointed to heuristic-heavy prompts that mix layout, counting, and domain knowledge, which is where prompt following usually falls apart first.
That is the overlap with the Ask HN anecdotes. The reports are still anecdotal, but they are anecdotal in the same direction: models are getting good at constrained, composition-heavy image tasks that used to collapse into gibberish.
API pricing
The last practical detail is that, according to its developer post, OpenAI shipped gpt-image-2 straight into the API and Codex. Pricing is token-based, quoted per million tokens: $8 for image input, $2 for cached image input, $30 for image output, $5 for text input, and $10 for text output. In the HN launch thread, commenters compared those rates with the prior model and noted that low-quality 1024×1024 generations got cheaper while high-quality ones got more expensive.
That pricing split is its own signal. The current wave is not just about prettier samples; it is about whether structured image work is now reliable enough that teams will pay production prices for it.
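To make the token pricing concrete, here is a minimal Python sketch that turns the per-million-token rates quoted above into a per-request cost. The rates come from the article; the function name, the category keys, and the token counts in the example are hypothetical, chosen only to illustrate the arithmetic. Real token counts per image depend on size and quality and are not stated here.

```python
# Rates in USD per one million tokens, as quoted in the article.
RATES_PER_MILLION = {
    "image_input": 8.00,
    "cached_image_input": 2.00,
    "image_output": 30.00,
    "text_input": 5.00,
    "text_output": 10.00,
}


def estimate_cost(usage: dict[str, int]) -> float:
    """Return the USD cost for token counts keyed by rate category."""
    unknown = set(usage) - set(RATES_PER_MILLION)
    if unknown:
        raise KeyError(f"unknown token categories: {sorted(unknown)}")
    return sum(
        RATES_PER_MILLION[category] * tokens / 1_000_000
        for category, tokens in usage.items()
    )


# Hypothetical usage for one generation: a 200-token text prompt and
# 1,000 image output tokens. These counts are illustrative, not measured.
example_usage = {"text_input": 200, "image_output": 1_000}
print(f"${estimate_cost(example_usage):.4f}")
```

Under these assumed counts, the image output tokens account for almost all of the cost. That is consistent with the commenters' observation that the price of a generation swings with quality level, since the output rate is by far the highest of the five.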