Updated June 11, 2026

Engineers report GPT-5.5, Opus 4.7, and DeepSeek V4 cost spikes and API errors

Fresh threads compared three coding-model stacks on operating cost: Opus 4.7 raised spend with xhigh defaults, DeepSeek V4-Flash beat V4-Pro on cost and stability, and GPT-5.5 drew scrutiny on access and limits. Use these tests to judge cost per task and API behavior before switching stacks.

TL;DR

Anthropic's release summary kept Claude Opus 4.7 list pricing flat at $5 per million input tokens and $25 per million output tokens, but the HN discussion summary flagged two immediate spend multipliers: the new tokenizer can map the same prompt to 1.0 to 1.35 times as many input tokens, and Claude Code now defaults to xhigh effort.
DeepSeek's release summary shipped V4 as two open-weight API models with 1M context, but the HN discussion summary is where the operational split shows up: commenters reported V4-Flash as the cheaper, steadier option for simple agent work, while V4-Pro drew timeout complaints and at least one report that reasoning_content was needed to avoid API errors.
OpenAI's release summary said GPT-5.5 hit ChatGPT and Codex first, then the API a day later, while the HN discussion summary focused on the less glamorous questions: whether access was really open, how tight the limits were, and whether token-efficiency claims outweighed a higher API rate card of $5 per million input tokens and $30 per million output tokens.
The strongest cross-model pattern in the DeepSeek HN thread, the Opus 4.7 HN thread, and the GPT-5.5 HN thread is operational rather than benchmark-driven. Engineers kept circling back to cost per task, serving stability, and what happens after long tool-using sessions, not launch-chart scores.
Simon Willison's article summary and the related HN discussion summary added a month-later datapoint: coding agents are burning enough tokens that $100 subscriptions can mask four-figure API-equivalent usage, which helps explain why small tokenizer or default-effort changes landed as a real story.

You can read Anthropic's Opus 4.7 launch post and Claude Code tuning note, DeepSeek's V4 preview docs, OpenAI's GPT-5.5 launch post and API pricing page, plus Simon Willison's coding-agent spend write-up. The weirdly important details are buried in the migration notes and comment threads: Anthropic raised Claude Code's default effort to xhigh, DeepSeek pitched Flash as good enough for simple agent tasks, and OpenAI's API story changed within a day.

Claude Opus 4.7

Anthropic's headline was straightforward: the launch post says Opus 4.7 is generally available across Claude, the API, Bedrock, Vertex AI, and Microsoft Foundry at the same price as Opus 4.6, $5 per million input tokens and $25 per million output tokens, with a new xhigh effort tier and stronger instruction following.

Anthropic Announces General Availability of Claude Opus 4.7

Anthropic has released Claude Opus 4.7, which offers improved performance in advanced software engineering and complex, long-running agentic tasks compared to Opus 4.6. The model features enhanced instruction following and a self-verification capability. It introduces a new effort level, 'xhigh', providing users with finer control over the reasoning-latency tradeoff. Opus 4.7 is available across all Claude products and major API platforms, with pricing consistent with the previous version ($5/million input tokens and $25/million output tokens). Users should note that an updated tokenizer may increase input token usage by 1.0–1.35x, and higher effort levels may increase output token consumption.

Discussion around Claude Opus 4.7

Thread discussion highlights:
- jimmypk on token cost and default effort: The default effort change in Claude Code is worth knowing before your next session: it's now `xhigh` ... Combined with the 1.0–1.35× tokenizer overhead ... actual token spend per agentic session will likely exceed naive estimates from 4.6 baselines.
- simonw on adaptive thinking UI changes: I'm finding the "adaptive thinking" thing very confusing... Also notable: 4.7 now defaults to NOT including a human-readable reasoning token summary in the output, you have to add "display": "summarized" to get that.
- sallymander on stricter refusals in Claude Code: It seems a little more fussy than Opus 4.6 so far. It actually refuses to do a task from Claude's own Agentic SDK quick start guide... "I can analyze and describe the bugs ... but I will not apply fixes to `utils.py`."

The buried caveat is token economics. Anthropic's own Claude Code best-practices post says the updated tokenizer and a tendency to think more at higher effort levels, especially later in long sessions, can materially increase usage.

That lines up with the HN discussion summary, where commenters highlighted three concrete changes:

Claude Code now defaults to xhigh effort.
The new tokenizer can raise input-token counts to 1.0 to 1.35 times the previous count for the same text.
Adaptive-thinking output no longer shows a human-readable summary unless "display": "summarized" is set.
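To make the tokenizer effect concrete, here is a minimal Python sketch of the input-side arithmetic. The list prices and the 1.0 to 1.35 range come from the release summary above; the session token counts and the function names are hypothetical, chosen only for illustration. It holds output tokens constant, so it understates any extra output that a higher effort level might produce.

```python
# Rough cost range for one session under the reported tokenizer overhead.
# Prices and the 1.0-1.35x range are quoted in the article; the token
# counts used in the example below are invented for illustration.

INPUT_PRICE_PER_MTOK = 5.00    # USD per million input tokens (list price)
OUTPUT_PRICE_PER_MTOK = 25.00  # USD per million output tokens (list price)
TOKENIZER_RANGE = (1.0, 1.35)  # reported input-token multiplier vs. Opus 4.6


def session_cost(input_tokens: int, output_tokens: int,
                 input_multiplier: float = 1.0) -> float:
    """Return the USD cost of a session, scaling input tokens by the multiplier."""
    scaled_input = input_tokens * input_multiplier
    total = scaled_input * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


def session_cost_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (best case, worst case) cost across the reported multiplier range."""
    low, high = TOKENIZER_RANGE
    return (session_cost(input_tokens, output_tokens, low),
            session_cost(input_tokens, output_tokens, high))


if __name__ == "__main__":
    # Hypothetical session: 2M input tokens and 200k output tokens on the old tokenizer.
    low, high = session_cost_range(2_000_000, 200_000)
    print(f"${low:.2f} to ${high:.2f}")
```

For that hypothetical session the list price is unchanged, yet the bill can land anywhere between the old figure and roughly a quarter more, before counting any change in output tokens.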

The thread also surfaced a regression that matters more than the benchmark deltas for agent users. According to the HN discussion summary, one early tester hit stricter refusals inside Claude Code, including a refusal to apply fixes from Anthropic's own Agentic SDK quick start flow.

DeepSeek V4 Flash and Pro

DeepSeek's preview release shipped two open-weight models on the same day: deepseek-v4-pro, an MoE with 1.6T total and 49B active parameters, and deepseek-v4-flash, with 284B total and 13B active parameters. Both expose a 1M-token context window and OpenAI-compatible API access.

DeepSeek-V4 Preview Release Announcement (April 24, 2026)

On April 24, 2026, DeepSeek announced the preview release of DeepSeek-V4, featuring two open-weight models: DeepSeek-V4-Pro (1.6T total/49B active parameters) and DeepSeek-V4-Flash (284B total/13B active parameters). Both models introduce a 1M-token context window standard enabled by novel attention mechanisms, including token-wise compression and DeepSeek Sparse Attention (DSA). The API is immediately available for developers by updating model parameters, with support for both Thinking and Non-Thinking modes. DeepSeek also announced that legacy model endpoints deepseek-chat and deepseek-reasoner will be retired on July 24, 2026.

Discussion around DeepSeek v4

Thread discussion highlights:
- rvz on architecture and training: Focuses on the paper’s architectural changes, especially manifold-constrained hyper-connections and hybrid attention, and advises waiting for independent tests before trusting the benchmarks.
- XCSme on third-party benchmarking and serving limits: Says the model looks weaker than the blog post suggests, notes it may be below some other frontier models in third-party benchmarks, and points out rate limits / timeout errors on V4-Pro.
- cmitsakis on practical cost/performance: Reports a customer-support benchmark where V4-Flash was competitive with other models and much cheaper, while V4-Pro performed worse and required `reasoning_content` to avoid API errors.

The interesting part is that DeepSeek's own docs already hint at the split the HN thread later emphasized. The release page calls Flash the "fast, efficient, and economical choice" and says it performs on par with Pro on simple agent tasks, while positioning Pro as the stronger agentic-coding model.

Commenters in the HN discussion summary turned that marketing split into an operations story:

One top comment focused on architectural changes, especially manifold-constrained hyper-connections and hybrid attention, but warned against trusting the launch benchmarks before independent tests.
Another said V4-Pro looked weaker than the blog-post framing in third-party evals and reported rate-limit or timeout problems.
A third reported a customer-support benchmark where V4-Flash was competitive and much cheaper, while V4-Pro performed worse and needed reasoning_content to avoid API errors.

That is a nice reality check on the 1M-context pitch. Flash looked like the practical path for teams that mainly care about simple agent loops and price, while Pro drew the harder questions about serving behavior.
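For teams testing a model that draws timeout complaints, one common mitigation is to retry with exponential backoff and jitter. The sketch below is a generic, standard-library-only illustration of that technique, not anything DeepSeek documents or any commenter described. `call_with_backoff` and its parameters are hypothetical names, and the `request` callable stands in for whatever client call you actually make.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on the given exceptions with exponential backoff.

    The delay before retry k (1-based) is base_delay * 2**(k - 1) plus random
    jitter in [0, base_delay]. The last failure is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    failures = 0
    while True:
        try:
            return request()
        except retry_on:
            failures += 1
            if failures >= max_attempts:
                raise
            sleep(base_delay * 2 ** (failures - 1) + random.uniform(0, base_delay))


if __name__ == "__main__":
    # Stub that times out twice, then succeeds; replace with a real API call.
    state = {"calls": 0}

    def flaky_request() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("simulated timeout")
        return "ok"

    print(call_with_backoff(flaky_request, base_delay=0.1))
```

Retries hide instability rather than remove it, and each retried request is billed again, so it is worth logging the retry count alongside cost per task when comparing Flash and Pro.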

GPT-5.5 access and price

OpenAI's launch post initially framed GPT-5.5 as a ChatGPT and Codex rollout, with API availability coming "very soon." The page was later updated to say GPT-5.5 and GPT-5.5 Pro were available in the API on April 24.

OpenAI Announces Release of GPT-5.5

OpenAI has introduced GPT-5.5, described as its most intuitive model to date, designed to enhance computer-based workflows. The model features strengthened safety measures developed through rigorous testing, including red-teaming and feedback from nearly 200 early-access partners. As of April 2026, GPT-5.5 and GPT-5.5 Pro are available in ChatGPT for Plus, Pro, Business, and Enterprise users, as well as via the API. The release includes updated system cards documenting new safeguards for advanced cybersecurity and biological research.

Discussion around GPT-5.5

Thread discussion highlights:
- minimaxir on Token efficiency and throughput: "Codex analyzed weeks’ worth of production traffic patterns and wrote custom heuristic algorithms to optimally partition and balance work... increasing token generation speeds by over 20%."
- simonw on API access and Codex backdoor: "This doesn't have API access yet, but OpenAI seem to approve of the Codex API backdoor... And that backdoor API has GPT-5.5."
- 6thbit on Benchmark comparison with Anthropic: Compares GPT-5.5 to Anthropic's Mythos on SWE-bench Pro, Terminal-bench, GPQA Diamond, HLE, BrowseComp, and OSWorld-Verified, concluding it is "quite comparable otherwise."

The official pitch pairs higher intelligence with better efficiency. OpenAI says GPT-5.5 matches GPT-5.4's per-token latency in real serving and uses fewer tokens to finish the same tasks. The API pricing page still moved the sticker price up to $5 per million input tokens and $30 per million output tokens.

That gap between per-token price and per-task efficiency is exactly what the HN discussion summary wrestled with. The most cited discussion points were:

One commenter quoted the claim that Codex analyzed production traffic and wrote custom partitioning heuristics that increased token generation speed by more than 20 percent.
Commenters questioned whether GPT-5.5 was truly available through the normal API surface or mainly reachable through Codex paths at first.
Others compared the model to Anthropic's Mythos on SWE-bench Pro, Terminal-Bench, GPQA Diamond, HLE, BrowseComp, and OSWorld-Verified, and came away with a read that it was competitive rather than obviously dominant.
Pricing and usage limits kept coming up alongside the performance claims.

The result is a familiar token-tax argument, just with a different shape from Opus 4.7. Anthropic held the list price flat while changing tokenizer and default effort. OpenAI doubled the standard rate versus GPT-5.4 on paper, then argued that throughput and task efficiency would close the gap.
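The trade-off between per-token price and per-task efficiency reduces to a single product, sketched below in Python. It assumes, as a simplification, that input and output prices rise by the same ratio and that token usage falls by the same fraction on both sides. The 2x price ratio is the "doubled" figure stated above; the 0.7 token ratio and the function names are purely hypothetical.

```python
def task_cost_ratio(price_ratio: float, token_ratio: float) -> float:
    """New cost per task divided by old cost per task.

    price_ratio: new per-token price / old per-token price.
    token_ratio: tokens the new model uses / tokens the old model used.
    """
    return price_ratio * token_ratio


def break_even_token_ratio(price_ratio: float) -> float:
    """Largest token_ratio at which the new model costs no more per task."""
    return 1.0 / price_ratio


if __name__ == "__main__":
    price_ratio = 2.0  # the article's "doubled" standard rate
    print(f"break-even token ratio: {break_even_token_ratio(price_ratio):.2f}")
    # Hypothetical: the new model finishes the same task with 30% fewer tokens.
    print(f"cost per task vs. before: {task_cost_ratio(price_ratio, 0.7):.2f}x")
```

Under these assumptions, a doubled rate only breaks even if the model finishes the same task in half the tokens. A 30 percent reduction would still leave each task about 1.4 times as expensive.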

Token spend in the wild

A month later, Simon Willison's product-market-fit post put rough dollar figures on what heavy coding-agent use looks like outside a launch week. He estimated that his previous 30 days of use would have cost $1,199.79 through Anthropic's API for Claude Code and $980.37 through OpenAI's API for Codex, versus the two $100 subscription plans he actually paid for.
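As a quick check on the scale involved, the short Python sketch below divides each API-equivalent estimate by the subscription price. The dollar figures are the ones quoted above; the variable and function names are hypothetical.

```python
# API-equivalent 30-day estimates quoted in the article, in USD.
API_EQUIVALENT_USD = {"Claude Code": 1199.79, "Codex": 980.37}
SUBSCRIPTION_USD = 100.00  # price of each subscription plan


def usage_multiple(api_equivalent: float, subscription: float) -> float:
    """How many times the subscription price the same usage would cost via the API."""
    return api_equivalent / subscription


if __name__ == "__main__":
    for product, api_cost in API_EQUIVALENT_USD.items():
        print(f"{product}: {usage_multiple(api_cost, SUBSCRIPTION_USD):.1f}x")
    # Claude Code: 12.0x
    # Codex: 9.8x
```

At roughly ten to twelve times the subscription price, even a 1.35x shift in token counts is a visible change in API-equivalent dollars.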

I think Anthropic and OpenAI have found product-market fit

Simon Willison posits that OpenAI and Anthropic have finally achieved true product-market fit, evidenced by the rising costs companies are facing from internal LLM usage and the aggressive adoption of coding/general-purpose agent products like Claude Code/Cowork and Codex. While ChatGPT's earlier viral success demonstrated consumer demand, Willison argues that the current convergence of coding agents and enterprise-scale pricing marks a new inflection point where these companies are generating significant revenue, potentially sufficient to cover their operational costs.

Discussion around I think Anthropic and OpenAI have found product-market fit

Thread discussion highlights:
- simonw on real-world token spend on coding agents: Simonw posts usage and cost numbers for Claude Code and OpenAI Codex, then says: "Given the code I've been able to build in the past month I genuinely do think I got value for the API price version."
- trjordan on scale of required revenue: The argument here is that the labs need roughly "$1t+ per year in spending" and that even 20% productivity gains may not justify the required token spend.
- binary0010 on open-source pressure: This comment asks how OpenAI and Anthropic plan to keep customers if models like GLM-5.1 are "just as good and open source and a lot cheaper."

Those numbers do not settle which stack is cheapest. They do explain why engineers reacted so strongly to apparently small levers like tokenizer remaps, effort defaults, and tighter limits.

The HN discussion of Willison's post added two more useful pressures on the story:

Some commenters argued current revenue still looks tiny against the infrastructure bill implied by frontier-model spending.
Others pointed to open-source competition, including cheaper alternatives, as the margin compression waiting on the other side of this coding-agent boom.

That makes the April threads read less like isolated launch gripes and more like an early operating manual for coding models. The benchmark tables got the headlines, but the details engineers kept bookmarking were spend curves, timeout behavior, and whether the model still behaves after a long run with tools.