Skip to content
AI Primer
breaking

Engineers compare DeepSeek V4, GPT-5.5, and Claude Opus on 1M context and token spend

Fresh discussions compared DeepSeek V4, GPT-5.5, and Claude Opus 4.7/4.8 on real coding tasks. Teams should weigh 1M context, faster modes, rate limits, tool use, and token inflation before switching models.

6 min read
Engineers compare DeepSeek V4, GPT-5.5, and Claude Opus on 1M context and token spend
Engineers compare DeepSeek V4, GPT-5.5, and Claude Opus on 1M context and token spend

TL;DR

DeepSeek's official V4 announcement linked straight to the tech report and open weights. Anthropic's Opus 4.7 post buried the token inflation caveat in the migration details, then Opus 4.8 turned around six weeks later with fast mode cuts and dynamic workflows. OpenAI's GPT-5.5 announcement got an April 24 API update, while the Codex changelog and Codex pricing page filled in the operational details engineers were actually asking about.

DeepSeek V4

DeepSeek Announces V4 Preview Release with 1M Context Window

DeepSeek has released the preview version of its V4 model family, introducing two Mixture-of-Experts (MoE) variants: DeepSeek-V4-Pro (1.6T total/49B active parameters) and DeepSeek-V4-Flash (284B total/13B active parameters). Both models feature a 1 million-token context window, enabled by a new attention mechanism involving token-wise compression and DeepSeek Sparse Attention (DSA). The models are available immediately via API and at chat.deepseek.com. Legacy model aliases 'deepseek-chat' and 'deepseek-reasoner' will be retired on July 24, 2026, with developers advised to transition to the new model IDs.

Discussion around DeepSeek v4

Thread discussion highlights: - rvz on architecture and paper reading: The key point to look at is the residual design / manifold-constrained hyper-connections (mHC) that make this efficiently trainable, especially with the hybrid attention mechanism. - XCSme on benchmark skepticism and rate limits: The blog post shows really good results, but third-party benchmarks suggest it’s not really SOTA... V4-Pro is heavily rate-limited and gives a lot of timeout errors when I try to test it. - cmitsakis on low-cost flash variant in a benchmark: deepseek-v4-flash was better than qwen3.5-27b... and roughly equal to gemini-3-flash-preview, but deepseek-v4-flash had the lowest cost of all of them by far.

The headline spec is simple: the official release introduced deepseek-v4-pro, a 1.6T-parameter MoE with 49B active parameters, and deepseek-v4-flash, a 284B MoE with 13B active parameters. Both ship with a 1M-token context window and open weights.

The interesting part is how people split the models. According to the main HN thread, V4 Pro drew attention for long-horizon agentic coding, while the discussion digest highlighted a different angle: Flash landed near Gemini 3 Flash Preview in one shared benchmark and got called out as the cheapest option in that comparison.

One top HN commenter, quoted in the discussion digest, pointed readers to the residual design and manifold-constrained hyper-connections in the report, plus the hybrid attention stack. That lines up with DeepSeek's own framing in the tech report, which treats token-wise compression and DeepSeek Sparse Attention as the mechanism that makes the 1M window usable instead of just marketable.

Tokenizer and effort

Introducing Claude Opus 4.7

Anthropic has released Claude Opus 4.7, its latest generally available model, featuring improved performance in software engineering, complex task execution, and instruction following compared to Opus 4.6. The model is available across all Claude products, the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry. Pricing remains identical to the previous version at $5 per million input tokens and $25 per million output tokens. Users should note two key changes affecting token usage: the model utilizes an updated tokenizer that may map inputs to 1.0–1.35x more tokens, and it produces more output tokens in agentic settings due to increased "thinking" at higher effort levels. Detailed evaluations are provided in the associated System Card.

Discussion around Claude Opus 4.7

Thread discussion highlights: - jimmypk on tokenizer overhead and effort defaults: The default effort change in Claude Code is worth knowing before your next session: it's now `xhigh` ... Combined with the 1.0–1.35× tokenizer overhead ... actual token spend per agentic session will likely exceed naive estimates. - simonw on adaptive thinking and output format changes: I'm finding the "adaptive thinking" thing very confusing ... Also notable: 4.7 now defaults to NOT including a human-readable reasoning token summary in the output. - gertlabs on early benchmark impressions: Opus 4.7 is more strategic, more intelligent, and has a higher intelligence floor than 4.6 or 4.5 ... in agentic sessions with tools, it IS the best, as advertised.

Opus 4.7 looked like a quiet version bump until people read the fine print. Anthropic's launch post kept pricing at $5 per million input tokens and $25 per million output tokens, but also said the updated tokenizer may map the same input to 1.0 to 1.35x more tokens.

The other buried change was effort. Anthropic added a new xhigh setting, and the company said Claude Code now defaults to xhigh for all plans. In the HN discussion digest, commenters connected those two knobs immediately: more tokens in, more thinking out, flatter price sheet, less predictable spend per agentic session.

The workflow changes went beyond billing. In the HN discussion digest, Simon Willison called adaptive thinking confusing and noted that 4.7 stopped returning the human-readable reasoning token summary by default. Another commenter in the main HN thread said the model felt materially better at instruction following, which helps explain why the migration conversation centered on prompt and harness behavior instead of raw benchmark charts.

Dynamic workflows

Anthropic Announces Claude Opus 4.8 with Enhanced Benchmarks and New Workflow Capabilities

Anthropic has released Claude Opus 4.8, an upgrade to its predecessor, Opus 4.7, featuring improved benchmark performance and collaboration capabilities. New features include effort control in claude.ai and Cowork, allowing users to adjust model output depth versus speed and rate limit consumption. Additionally, Claude Code now supports dynamic workflows, enabling the execution of hundreds of parallel subagents for large-scale tasks like codebase migrations. Fast mode for Opus 4.8 operates at 2.5 times the speed and is three times cheaper than previous versions. Pricing remains unchanged from Opus 4.7, and the model is accessible via the Claude API as claude-opus-4-8.

Discussion around Claude Opus 4.8

Thread discussion highlights: - senko on coding benchmark: Used an RTS game-in-one-file prompt as a frontier-model coding benchmark and says Claude Code with Opus 4.8 in ultracode mode “nailed it,” the best result so far. - jkxyz on creative generation: Reports that the model’s crossword layout generation is the first time it has done a good job on placement, showing a concrete improvement on a creative smoke test. - simonw on workflow/API change: Highlights the newly documented mid-conversation system messages behavior and notes it matters for prompt caching and agentic loops, but may break existing abstraction layers.

Opus 4.8 kept the same list price and moved the conversation back to workflow surface area. Anthropic's announcement added three concrete knobs:

  • effort control in claude.ai and Cowork
  • fast mode at 2.5x speed, now three times cheaper than earlier Opus fast modes
  • dynamic workflows in Claude Code, where Claude can plan work and run hundreds of parallel subagents in one session

That last feature is Christmas come early for coding agent nerds. The main HN thread framed the practical signal around parallel subagents, not just better benchmark bars, and the discussion digest paired it with another change that matters more to framework authors than to end users: mid-conversation system messages are now documented behavior.

According to the HN discussion digest, Simon Willison flagged those system messages as relevant to prompt caching and agentic loops, with a warning that existing abstraction layers may break. That is a very specific kind of launch surprise, the kind that does not show up in a model card headline.

GPT-5.5

Introducing GPT-5.5

OpenAI introduced GPT-5.5 on April 23, 2026, as a highly capable model designed to handle complex, real-world tasks including coding, research, and data analysis. It is built to better understand complex goals, utilize tools effectively, perform self-corrections, and execute tasks with minimal guidance. Released alongside a Pro version, the model underwent extensive safety evaluations, including red-teaming for cybersecurity and biology risks, and was deployed with robust safeguards. GPT-5.5 is available across various OpenAI consumer and enterprise platforms, with API access provided following the initial launch.

Discussion around GPT-5.5

Thread discussion highlights: - minimaxir on infra/performance optimization: “To better utilize GPUs, Codex analyzed weeks’ worth of production traffic patterns and wrote custom heuristic algorithms to optimally partition and balance work… increasing token generation speeds by over 20%.” - 6thbit on benchmark comparisons: Posts a comparison table versus “Mythos” and says GPT-5.5 is “Still far from Mythos on SWE-bench but quite comparable otherwise.” - simonw on API access: Says GPT-5.5 “doesn't have API access yet,” but claims OpenAI is effectively exposing it through a Codex backdoor and a plugin for LLM.

OpenAI pitched GPT-5.5 as a model for complex real-world work: coding, research, analysis, tool use, and self-correction over long tasks. In the official post, the company said GPT-5.5 matches GPT-5.4 on per-token serving latency while using fewer tokens to complete the same Codex tasks.

The rollout details changed fast. The launch post initially framed API access as separate from the first ChatGPT and Codex rollout, then added an April 24 update saying GPT-5.5 and GPT-5.5 Pro were available in the API. The API model page lists GPT-5.5 at a 272K context length, not 1M, which makes it the obvious outlier in this comparison.

The HN thread went straight at the operational story. In the discussion digest, one commenter quoted OpenAI on Codex backend tuning that improved token generation speed by more than 20 percent, while another said GPT-5.5 was still well behind Anthropic's strongest coding model on SWE-Bench even if it looked competitive elsewhere. A third, again in the discussion digest, focused on API timing and the weird period where Codex exposure seemed ahead of the public API story.

Access and limits

DeepSeek v4

DeepSeek v4 matters as an API and open-weight model release: the big engineering signals are the 1M-token context, MoE variants with different cost/latency tradeoffs, and the practical question of whether Flash or Pro is better for coding and agentic workflows. The discussion also flags deployment realities like rate limits, host reliability, and the paper’s sandbox/execution stack.

GPT-5.5

GPT-5.5 is being discussed as an agentic coding/model-ops upgrade: better tool use, stronger task completion, faster inference/throughput tuning, and the practical constraints around API access and pricing. The useful signal for engineers is how OpenAI positions the model for real workloads and how commenters compare it to Anthropic on coding and benchmark performance.

Across all three vendors, the most useful differences were operational:

  • DeepSeek used the V4 launch to retire legacy aliases deepseek-chat and deepseek-reasoner on July 24, 2026, per the official release.
  • Anthropic made Opus 4.7 and 4.8 available across Claude products, API, Bedrock, Vertex AI, and Microsoft Foundry in the 4.7 post and 4.8 post.
  • OpenAI rolled GPT-5.5 through ChatGPT, Codex, and then the API, while the Codex pricing page framed usage in credits and plan multipliers rather than a simple model quota.
  • In DeepSeek's HN discussion digest, commenters complained that V4 Pro was heavily rate-limited and prone to timeouts during testing.
  • In the GPT-5.5 HN core thread, commenters pointed to tighter Codex limits and higher effective cost around GPT-5.5 than earlier versions.
  • In the DeepSeek V4 HN core thread, one hands-on coding report said V4 "went hard" on a gnarly refactor and left a significantly nicer codebase, while the Opus 4.8 HN core thread surfaced a separate report that Opus 4.8 in ultracode mode "nailed" a one-file RTS benchmark.

That leaves a clean split. DeepSeek won the open-weight, 1M-context argument. Anthropic kept stacking workflow features around coding agents, then made token spend harder to eyeball. OpenAI sold GPT-5.5 as the agentic workhorse, but the engineering chatter kept gravitating back to access paths, quotas, and whether the model actually closes the coding gap.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

Share on X