Skip to content
AI Primer
release

Claude Opus 4.8 ships with 69.2% SWE-Bench Pro and 2.5x Fast mode

Anthropic released Claude Opus 4.8 across Claude, the API, and major clouds with higher coding scores and a cheaper 2.5x-speed Fast mode. Use it for coding workloads that want better benchmark performance without a price increase over 4.7.

8 min read
Claude Opus 4.8 ships with 69.2% SWE-Bench Pro and 2.5x Fast mode
Claude Opus 4.8 ships with 69.2% SWE-Bench Pro and 2.5x Fast mode

TL;DR

Anthropic's own launch copy called Opus 4.8 a "modest but tangible improvement", which is unusually blunt for a flagship model release. You can read Simon Willison's notes, compare third-party scores on Artificial Analysis, and trace the same-day Claude Code rollout in the official changelog.

What shipped

Anthropic framed Opus 4.8 as a sharper Opus 4.7, not a reset. The launch post says it improves judgment, honesty about progress, and long independent runs, and Boris Cherny, Claude Code lead at Anthropic, summarized the model the same way: stronger coding plus fewer premature claims of success.

The concrete ship list was broader than the benchmark table:

Benchmarks that moved

First-party

Third-party evaluators

Customer-reported

Artificial Analysis added the most useful qualifier. Its follow-up post says Opus 4.8 used 15% fewer turns and 35% fewer output tokens than 4.7 on GDPval-AA, but still needed about 30% more turns than GPT-5.5.

Where it regressed

Not every evaluator saw a clean win.

Those misses line up with Anthropic's own modest framing in the launch post. Simon Willison quoted the same line because it was rare to see a frontier release pitched as incremental instead of epochal in his notes.

Fast mode

Fast mode was the easiest launch detail to miss and probably the most operationally relevant one.

The change is simple:

That price cut changed the launch conversation fast. teortaxesTex called it a new point on the Pareto frontier, while nummanali still complained that Fast burns through Claude credits quickly in practice.

Dynamic Workflows

Claude Code got its own parallel story. Anthropic's release copy, as quoted by Yuchenj_UW, says Dynamic Workflows can plan a job, run hundreds of subagents in one session, then verify the result before returning control.

The same-day Claude Code 2.1.154 changelog added more detail:

Hands-on posts show the feature getting marketed as "ultracode." daniel_mac8 described Claude autonomously writing an orchestration script and spawning an agent swarm, while WesRoth used it to build a town simulation with an autonomous economy and traffic system.

Vibe Check

Early hands-on reports clustered around two themes: longer autonomous runs and less fake confidence.

  • Ethan Mollick said Opus 4.8 drafted an academic paper from hundreds of de-identified research files, then corrected a major issue after GPT-5.5 reviewed it.
  • In a follow-up, Mollick's second post said the model handled hypothesis formation, data cleaning, reference research, robustness checks, and LaTeX-style drafting in one run.
  • Jeremy Howard said 4.8 felt more cooperative than 4.7 and less over-agentic, stopping for input where 4.7 and GPT-5.5 would push ahead.
  • theo said the new ultracode path could exhaust a $100 Claude tier in a single prompt.
  • benhylak highlighted how long and unusual Opus 4.8 reasoning traces looked, and haider1 argued the model appeared to spend more tokens than prior Opus versions at similar effort levels.

That matches the split in the benchmark chatter. Artificial Analysis saw efficiency gains versus 4.7 on some workloads, while sarahwooders relaying Charles Packer said it was still far less token-efficient than GPT-5.5.

Where it shows up

The ecosystem rollout was immediate and broad.

  • OpenRouter shipped both standard and Fast mode on day one, per OpenRouter's availability post.
  • Vercel added the anthropic/claude-opus-4.8 slug to AI Gateway, according to vercel_dev's AI Gateway post and the Vercel changelog.
  • Perplexity made it available to Max subscribers on Perplexity and Computer, per perplexity_ai.
  • Cognition added it to Windsurf and Devin CLI, according to cognition.
  • v0 exposed Opus 4.8 on launch day, per v0.
  • GitHub rolled it out to @code, Copilot CLI, and Copilot app developers, according to pierceboggan.
  • Letta Code added support and reported better token efficiency with comparable context management to 4.7, per Letta_AI.
  • Box said Opus 4.8 would roll out shortly to Box AI agents after internal testing, per Aaron Levie's Box testing thread.

This was also a classic inference-layer land grab. testingcatalog flagged same-day availability on AI/ML API, rork pushed it into Rork, and charmcli said Crush got it without a client update.

Claude Code hotfix

Six hours after the big Claude Code rollout, Anthropic pushed 2.1.156 to fix an Opus 4.8-specific bug.

According to ClaudeCodeLog's 2.1.156 changelog post, the issue modified thinking blocks and caused API errors. The official changelog entry confirms the same fix, which is a useful reminder that this launch was not just a model swap. It changed the surrounding harness fast enough to need a same-night patch.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 8 threads
TL;DR3 posts
What shipped4 posts
Benchmarks that moved7 posts
Where it regressed3 posts
Fast mode2 posts
Dynamic Workflows3 posts
Vibe Check5 posts
Where it shows up9 posts
Share on X