Skip to content
AI Primer
release

Anthropic releases Claude Opus 4.8 with Claude Code dynamic workflows

Anthropic rolled out Claude Opus 4.8 across Claude Code, the API, and partner surfaces, plus a research-preview workflow mode that coordinates large subagent fleets. It keeps 4.7 pricing, but early tests suggest workflow runs can burn very large token budgets, so teams should watch usage closely.

7 min read
Anthropic releases Claude Opus 4.8 with Claude Code dynamic workflows
Anthropic releases Claude Opus 4.8 with Claude Code dynamic workflows

TL;DR

Anthropic paired the dynamic workflows blog post with workflows docs, linked a refreshed prompting guide, and pushed a one-command claude-api migration flow for people updating model strings. The oddest detail in the launch-day chatter was stevibe's cutoff test, which pegged Opus 4.8's knowledge cutoff in mid-October 2025, later than 4.7's late-July cutoff.

What shipped

Anthropic split the release into a model upgrade and a new orchestration mode.

  • Opus 4.8 is live in Claude Code, per ClaudeDevs' launch thread.
  • The built-in claude-api skill now supports /claude-api migrate, which updates model strings and suggests prompt changes tuned for 4.8, according to ClaudeDevs' migration note.
  • Dynamic workflows shipped in research preview, and ClaudeDevs' feature post says users can trigger it by using the word "workflow" in a prompt.
  • Availability spans Max, Team, Enterprise, and API access, including Bedrock, Vertex AI, and Foundry, per ClaudeDevs' availability note.
  • For Claude Code accounts, ClaudeDevs' rollout note says the feature is on by default for Max and Team, while Enterprise admins must opt in.

Benchmarks and honesty

Anthropic's own launch message centered on two deltas: coding went up, and self-reporting got less slippery.

  • SWE-bench Pro: 64.3 to 69.2, a +4.9 point gain, per Boris Cherny's launch post.
  • Cursor said its internal CursorBench showed Opus 4.8 working "much more efficiently" than 4.7 and described it as more persistent on harder tasks, according to Cursor's day-one post.
  • Dan Shipper's early test thread claimed Opus 4.8 beat GPT-5.5 by 1 point on Every's Senior Engineer benchmark, 63 to 62, and outscored GPT-5.5 by 6 points on Every's writing benchmark.

The more interesting launch-day tell was the framing. aakashgupta's post highlighted that Anthropic's own chart bolded GPT-5.5's 78.2% over Opus 4.8's 74.6% on a terminal-coding row, then argued that the product pitch was trust, not scoreboard maximalism. That lines up with Cat Wu's summary, which called 4.8 the recommended daily model in Claude Code because it flags what it does not know and surfaces problems in its own code.

Dynamic workflows

The new workflow mode moves Claude Code from single-run coding toward multi-stage orchestration.

The official examples skew toward work that used to be too broad for a single agent pass. Cat Wu's Bun example says Jarred Sumner used dynamic workflows to port Bun from Zig to Rust across roughly 750,000 lines, reaching 99.8% test-suite pass rate in 11 days, while Cat Wu's internal example says the same system processed hundreds of A/B flags in parallel in under 10 minutes.

Vibe Check

The day-one hands-on reports were unusually specific about where 4.8 feels different.

  • Writing: Dan Shipper's test thread said Opus 4.8 produced fewer "AI-isms" and handled voice matching better than prior Claude runs in Every's usage.
  • Reasoning controls: the same Every test thread said coding and writing quality moved around noticeably by effort level, with xhigh looking best for code and high looking best for writing.
  • UX tone: trq212's first impression described the model as warm and collaborative, not just high-scoring.
  • Persistence: Cursor's launch note emphasized harder-task persistence, which is one of the failure modes people actually notice in daily agentic use.
  • Cutoff recency: stevibe's modelclock test estimated a mid-October 2025 knowledge cutoff, later than Opus 4.7's late-July cutoff.

One negative showed up inside the praise. Dan Shipper's thread said Codex still had the stronger harness versus Claude Desktop, even while he rated the model itself much higher.

Token burn and effort levels

Dynamic workflows look expensive by design.

  • Boris Cherny's workflow note explicitly called the feature token-intensive and framed it for migrations, refactors, performance optimization, and batch bug-fix runs.
  • Noah Zweben's repost of Pawel Huryn's test cited a workflow run that used 253,000 tokens in 24.5 seconds across 8 agents.
  • minchoi's demo post said users can surface the mode by switching to Opus 4.8, setting effort to "ultracode," and using "workflow" in the prompt.
  • LLMJunky's complaint points to adjacent pricing friction around fast mode credits, which became part of the same launch-day cost conversation.

That makes the new effort controls part of the story, not a side setting. Dan Shipper's benchmark notes and Dan Shipper's repost about writing effort levels both reported that output quality shifted materially with reasoning level, so the practical unit is no longer just model choice. It is model plus effort plus whether a task is large enough to justify orchestration.

Where it shows up

Opus 4.8 landed across Anthropic's own surfaces and partner products on day one.

The interesting split is that the model rollout was broad immediately, while dynamic workflows stayed narrower. ClaudeDevs' docs link places workflows inside Claude Code rather than partner IDEs, so the day-one ecosystem story was mostly "new model everywhere, new orchestration mode mostly at home."

Claude Code's harness got tuned the day before

One day before the 4.8 launch, Anthropic posted a separate Claude Code update about the harness itself.

That timing helps explain why some launch-day reactions, like Dan Shipper's harness comment, separated the model from the product wrapper around it. Anthropic did not just ship a smarter Opus. It also spent the preceding day tightening the tool people were about to run it in.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 8 threads
TL;DR2 posts
What shipped1 post
Benchmarks and honesty2 posts
Dynamic workflows5 posts
Vibe Check4 posts
Token burn and effort levels4 posts
Where it shows up2 posts
Claude Code's harness got tuned the day before2 posts
Share on X