AI Primer
update

Opus 4.7 users report 1.47x token overhead and web-search refusals two days after launch

Users and analysts say Opus 4.7 is using more tokens, refusing web search, and missing orchestration steps in Claude Code-style workflows. Watch token costs and regression reports closely if you rely on xhigh defaults or tokenizer-sensitive prompts.

7 min read

TL;DR

You can read Anthropic's launch post, skim Simon Willison's system-prompt diff, inspect the independent tokenizer measurement writeup, and watch the main Hacker News thread turn from benchmark talk into operational complaints.

Tokenizer overhead

Anthropic buried the most practical caveat in the migration notes: Opus 4.7 keeps the same sticker price as 4.6, but the same prompt can map to more tokens. madiator's migration-guide screenshot quotes the official guidance at roughly 1.0 to 1.35x more input tokens, and mattpocockuk's launch screenshot shows Claude Code simultaneously moving its default effort to xhigh.

Independent measurements came in higher on real text. daniel_mac8's tokenizer thread cites a writeup that found 1.47x on some samples, while badlogicgames' README token count shows a concrete jump from 1,091 input tokens on Opus 4.6 to 1,454 on 4.7 for the same README.
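The figures quoted above can be sanity-checked with back-of-envelope arithmetic. A minimal sketch, using the README token counts from the thread and assuming, as the migration notes state, that per-token sticker price is unchanged (so the token ratio is also the effective input-cost multiplier):

```python
# Token counts quoted in badlogicgames' README comparison.
tokens_46 = 1_091  # same README on Opus 4.6
tokens_47 = 1_454  # same README on Opus 4.7

ratio = tokens_47 / tokens_46
print(f"README sample: {ratio:.2f}x more input tokens")  # 1.33x

# At an unchanged per-token price, the bill scales linearly with
# the ratio; the 1.47x figure came from a different text sample.
def effective_cost(old_cost_usd: float, token_ratio: float) -> float:
    """Same per-token price, more tokens => proportionally higher cost."""
    return old_cost_usd * token_ratio

print(f"${effective_cost(10.0, 1.47):.2f} for what used to cost $10.00")
```

Notably, the README sample lands at about 1.33x, inside Anthropic's official 1.0 to 1.35x guidance, while the 1.47x measurement sits above it, which is consistent with overhead varying by content type.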

A lot of the follow-on complaints make more sense in that context.

The other immediate complaint was that 4.7 got more willing to stop, pause, or refuse in places users did not expect. MatthewBerman's web-search refusal post says Claude flatly refused to do web searches, and GergelyOrosz's workflow complaint says a behavior that used to trigger web search now often ended in refusal instead.

That bled into safe prompts. LechMazur's paused puzzle chat and LechMazur's NYT Connections example both show NYT Connections-style prompts getting paused by Opus 4.7's safety filters, with the product suggesting a retry on Sonnet 4.

The benchmark fallout was ugly too. LechMazur's benchmark post says Opus 4.7 scored 41.0 on the Extended NYT Connections benchmark in high reasoning mode and 15.3 with no reasoning, with refusals counted as failures. LechMazur's creative-writing refusal example adds that 13 percent of creative-writing requests were also refused in his testing.
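The refusals-as-failures scoring policy matters more than it sounds: the refusal rate puts a hard ceiling on the achievable score, regardless of answer quality on the remaining items. A quick illustration with hypothetical perfect answers on everything that isn't refused:

```python
# If refused items score zero, the best possible score is capped by
# the refusal rate, no matter how good the non-refused answers are.
def max_score_with_refusals(refusal_rate: float, max_score: float = 100.0) -> float:
    """Ceiling on the score when refusals count as outright failures."""
    return max_score * (1.0 - refusal_rate)

# With the 13% refusal rate reported for creative-writing requests,
# even perfect answers elsewhere would top out around 87.
print(round(max_score_with_refusals(0.13), 1))  # 87.0
```

So part of the gap between the high-reasoning and no-reasoning scores may simply reflect different refusal rates rather than different underlying capability.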

The official line moved quickly. alexalbert__'s bug-fix note said many day-one bugs were fixed within a day, and emollick's follow-up reported adaptive thinking was already triggering more often and doing more web search after the patch, though he also said rough edges remained.

Claude Code orchestration

For coding agents, the sharpest criticism was not raw code quality. It was orchestration. pvncher's MCP tool-call report says 4.7 was the first Claude model he had seen make incorrect MCP tool calls, and pvncher's RepoPrompt release note says he added explicit support for older Opus versions because he was reverting orchestration work back to 4.6.

The same theme shows up across other hands-on reports.

There were positive long-run reports too. nummanali's long session screenshot shows a 2 hour plus Claude Code run on a ratatui overhaul, and bridgemindai's Max-plan post says the model's coding output justified buying two Max subscriptions. But even the fans kept mentioning token burn.

Adaptive thinking and tone shifts

A lot of the weirdness lines up with Anthropic's new control surface. The launch notes in the Claude Code rollout thread describe Opus 4.7 as more agentic and better on long-running work, while Simon Willison's prompt diff shows the chat system prompt also changed meaningfully between 4.6 and 4.7.

Willison's diff highlights three changes that match what users noticed in practice:

  • More tool use: 4.7 now prefers trying tool discovery before declaring a capability unavailable.
  • More completion pressure: the prompt pushes Claude to make a reasonable attempt and finish tasks instead of interviewing the user first.
  • More direct style: 4.7 is tuned for shorter, less validation-heavy responses.

That last point was visible almost immediately. dbreunig's "Good push" post joked that Anthropic had replaced "You're absolutely right" with "Good push," and nrehiew_'s migration screenshot points to official migration guidance saying 4.7 is more literal and more direct than 4.6.

The interface choices made that shift harder to steer. Yuchenj_UW's Claude web screenshot shows Claude web offering only Adaptive or non-thinking on Opus 4.7, and emollick's adaptive-thinking thread says that can leave non-coding tasks under-thought because there is no manual override like ChatGPT's reasoning toggle.

Mixed evals, mixed mood

The public eval picture is messy enough that no single chart settles it. natolambert's internal eval chart points to Anthropic's own agentic coding curve, where 4.7 beats 4.6 across effort levels but spends more tokens doing it.

Third-party charts pulled in both directions.

Practitioner reports looked just as split. jeremyphoward's early praise said 4.7 was the first model that really "gets" his work, while kimmonismus' regression post, GergelyOrosz's pushback report, and nummanali's instruction-follower complaint all said they were reverting to 4.6 for some tasks.

Fast patches and quiet product changes

Anthropic spent the next 48 hours patching around the launch. alexalbert__'s bug-fix note acknowledged that many bugs people hit on day one were already fixed, and the ClaudeDevs bug-fix repost says one of those bugs was Opus 4.7 falsely flagging normal code edits as possible malware.

Claude Code also shipped a same-day CLI fix. ClaudeCodeLog's 2.1.112 post and the linked Claude Code changelog say version 2.1.112 removed the spurious "claude-opus-4-7 is temporarily unavailable" alert in auto mode.

One of the stranger footnotes is that users were already passing around interface workarounds as if this were a mature migration. nummanali's thinking-display fix points to a GitHub issue and a CLI flag, claude --thinking-display summarized, to restore thinking summaries, while the linked GitHub issue (https://github.com/anthropics/claude-code/issues/49322) says the VS Code extension was not rendering those summaries correctly on 4.7. Two days in, Opus 4.7 looked less like a clean version bump than a live-fire harness update shipped in public.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X · 7 threads
TL;DR — 6 posts
Tokenizer overhead — 5 posts
Claude Code orchestration — 5 posts
Adaptive thinking and tone shifts — 5 posts
Mixed evals, mixed mood — 8 posts
Fast patches and quiet product changes — 3 posts