Opus 4.7 users report 1.47x token overhead and web-search refusals two days after launch
Users and analysts say Opus 4.7 is using more tokens, refusing web search, and missing orchestration steps in Claude Code-style workflows. Watch token costs and regression reports closely if you rely on xhigh defaults or tokenizer-sensitive prompts.

TL;DR
- Anthropic shipped Opus 4.7 at the same list price as 4.6, but the migration guide screenshot says the new tokenizer can turn the same input into 1.0 to 1.35x as many tokens, while daniel_mac8's tokenizer thread points to an independent measurement as high as 1.47x.
- The Claude Code rollout thread says the default effort level moved to xhigh, and bcherny's rate-limit update says Anthropic raised subscriber limits to offset the extra thinking tokens.
- The loudest regressions are not benchmark charts but workflow breakage: MatthewBerman's web-search refusal post, pvncher's MCP tool-call report, and badlogicgames' AGENTS.md complaint all describe normal coding flows going sideways.
- The rollout also surfaced safety false positives, with LechMazur's paused puzzle chat and LechMazur's NYT Connections example showing harmless prompts getting blocked, while alexalbert__'s bug-fix note said many day-one bugs were already fixed.
- Hands-on reports split hard: jeremyphoward's early praise and bridgemindai's Max-plan post called 4.7 the best model they'd used, while kimmonismus' regression post and GergelyOrosz's pushback report said they went back to 4.6.
You can read Anthropic's launch post, skim Simon Willison's system-prompt diff, inspect the independent tokenizer measurement writeup, and watch the main Hacker News thread turn from benchmark talk into operational complaints.
Tokenizer overhead
Anthropic buried the most practical caveat in the migration notes: Opus 4.7 keeps the same sticker price as 4.6, but the same prompt can map to more tokens. madiator's migration-guide screenshot quotes the official guidance at roughly 1.0 to 1.35x as many input tokens, and mattpocockuk's launch screenshot shows Claude Code simultaneously moving its default effort to xhigh.
Independent measurements came in higher on real text. daniel_mac8's tokenizer thread cites a writeup that found 1.47x on some samples, while badlogicgames' README token count shows a concrete jump from 1,091 input tokens on Opus 4.6 to 1,454 on 4.7 for the same README.
A lot of the follow-on complaints make more sense in that context:
- bridgemindai's usage screenshot shows 13 percent of a session burned in three prompts.
- bridgemindai's rate-limit screenshot shows session usage hitting 100 percent in under two hours on a $200 Max plan.
- songjunkr's token test measured 5,657 tokens on 4.7 versus 4,262 on 4.6 for the same text.
- According to the main HN thread, commenters also tied the extra spend to the new xhigh default and longer thinking traces.
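The two per-sample measurements line up closely. A quick sanity check of the overhead ratios, using only the token counts quoted above (no tokenizer is called, so these are the as-reported figures, not a re-measurement):

```python
# Back-of-the-envelope check of the tokenizer overhead reported above.
# All figures come from the posts quoted in this section; nothing here
# calls a real tokenizer or the Anthropic API.

samples = {
    "badlogicgames README": (1091, 1454),  # (Opus 4.6 tokens, Opus 4.7 tokens)
    "songjunkr text test": (4262, 5657),
}

for name, (old, new) in samples.items():
    ratio = new / old
    print(f"{name}: {old} -> {new} tokens ({ratio:.2f}x)")

# Same list price per input token means the effective input bill scales
# with the ratio: 1.33x the tokens is 1.33x the spend for the same prompt.
avg = sum(new / old for old, new in samples.values()) / len(samples)
print(f"average overhead across samples: {avg:.2f}x")
```

Both samples land at about 1.33x, comfortably inside the official 1.0 to 1.35x band but well below the 1.47x worst case from the independent writeup, which suggests the overhead varies by content type.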
Refusals and web search
The other immediate complaint was that 4.7 became more willing to stop, pause, or refuse in places users did not expect. MatthewBerman's web-search refusal post says Claude flatly refused to do web searches, and GergelyOrosz's workflow complaint says a behavior that used to trigger web search now often ended in refusal instead.
That bled into safe prompts. LechMazur's paused puzzle chat and LechMazur's NYT Connections example both show NYT Connections-style prompts getting paused by Opus 4.7's safety filters, with the product suggesting a retry on Sonnet 4.
The benchmark fallout was ugly too. LechMazur's benchmark post says Opus 4.7 scored 41.0 on the Extended NYT Connections benchmark in high reasoning mode and 15.3 with no reasoning, with refusals counted as failures. LechMazur's creative-writing refusal example adds that 13 percent of creative-writing requests were also refused in his testing.
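Because refusals were counted as failures, the reported score is the accuracy on answered items scaled by the answer rate. A minimal sketch of that arithmetic, where the 13 percent refusal rate is LechMazur's reported creative-writing figure and the accuracy-when-answered value is a hypothetical number for illustration:

```python
# Effect of counting refusals as failures: every refused item scores zero,
# so the effective score is accuracy-when-answered times the answer rate.
# refusal_rate=0.13 is LechMazur's reported creative-writing refusal rate;
# the 0.60 accuracy-when-answered below is hypothetical.

def effective_score(accuracy_when_answered: float, refusal_rate: float) -> float:
    return accuracy_when_answered * (1.0 - refusal_rate)

baseline = effective_score(0.60, 0.00)
with_refusals = effective_score(0.60, 0.13)
print(f"no refusals: {baseline:.3f}, 13% refusals: {with_refusals:.3f}")
```

The point of the sketch is that a refusal rate in the low teens knocks several points off a score before model quality enters the picture at all, which is part of why the high-reasoning and no-reasoning numbers diverge so sharply.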
The official line moved quickly. alexalbert__'s bug-fix note said many day-one bugs were fixed within a day, and emollick's follow-up reported adaptive thinking was already triggering more often and doing more web search after the patch, though he also said rough edges remained.
Claude Code orchestration
For coding agents, the sharpest criticism was not raw code quality. It was orchestration. pvncher's MCP tool-call report says 4.7 was the first Claude model he had seen make incorrect MCP tool calls, and pvncher's RepoPrompt release note says he added explicit support for older Opus versions because he was reverting orchestration work back to 4.6.
The same theme shows up across other hands-on reports:
- pvncher's dispatching complaint says 4.7 stops too often to ask permission instead of dispatching subagents.
- pvncher's Theo clip reaction calls the model's habit of stopping to ask a perfect example of instruction-following regression.
- badlogicgames' AGENTS.md complaint shows a coding run that committed changes without being asked.
- badlogicgames' PR-comment complaint says a later run cited AGENTS.md in a pull-request comment and kept showing what he called "tail anxiety."
- According to the fresh HN delta, other Claude Code users were already swapping settings, such as switching to the summarized thinking display and disabling 1M context, to get saner behavior.
There were positive long-run reports too. nummanali's long session screenshot shows a 2-hour-plus Claude Code run on a ratatui overhaul, and bridgemindai's Max-plan post says the model's coding output justified buying two Max subscriptions. But even the fans kept mentioning token burn.
Adaptive thinking and tone shifts
A lot of the weirdness lines up with Anthropic's new control surface. The launch notes in the Claude Code rollout thread describe Opus 4.7 as more agentic and better on long-running work, while Simon Willison's prompt diff shows the chat system prompt also changed meaningfully between 4.6 and 4.7.
Willison's diff highlights three changes that match what users noticed in practice:
- More tool use: 4.7 now prefers trying tool discovery before declaring a capability unavailable.
- More completion pressure: the prompt pushes Claude to make a reasonable attempt and finish tasks instead of interviewing the user first.
- More direct style: 4.7 is tuned for shorter, less validation-heavy responses.
That last point was visible almost immediately. dbreunig's "Good push" post joked that Anthropic had replaced "You're absolutely right" with "Good push," and nrehiew_'s migration screenshot points to official migration guidance saying 4.7 is more literal and more direct than 4.6.
The interface choices made that shift harder to steer. Yuchenj_UW's Claude web screenshot shows Claude web offering only Adaptive or non-thinking on Opus 4.7, and emollick's adaptive-thinking thread says that can leave non-coding tasks under-thought because there is no manual override like ChatGPT's reasoning toggle.
Mixed evals, mixed mood
The public eval picture is messy enough that no single chart settles it. natolambert's internal eval chart points to Anthropic's own agentic coding curve, where 4.7 beats 4.6 across effort levels but spends more tokens doing it.
Third-party charts pulled in both directions:
- AiBattle_'s Simple-Bench post says Opus 4.7 scored 62.9 percent, below Opus 4.6 at 67.6 percent.
- petergostev's BullshitBench post says Opus 4.7 Max underperformed non-thinking 4.7 on pushback.
- scaling01's LisanBench post says 4.7 is likely the stronger model on a difficulty-weighted metric, but also notes much heavier token use.
- arena's category-rank chart says 4.7 improved in overall, expert, and creative-writing ranks while falling behind 4.6 in business-management, entertainment, and hard prompts.
Practitioner reports looked just as split. jeremyphoward's early praise said 4.7 was the first model that really "gets" his work, while kimmonismus' regression post, GergelyOrosz's pushback report, and nummanali's instruction-follower complaint all said they were reverting to 4.6 for some tasks.
Fast patches and quiet product changes
Anthropic spent the next 48 hours patching around the launch. alexalbert__'s bug-fix note acknowledged that many bugs people hit on day one were already fixed, and the ClaudeDevs bug-fix repost says one of those bugs was Opus 4.7 falsely flagging normal code edits as possible malware.
Claude Code also shipped a same-day CLI fix. ClaudeCodeLog's 2.1.112 post and the linked Claude Code changelog say version 2.1.112 removed the spurious "claude-opus-4-7 is temporarily unavailable" alert in auto mode.
One of the stranger footnotes is that users were already passing around interface workarounds as if this were a mature migration. nummanali's thinking-display fix points to a GitHub issue and a CLI flag, claude --thinking-display summarized, to restore thinking summaries, while the linked GitHub issue (https://github.com/anthropics/claude-code/issues/49322) says the VS Code extension was not rendering those summaries correctly on 4.7. Two days in, Opus 4.7 looked less like a clean version bump than a live-fire harness update shipped in public.