Claude Code users report 30-40% token growth and incomplete long tasks
Users reported higher token use, partial long-document reviews, and rising spend on routine tasks as Claude Code regressions came into focus. Some developers still get strong results in constrained harnesses, while others are weighing a switch to Codex for long-running work.

TL;DR
- badlogicgames said Anthropic changed the tokenizer, turning the same prompts into roughly 30 to 40 percent more input and output tokens, while practitioners in the main Hacker News thread pegged the overhead at 1.0 to 1.35x.
- Spend complaints landed fast: in zeeg's thread, a routine Opus 4.7 day in Claude and Patch reportedly hit $50, and a second HN thread collected similar reports about credits getting burned on work that still failed.
- The quality complaints were not just about cost. Gergely Orosz's screenshot shows Claude hitting a tool-use limit mid-task, and in a second Orosz post Claude stopped after reviewing only the first 80 pages of a full-book proofreading job.
- Anthropic's own April 23 postmortem, linked in the quality-report thread, turned the rollout into a broader reliability story about quiet behavior changes, cache effects, and auto-mode regressions.
- The backlash is not universal: sqs said Opus 4.7 had been working well inside Amp's smart mode, while HN commenters described better results when they wrapped Claude in tighter prompts, smaller scopes, and strong test harnesses.
You can read Anthropic's Opus 4.7 launch post, then jump to the April 23 postmortem. The HN threads in the launch discussion and the quality-report discussion are where the buried details surfaced: Claude Code's default effort moving to xhigh, adaptive thinking getting more opaque, and users arguing that hidden defaults mattered as much as the model itself.
Token overhead
Claude Opus 4.7
2k upvotes · 1.4k comments
The clearest concrete complaint is token inflation. In badlogicgames' reply, the claim is simple: same input, 30 to 40 percent more tokens, same for output.
That lines up with the launch-thread summary, which cites a top HN comment describing 1.0 to 1.35x tokenizer overhead and notes that Claude Code's default effort moved to xhigh. Those two changes stack badly in agentic sessions, where the bill is shaped by both how many tokens the model counts and how hard the harness now pushes it to think.
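To see why the two changes compound rather than add, here is a back-of-the-envelope sketch. The 1.35x tokenizer figure comes from the thread; the effort multiplier and session size are invented purely for illustration.

```python
# Back-of-the-envelope sketch of how the two reported changes stack.
# Only the tokenizer overhead comes from the HN thread; the other
# numbers are illustrative assumptions, not measured values.

tokenizer_overhead = 1.35        # upper end of the 1.0-1.35x range reported on HN
effort_multiplier = 1.5          # assumed extra thinking tokens from the xhigh default
baseline_session_tokens = 2_000_000  # hypothetical agentic session before the change

inflated = baseline_session_tokens * tokenizer_overhead * effort_multiplier
print(f"{inflated / baseline_session_tokens:.2f}x token growth")  # ~2.03x
```

The multipliers apply to every turn of an agentic loop, which is why a routine day can end up costing double even though neither change looks dramatic on its own.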
Incomplete runs
The reliability complaints are unusually concrete. Gergely Orosz posted a Claude UI screenshot showing, "Claude reached its tool-use limit for this turn," after comparing it side by side with ChatGPT on the same task.
In a separate book-review test, Orosz said Claude stopped after the first 80 pages of a Hungarian translation review and recommended another human pass before print, while ChatGPT reportedly processed the whole book in the same kind of experiment. That is less about benchmark scores than about long-horizon work getting truncated midstream.
Hidden defaults
An update on recent Claude Code quality reports
932 upvotes · 719 comments
Claude Opus 4.7
2k upvotes · 1.4k comments
The postmortem thread collected the complaints engineers usually hate most: not just degraded output, but behavior changing underneath them. Its linked discussion quotes users calling out secret changes, degraded auto mode, and performance shifts after idle periods that looked tied to cache rehydration.
The earlier launch thread adds two more implementation details from commenters: Claude Code defaulting to xhigh effort, and Opus 4.7 no longer returning a human-readable reasoning summary unless "display": "summarized" is set. Those are harness-level changes, not just model vibes.
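For concreteness, here is a minimal sketch of a raw Messages API request that sets the field the thread mentions. The `"display": "summarized"` key, its placement inside the thinking config, and the model id are all assumptions taken from HN comments, not confirmed API documentation.

```python
import os
import requests

payload = {
    "model": "claude-opus-4.7",  # hypothetical model id for illustration
    "max_tokens": 1024,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 4096,
        # Per the launch thread, without this field Opus 4.7 no longer
        # returns a human-readable reasoning summary. Its exact placement
        # in the request body is an assumption.
        "display": "summarized",
    },
    "messages": [{"role": "user", "content": "Summarize this diff."}],
}

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json=payload,
)
print(resp.json())
```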
Constrained harnesses still look better
I cancelled Claude: Token issues, declining quality, and poor support
941 upvotes · 565 comments
The strongest counter-signal came from users who were not running Opus 4.7 raw. In sqs' post, Amp's smart mode is presented as a noticeably better environment for the same model.
The criticism thread contains the same pattern from the other direction. One top comment describes a simple two-file task where Claude changed the file it had planned not to touch, while another says recent Claude runs one-shotted 9 of 9 bug fixes when paired with extensive unit tests. The split is not subtle: loose sessions get expensive fast, tight harnesses fare better.
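The "tight harness" the successful reports describe reduces to two gates: reject any diff that touches files outside a declared scope, and accept nothing the test suite does not pass. A minimal sketch of that pattern follows; `review_patch` and the file allowlist are hypothetical stand-ins for whatever agent loop you actually run.

```python
# Minimal sketch of the "tight harness" pattern commenters described:
# constrain which files the agent may touch and gate every change on tests.
import subprocess

ALLOWED_FILES = {"src/parser.py", "tests/test_parser.py"}  # explicit scope

def patch_is_in_scope(changed_files: set[str]) -> bool:
    """Reject any diff that touches files outside the declared scope."""
    return changed_files <= ALLOWED_FILES

def tests_pass() -> bool:
    """Gate every accepted patch on the full test suite."""
    return subprocess.run(["pytest", "-q"]).returncode == 0

def review_patch(changed_files: set[str]) -> bool:
    if not patch_is_in_scope(changed_files):
        return False  # the two-file failure mode from the thread
    return tests_pass()
```

The scope check is exactly what would have caught the two-file complaint above: a patch touching the file the model had planned not to touch gets rejected before it costs anything.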
Support and trust
I cancelled Claude: Token issues, declining quality, and poor support
941 upvotes · 565 comments
An update on recent Claude Code quality reports
932 upvotes · 719 comments
The last new wrinkle is that some users are treating this as a support story, not just a model story. The cancellation thread includes reports of Pro plan cancellations, long appeal delays, and what one commenter called nonexistent support.
The postmortem discussion shows why that landed so hard. Several commenters argued the real break was trust, because major behavior changes appeared to arrive before any clear public explanation. Once cost spikes, incomplete runs, and quiet defaults show up in the same week, every failure starts looking systemic.