AI Primer

Opus 4.7 users report 1.46x tokenization and faster limit burn

Four days after the Opus 4.7 launch, independent tests measured roughly 1.35x to 1.46x as many text tokens as 4.6, while users kept reporting faster limit burn and weaker coding. That can change effective cost and session economics in Claude Code even if list prices stay flat.


TL;DR

You can read Anthropic's launch post, inspect Simon Willison's token counter, and browse the live BullshitBench viewer. There is also an open Claude Code quota bug report, plus a long HN launch thread where users compare adaptive thinking, latency, and refusal behavior.

Tokenizer

Anthropic disclosed in its launch post that Opus 4.7 uses an updated tokenizer and that identical inputs may expand by a factor of roughly 1.0x to 1.35x. Independent measurements immediately started landing at or above the top of that range.

The outside measurements line up surprisingly well for text-heavy engineering inputs:

  • nrehiew_ measured 1.35x to 1.40x on Claude Design, Claude Code, and Claude Cowork system prompts.
  • Simon Willison measured 1.46x on a pasted Opus 4.7 system prompt using Anthropic's counting API, then published the method and examples in his blog post.
  • badlogicgames' README test counted /app/README.md at 1,091 tokens on 4.6 versus 1,454 on 4.7, about 1.33x.
  • daniel_mac8's linked analysis summarized a broader third-party test set that peaked at 1.47x and averaged about 1.325x across mixed real-world Claude Code content.

The weird part is not just larger totals. In nrehiew_'s toy example, the word "hi" becomes two tokens on 4.7 while it stays one on 4.6, and another nrehiew_ test suggests whitespace-heavy text can balloon even faster.
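These multipliers are just ratios of token counts on identical input, so the reported figures are easy to sanity-check. A minimal sketch in Python (the helper name is ours; the counts are badlogicgames' README numbers from above):

```python
def expansion_ratio(tokens_old: int, tokens_new: int) -> float:
    """How much the new tokenizer inflates the same text, as a multiplier."""
    return tokens_new / tokens_old

# badlogicgames' test: /app/README.md counted 1,091 tokens on 4.6, 1,454 on 4.7
print(f"{expansion_ratio(1091, 1454):.2f}x")  # 1.33x, matching the reported figure
```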

Quota burn

For API users, flat list pricing plus fatter tokenization is the whole story: effective input cost rises with the expansion factor. For Claude subscribers, it also changes how fast a session meter drains.

A few distinct signals converged here:

  • bridgemindai reported 13 percent session usage after three prompts.
  • Later, the same account said one Max plan hit 100 percent session usage in under two hours.
  • In the main HN cost thread, one commenter said each exchange now feels like about 5 percent of a five-hour limit, while another estimated a 30 percent to 40 percent token increase from their own tests.
  • The GitHub issue summary describes an open Claude Code bug report from a Pro Max 5x user who said quota was exhausted in 1.5 hours under moderate usage and asked how cache_read affects accounting.

That does not make every 4.7 workflow more expensive on net. According to a top HN comment summarized in the discussion extract, some users see fewer output tokens and lower reasoning spend than 4.6, which complicates any clean before-and-after cost claim. The consistent complaint was narrower: the input side got fatter, and the product surfaces still feel like they burn faster.
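The back-of-envelope math behind those complaints is straightforward. A sketch using the user-reported figures above (none of these are official numbers):

```python
# With flat per-token list pricing, effective input cost scales directly
# with the tokenizer expansion factor.
for expansion in (1.35, 1.46):
    print(f"{expansion}x tokens -> {(expansion - 1) * 100:.0f}% higher input cost")

# One HN commenter's estimate: each exchange eats ~5% of a five-hour limit,
# which implies roughly 20 exchanges before the session meter fills.
exchanges_per_window = 1 / 0.05
print(f"~{exchanges_per_window:.0f} exchanges per five-hour window")
```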

Vision is the important caveat. Simon Willison's first image test showed up to 3x more tokens on a high-resolution PNG, but his follow-up clarification says that jump came from 4.7 accepting much larger images. On a smaller 682x318 image, he measured 314 tokens on 4.7 versus 310 on 4.6.

Coding quality

The strangest part of the rollout is how polarized hands-on reports became. A lot of users liked the model for UI work and long agentic runs. A lot of other users wanted 4.6 back within hours.

The negative cluster was unusually specific:

  • nummanali called 4.7 a literal instruction follower that no longer explores enough.
  • Gergely Orosz said it was combative and kept pushing back on research tasks.
  • kimmonismus said the model felt rushed and ignored prompt intent.
  • Peter Gostev's BullshitBench post said Opus 4.7 scored worse than the 4.6 family, with the Max-thinking variant dropping to 74 percent clear pushback versus 83 percent for non-thinking. The underlying viewer and repo are public at BullshitBench and GitHub.

The positive cluster was also concrete:

  • bridgemindai said the output was unmatched for coding, despite brutal token burn.
  • doodlestein said productivity "skyrocketed" and credited 4.7 for careful, persistent work with skills.
  • ai_for_success said UI work jumped a tier over Gemini 3.1, even if backend quality felt similar to 4.6.
  • Jeremy Howard's first-day reaction called it the first model that "gets" what he is doing while working.

The community read on the fresh HN delta lands close to the evidence pool: some users say 4.7 solves more complex tasks without losing the plot, others say it is slower, fussier, and more expensive in real sessions than the launch framing suggests.

Claude Code defaults

A lot of the rollout friction was not about the base model at all. It was about how Claude Code exposed it.

Three product-level changes kept surfacing in user reports and community threads:

  • Thinking summaries stopped showing by default in Claude Code. nummanali posted the workaround: claude --thinking-display summarized.
  • The fresh HN discussion says users were also toggling CLAUDE_CODE_DISABLE_1M_CONTEXT=1 and noting that adaptive thinking could no longer be fully disabled for 4.7 in the same way as before.
  • The main HN launch thread summary highlighted a new xhigh default in Claude Code, plus user confusion around adaptive thinking output format and safety-triggered refusals.
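For reference, the user-posted workarounds combine as below. The flag and variable names are exactly as reported in the threads, not verified against official Claude Code documentation, so treat this as a sketch:

```shell
# User-reported workarounds, verbatim from the threads (unverified against docs):
export CLAUDE_CODE_DISABLE_1M_CONTEXT=1   # opt out of the larger context window
claude --thinking-display summarized      # restore summarized thinking output
```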

Some rollout issues were plain bugs. A retweeted ClaudeDevs statement said the "this might be malware" warnings on normal code edits were Anthropic's bug, and koltregaskes posted a desktop app state where usage showed room left but the client still claimed the limit was hit.

One final wrinkle: Wes Roth's screenshot shows Boris Cherny replying that Anthropic has "no plans to change" a subscriber rate-limit increase that rolled out alongside the model. That means users were dealing with two overlapping shifts at once: a bigger bucket on paper, and a model that often seems to drink from it faster.
