
Opus 4.7 users report verbose output, weaker 1M context, and 12–27% higher costs

Users reported more verbosity, weaker 1M-context behavior, and little coding gain after Opus 4.7 rolled out. OpenRouter measured 12–27% higher costs, and some teams reverted their default model.


TL;DR

You can read Anthropic's 2.1.123 changelog, browse OpenRouter's tokenizer breakdown, inspect petergostev's side-by-side generations, and skim the live r/ClaudeCode thread. The weird bit is how many of the user complaints line up with prompt and surface changes that shipped within about a day, including new system text, memory attachment behavior, and changes to URL-generation rules.

Costs

OpenRouter said its market-wide measurements found Opus 4.7 costs rose 12 to 27 percent for most workloads, with short prompts as the exception. The core claim in OpenRouter's full post is simple: list pricing stayed familiar, but tokenizer behavior changed enough to move actual spend.
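To make that mechanism concrete, here is a toy cost calculation in Python. The prices and token counts below are placeholder values, not OpenRouter's measured figures; the point is only that denser tokenization plus wordier output moves spend even when list pricing is unchanged.

```python
# Illustrative arithmetic only: prices and token counts are placeholders,
# not OpenRouter's measurements.
PRICE_PER_MTOK_IN = 15.00   # hypothetical $/1M input tokens (unchanged across releases)
PRICE_PER_MTOK_OUT = 75.00  # hypothetical $/1M output tokens (unchanged across releases)

def request_cost(tokens_in: int, tokens_out: int) -> float:
    """Spend for one request at fixed list pricing."""
    return tokens_in / 1e6 * PRICE_PER_MTOK_IN + tokens_out / 1e6 * PRICE_PER_MTOK_OUT

# Same prompt, same list prices; only the token counts differ.
before = request_cost(tokens_in=4_000, tokens_out=1_000)
after = request_cost(tokens_in=4_600, tokens_out=1_250)  # denser tokenization, wordier output

print(f"before ${before:.4f}, after ${after:.4f}, delta {after / before - 1:+.0%}")  # about +21%
```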

That cost argument landed next to a separate tokenization complaint. In arankomatsuzaki's language-overhead thread, Anthropic's tokenizer showed materially higher overhead than OpenAI's on non-English text: 1.71x for Chinese, 2.86x for Arabic, and 3.24x for Hindi, measured against an English OpenAI baseline. He also published a public Claude token counter artifact so others could inspect the numbers themselves.
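The comparison is easy to reproduce in rough form. Below is a minimal sketch assuming the `tiktoken` and `anthropic` Python packages: count the same string with an OpenAI encoding and with Anthropic's token-counting endpoint, then take the ratio. The model id is a placeholder, the sample strings are my own, and `count_tokens` includes a few tokens of message framing, so short-string ratios are approximate; this is also not the thread's exact English-baseline normalization.

```python
import tiktoken
from anthropic import Anthropic

enc = tiktoken.get_encoding("o200k_base")  # GPT-4o-family encoding
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

samples = {
    "english": "The quick brown fox jumps over the lazy dog.",
    "chinese": "敏捷的棕色狐狸跳过了懒狗。",
    "arabic": "الثعلب البني السريع يقفز فوق الكلب الكسول.",
    "hindi": "तेज़ भूरी लोमड़ी आलसी कुत्ते के ऊपर कूदती है।",
}

for lang, text in samples.items():
    openai_tokens = len(enc.encode(text))
    anthropic_tokens = client.messages.count_tokens(
        model="claude-opus-4-6",  # placeholder id: use whichever model you are testing
        messages=[{"role": "user", "content": text}],
    ).input_tokens
    print(f"{lang:8s} openai={openai_tokens:3d} "
          f"anthropic={anthropic_tokens:3d} ratio={anthropic_tokens / openai_tokens:.2f}x")
```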

Together, those two datapoints explain why the pushback got loud so quickly. Users were not just arguing about taste; they were arguing about a model that could feel worse while also costing more.

1M context

The sharpest claim about the 1M-context mode came from petergostev's comparison thread, which toggled Claude Code's 1M default on and off across repeated generations with the same prompts and xHigh reasoning. His conclusion was not that the outputs were merely stochastic, but that the 400k and 1M versions felt like different models.

His examples were concrete:

  • Voxel Rome: the 1M run made the Colosseum look much less impressive.
  • Golden Gate: cars went sideways, waves looked weak, and the bridge geometry drifted into land.
  • Stonehenge: structure, lighting, shadows, and textures all looked flatter.

The useful part is that he published both hosted generations and a GitHub repo with code and prompts. That turns an otherwise hand-wavy vibe report into something other builders can inspect line by line.
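For anyone who wants to rerun that A/B outside Claude Code, a rough sketch against the raw API follows. It assumes 1M context is gated by the same beta flag Anthropic documented for Sonnet 4's long-context beta (`context-1m-2025-08-07`); whether that flag governs this model's 1M mode is an assumption, the model id and prompt are placeholders, and the xHigh reasoning setting is omitted entirely.

```python
# Not petergostev's harness: a sketch of the same-prompt, toggled-context idea.
from anthropic import Anthropic

client = Anthropic()
MODEL = "claude-opus-4-6"  # placeholder model id
PROMPT = "Build a voxel-art Colosseum scene in three.js as a single HTML file."

def generate(one_million: bool) -> str:
    # Assumption: the Sonnet 4 long-context beta flag also gates this model.
    kwargs = {"betas": ["context-1m-2025-08-07"]} if one_million else {}
    resp = client.beta.messages.create(
        model=MODEL,
        max_tokens=8_192,
        messages=[{"role": "user", "content": PROMPT}],
        **kwargs,
    )
    return resp.content[0].text

# Same prompt, two context configurations; compare the saved artifacts by eye.
for flag in (False, True):
    out = generate(flag)
    path = f"colosseum_{'1m' if flag else 'default'}.html"
    with open(path, "w") as f:
        f.write(out)
    print(path, len(out), "chars")
```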

Output style

The workflow backlash was broader than one benchmark thread. In TheZachMueller's post, a long-time Claude Code Max user said 4.7 had changed enough in behavior, knowledge, and especially laziness that he was leaving after ten months on Claude.


The most specific community complaint was verbosity. A Reddit post on r/ClaudeCode said recent Claude Code answers had become so wordy they were hard to read at all, and the top comments turned the complaint into folk remedies: one suggested adding a top-level CLAUDE.md verbosity rule (an example follows below), another said switching back to Opus 4.6 felt like "a breeze of fresh air."
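For reference, that folk remedy looks something like the block below. CLAUDE.md is the project memory file Claude Code loads at session start; the rule wording here is illustrative, not the commenter's exact text.

```markdown
## Output style
- Be terse: give the shortest answer that fully resolves the task.
- Do not restate plans, diffs, or file contents you just wrote.
- Skip preamble and closing summaries unless explicitly asked for them.
```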

That matches what some teams reported at a higher level. In zeeg's rollback post, the complaint was not style alone, but the absence of "true gains in performance" relative to the extra compute burn.

Rollbacks

Zeeg's post is the cleanest signal that this was not just solo-user frustration. He said he asked his team to turn off Opus 4.7 because it was "burning compute (and money)," then clarified in a follow-up that the move was a reversion to 4.6 rather than a swap to GPT.

That pairing matters because it links two different failure modes. zeeg framed the issue as compute spend without a performance upside, while OpenRouter framed it as a tokenizer-driven cost increase. Those are separate complaints, but they stack badly when they hit the same release window.

Prompt churn

The official Claude Code releases around the rollout were unusually busy for such a short window. ClaudeCodeLog's 2.1.123 summary said the build shipped one CLI fix for an OAuth 401 retry loop when CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS=1 was set, but the larger movement was in system prompts rather than CLI features.

According to ClaudeCodeLog's prompt diff, 2.1.123 removed an explicit rule that told Claude never to generate or guess URLs unless it was confident they were programming-related. It also added a new system block covering GitHub-flavored Markdown output, a rule against repeating denied tool calls, prompt-injection warnings, and auto-compression near context limits. The previous day's 2.1.122 prompt update thread had already added the new "looking is not acting" safety preface, removed an earlier upfront tool-instruction block, and introduced a malware-analysis-only reminder, per Anthropic's 2.1.122 changelog.
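ClaudeCodeLog publishes these diffs pre-computed, but checking them yourself is cheap if you capture the prompt text. A minimal sketch with Python's standard library, assuming you have saved the two versions to local files (the filenames are hypothetical):

```python
import difflib
from pathlib import Path

old = Path("prompt-2.1.122.txt").read_text().splitlines()
new = Path("prompt-2.1.123.txt").read_text().splitlines()

# Unified diff of the two captured system prompts.
for line in difflib.unified_diff(old, new, fromfile="2.1.122", tofile="2.1.123", lineterm=""):
    print(line)
```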

Those diffs do not prove why users disliked 4.7, but they do establish that the surrounding harness changed fast. The model, the prompt wrappers, and the CLI surface were all moving at once.
