updateApril 16, 2026

Claude Code raises Opus 4.7 subscriber limits after token burn increases

Anthropic raised Claude subscriber limits and shipped Claude Code 2.1.112 after Opus 4.7's adaptive thinking and tokenizer changes increased token use. Users still report fast quota depletion and inconsistent cache or effort behavior across web and CLI sessions.

5 min read

Claude Code raises Opus 4.7 subscriber limits after token burn increases

TL;DR

According to Boris Cherny, Anthropic raised subscriber rate limits after Opus 4.7 started using more thinking tokens, and bridgemindai's usage screen shows some Claude Code sessions resetting to zero later that day.
The official migration guide says Opus 4.7 changed both the tokenizer and thinking behavior, which can push the same input to roughly 1.0 to 1.35x more tokens, while Matt Pocock's screenshot shows Claude Code also moved its default effort to xhigh.
ClaudeCodeLog's 2.1.112 note shows Anthropic shipped a same-day Claude Code patch for an auto mode error that falsely said Opus 4.7 was temporarily unavailable.
The quota mess predates 4.7: the 2.1.108 changelog added explicit 1-hour and 5-minute prompt-caching controls, while a fresh HN roundup still collected reports of broken usage meters and unusually fast burn.
On the web app, Ray Fernando's screenshot and Ethan Mollick's thread show a different problem, adaptive thinking replaced the old always-on extended thinking toggle, with no manual override exposed outside Claude Code or the API.

You can read Anthropic's API release note, the migration guide, and the new Claude Code best-practices post. The tiny 2.1.112 changelog is also worth a look because it shows how narrow the first shipping fix was.

Limits went up fast

Anthropic's public explanation was simple: Opus 4.7 uses more thinking tokens, so subscriber limits went up to compensate. In replies captured by kimmonismus, Cherny also said there was "no time limit" and "no plans to change it," which made the increase sound like a standing reset rather than a weekend promo.

That was a quick reversal from the hours just before the change, when bridgemindai posted a Max plan session already at 100 percent and kimmonismus was still asking for a rate reset. Christmas came early for Claude Code power users, but only after they had already tripped the breaker.

Token burn moved in three layers

Opus 4.7 did not just get "better" in a vague sense. It got more expensive to drive in several concrete ways.

The migration guide says the new tokenizer can turn the same text into roughly 1.0 to 1.35x more tokens.
The same guide says Opus 4.7 tends to think more at higher effort levels, especially in later turns of longer agentic sessions.
The best-practices post says Claude Code now defaults Opus 4.7 to xhigh, a new effort tier between high and max.
The what's new page adds higher resolution vision, up to 2576px on the long edge, which also increases token exposure on image-heavy workflows.

Those changes explain why a flat price sheet in the API release notes still felt like a quota shock to users on subscription plans.

2.1.112 patched the first obvious break

Claude Code 2.1.112 fixed one very specific launch-day problem: auto mode was falsely reporting that claude-opus-4-7 was temporarily unavailable. The official changelog entry lists only that fix.

ClaudeCodeLog's diff notes also flagged a rules restructure for custom defaults, which matters because model and effort routing was changing at the same time. When a release swaps the default model behavior, bumps the default effort, and ships a hotfix four hours later, "update Claude Code first" becomes part of the rollout story.

The quota complaints started before 4.7

Hacker News

Fresh discussion on Pro Max 5x quota exhausted in 1.5 hours despite moderate usage

754 upvotes · 656 comments

The quota panic around Opus 4.7 landed on top of an older caching fight. Claude Code 2.1.108 added ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M, according to the release notes, after weeks of complaints that caching behavior had changed in ways users could feel.

The public paper trail is messy but consistent. A GitHub issue on cache TTL regression claimed the default prompt cache had fallen from one hour to five minutes in early March, and that issue summary pegged the impact at 20 to 32 percent higher cache-creation costs. The latest HN roundup added newer reports of usage meters showing nonzero burn before the first message and simple prompts jumping 15 to 20 percent.

Adaptive thinking is still split across surfaces

The least tidy part of the rollout is product consistency. In the API, the migration guide says adaptive thinking is off by default unless requested. In Claude Code, the best-practices post says xhigh is now the default effort. On the consumer web app, users saw the old Extended Thinking control replaced by an Adaptive thinking toggle that "thinks only when needed."

That left web users debugging a hidden router they could not steer. Mollick said analysis, writing, and research tasks often failed to trigger enough thinking or tool use, while another user found Opus 4.7 on web unable to compare itself with Opus 4.6 and unable to force web search mid-conversation. Same model family, very different knobs.