AI Primer

Claude Code users report 5-minute cache TTL and quota-meter regressions after March updates

GitHub issues and Hacker News threads added fresh evidence that Claude Code sessions still burn quota unexpectedly after the cache TTL change, with some users seeing usage before a prompt is sent and others recovering capacity by rolling back to 2.1.34. Watch cache reuse and metering behavior closely if you rely on long-running sessions.


TL;DR

  • A GitHub issue analyzing Claude Code session logs says the default prompt-cache TTL flipped from 1 hour to 5 minutes in early March, after a month of 1-hour behavior across two machines and 119,866 API calls.
  • A separate quota bug report tied rapid exhaustion on Pro Max 5x to unexpectedly high effective token usage and asked whether cache_read tokens are being counted more aggressively than users assumed.
  • According to fresh HN discussion, some users still see the usage meter jump before sending a prompt, while an earlier discussion summary adds reports of simple prompts burning 15 to 20 percent of quota.
  • The v2.1.108 release notes, published April 14, added ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M, the clearest official sign yet that TTL selection had become a live, user-facing control.
  • HN commenters and newer replies both surfaced the same workaround: rolling back to Claude Code 2.1.34, often with adaptive behavior disabled, restored more normal quota burn for at least some users.

You can read the original quota exhaustion bug, the larger TTL regression report, and the later docs complaint saying the env-var reference still omitted the new generic 1-hour and forced 5-minute controls. The weird bit is timing: the official v2.1.108 release added those knobs on April 14, after days of users reverse-engineering TTL behavior from local session logs and quota graphs.

Cache TTL

Hacker News: Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation · Issue #46829 · anthropics/claude-code (549 upvotes · 421 comments)

Hacker News: Anthropic downgraded cache TTL on March 6th (549 upvotes · 421 comments)

The strongest evidence in this story is not a complaint thread but a log analysis. The author of the TTL regression issue says Claude Code's local JSONL session files exposed ephemeral_5m_input_tokens and ephemeral_1h_input_tokens, which made the cache tier visible per call.
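That kind of per-call tally is easy to reproduce in a few lines. The sketch below assumes a JSONL layout where each line carries a usage object holding the two fields quoted above; the actual schema of Claude Code's session logs may differ:

```python
import json

def tally_cache_tiers(lines):
    """Sum 5-minute and 1-hour cache tokens across JSONL session-log lines.

    Assumes each line is a JSON object with an optional "usage" dict
    containing the ephemeral_*_input_tokens fields the issue describes.
    """
    totals = {"ephemeral_5m_input_tokens": 0, "ephemeral_1h_input_tokens": 0}
    for line in lines:
        usage = json.loads(line).get("usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals

# Hypothetical log lines, not real Claude Code output:
sample = [
    '{"usage": {"ephemeral_5m_input_tokens": 1200, "ephemeral_1h_input_tokens": 0}}',
    '{"usage": {"ephemeral_5m_input_tokens": 800}}',
]
print(tally_cache_tiers(sample))
```

A dominant 5-minute total across thousands of calls is exactly the signal the issue author used to date the regression.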

That issue breaks the behavior into four phases: January with 5-minute cache only, February 1 through March 5 with 1-hour cache only, March 6 to 7 as a transition, and March 8 onward with 5-minute cache dominant. The claim is unusually specific, down to March 6 as the first day 5-minute cache tokens reappeared.

The cost math in the same issue is what turned a niche caching complaint into a broader outage-like story. Using Anthropic's published rates, the author estimated 20 to 32 percent higher cache-creation costs after the regression and argued that the same shift would also inflate subscription quota burn because expired context gets rewritten instead of cheaply read.
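The shape of that argument can be sketched with Anthropic's published cache multipliers (roughly 1.25x base input price for 5-minute cache writes, 2x for 1-hour writes, 0.1x for cache reads); the base price, context size, and call pattern below are illustrative assumptions, not the issue author's data:

```python
# Assumed figures: a $3/Mtok base input price and a 200k-token cached
# context. The multipliers follow Anthropic's published caching rates.
BASE = 3.00     # $ per million input tokens, assumed
CONTEXT = 0.2   # cached context size in millions of tokens, assumed

def hourly_cost(write_mult, writes, reads):
    """Cost of one hour of calls against a cached context."""
    return CONTEXT * (writes * write_mult * BASE + reads * 0.1 * BASE)

cost_1h = hourly_cost(2.00, writes=1, reads=19)  # cache survives the hour
cost_5m = hourly_cost(1.25, writes=6, reads=14)  # idle gaps force rewrites
print(round(cost_1h, 2), round(cost_5m, 2))  # 2.34 5.34
```

The point is structural: 1-hour writes cost more per write, but a 5-minute TTL converts cheap reads into repeated full rewrites whenever a session pauses, which is how a shorter TTL can inflate both API bills and quota burn.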

Quota meter

Hacker News: [BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage · Issue #45756 · anthropics/claude-code (754 upvotes · 656 comments)

Hacker News: Pro Max 5x quota exhausted in 1.5 hours despite moderate usage (754 upvotes · 656 comments)

The original bug report described a Pro Max 5x plan exhausting in 1.5 hours after mostly Q&A and light development, even though the same account had previously sustained five hours of heavier work. The report's internal numbers put one window at an 8.7 million effective-token-per-hour pace and explicitly questioned how cache_read is counted against limits.
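The cache_read question matters because the "effective" rate depends entirely on how reads are weighted, and Anthropic does not document that weighting. Both weights and the window numbers in this sketch are hypothetical, chosen only to show how sensitive the figure is:

```python
def effective_tokens(input_t, output_t, cache_read_t, read_weight):
    """Toy effective-token model; the read_weight is an assumption."""
    return input_t + output_t + read_weight * cache_read_t

# Hypothetical hour of light Q&A over a large cached context:
window = dict(input_t=200_000, output_t=150_000, cache_read_t=85_000_000)
light = effective_tokens(**window, read_weight=0.01)  # reads nearly free
heavy = effective_tokens(**window, read_weight=0.1)   # reads weigh 10x more
print(f"{light:,.0f} vs {heavy:,.0f}")  # 1,200,000 vs 8,850,000
```

Under one weighting the same hour looks moderate; under another it lands in the multi-million range the bug report measured, which is why users kept pressing for the actual accounting rule.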

That same ambiguity shows up in user reports. According to HN discussion excerpts, one commenter saw the usage meter start at 3 to 7 percent before a message was sent, while simple fresh prompts could jump another 15 to 20 percent. Matt Pocock added a more public version of the same frustration, saying Anthropic had not answered a subscription-usage question after a month.

Anthropic's position, as quoted by The Register, was that the TTL reduction should not have increased costs. That did not resolve the narrower question users were asking in GitHub and HN, which was whether quota accounting and the visible usage meter had also changed.

v2.1.108

Hacker News: Fresh discussion on Pro Max 5x quota exhausted in 1.5 hours despite moderate usage (754 upvotes · 656 comments)

The cleanest official update arrived in v2.1.108. Its release notes added two explicit controls: ENABLE_PROMPT_CACHING_1H, which opts into a 1-hour prompt-cache TTL across API key, Bedrock, Vertex, and Foundry setups, and FORCE_PROMPT_CACHING_5M, which forces a 5-minute TTL.
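The release notes name the variables but do not spell out accepted values, so treating them as simple on/off environment switches set to 1 is an assumption; usage would presumably look something like:

```shell
# Assumed usage of the v2.1.108 TTL controls (value "1" is a guess):
export ENABLE_PROMPT_CACHING_1H=1   # opt into the 1-hour prompt-cache TTL
claude

# ...or force the 5-minute TTL for a single run:
FORCE_PROMPT_CACHING_5M=1 claude
```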

That release also added a recap feature for returning to a session and noted that users with telemetry disabled could force it with CLAUDE_CODE_ENABLE_AWAY_SUMMARY. The timing matters because a separate bug report had already claimed disabling telemetry could knock sessions back from 1-hour cache TTL to 5-minute TTL.

There was still documentation lag. The env-var docs documented older prompt-caching controls, while a docs issue filed the same day said the generic ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M switches were missing.

2.1.34 rollback

Hacker News: Discussion around Pro Max 5x quota exhausted in 1.5 hours despite moderate usage (754 upvotes · 656 comments)

Hacker News: Fresh discussion on Pro Max 5x quota exhausted in 1.5 hours despite moderate usage (754 upvotes · 656 comments)

One practical detail surfaced only in the discussion threads: several users said the behavior was at least partly version-specific. One HN commenter said rolling back to Claude Code 2.1.34 fixed rapid quota exhaustion, and a fresher summary added that disabling adaptive behavior was part of the workaround for at least one user.
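For anyone who wants to try the reported workaround, pinning the npm distribution to that version is straightforward; note this is a user-reported mitigation, not an official fix, and the "adaptive behavior" toggle mentioned in the thread is not a documented flag, so it is omitted here:

```shell
# HN-reported workaround: pin Claude Code to the pre-regression release.
npm install -g @anthropic-ai/claude-code@2.1.34
claude --version   # confirm the pinned version before resuming work
```

Pinning also means forgoing later fixes such as the v2.1.108 TTL controls, so it is best treated as a diagnostic step rather than a permanent configuration.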

That does not settle whether the root cause was cache TTL, metering, adaptive behavior, or some combination across releases. It does show that by mid-April the story had moved beyond one March configuration change. Users were now treating Claude Code versions, telemetry settings, away-summary behavior, and prompt-cache TTL as a single entangled reliability surface.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X · 1 thread
Quota meter · 1 post