Claude Code users report 5-minute cache TTL and quota-meter regressions after March updates
GitHub issues and Hacker News threads added fresh evidence that Claude Code sessions still burn quota unexpectedly after the cache TTL change, with some users seeing usage before a prompt is sent and others recovering capacity by rolling back to 2.1.34. Watch cache reuse and metering behavior closely if you rely on long-running sessions.

TL;DR
- A GitHub issue analyzing Claude Code session logs says the default prompt-cache TTL flipped from 1 hour to 5 minutes in early March, after a month of 1-hour behavior across two machines and 119,866 API calls.
- A separate quota bug report tied rapid exhaustion on Pro Max 5x to unexpectedly high effective token usage and asked whether cache_read tokens are being counted more aggressively than users assumed.
- According to fresh HN discussion, some users still see the usage meter jump before sending a prompt, while an earlier discussion summary adds reports of simple prompts burning 15 to 20 percent of quota.
- The v2.1.108 release notes, which landed on April 14, added ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M, the clearest official sign yet that TTL selection had become a live user-facing control.
- HN commenters and newer replies both surfaced the same workaround: rolling back to Claude Code 2.1.34, often with adaptive behavior disabled, restored more normal quota burn for at least some users.
You can read the original quota exhaustion bug, the larger TTL regression report, and the later docs complaint saying the env-var reference still omitted the new generic 1-hour and forced 5-minute controls. The weird bit is timing: the official v2.1.108 release added those knobs on April 14, after days of users reverse-engineering TTL behavior from local session logs and quota graphs.
Cache TTL
Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation · Issue #46829 · anthropics/claude-code
549 upvotes · 421 comments
The strongest evidence in this story is not a complaint thread; it is a log analysis. The author of the TTL regression issue says Claude Code's local JSONL session files exposed ephemeral_5m_input_tokens and ephemeral_1h_input_tokens fields, which made the cache tier visible per call.
That issue breaks the behavior into four phases: January with 5-minute cache only, February 1 through March 5 with 1-hour cache only, March 6 to 7 as a transition, and March 8 onward with 5-minute cache dominant. The claim is unusually specific, down to March 6 as the first day 5-minute cache tokens reappeared.
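The per-call tier breakdown described above can be reproduced with a short script. This is a sketch under assumptions: it presumes the session logs are JSONL files whose records carry a `timestamp` field plus a `usage` dict exposing the `ephemeral_5m_input_tokens` and `ephemeral_1h_input_tokens` fields the issue author cites; the exact schema and directory layout are not confirmed here.

```python
import json
from collections import defaultdict
from pathlib import Path

def cache_tiers_by_day(log_dir):
    """Sum 5-minute vs 1-hour cache-write tokens per day from JSONL session logs.

    Assumes each line is a JSON object with an ISO 8601 `timestamp` and a
    `usage` dict exposing `ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens`,
    as described in the TTL regression issue. The schema is an assumption.
    """
    totals = defaultdict(lambda: {"5m": 0, "1h": 0})
    for path in Path(log_dir).rglob("*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than abort the scan
            usage = rec.get("usage", {})
            day = rec.get("timestamp", "")[:10]  # YYYY-MM-DD
            totals[day]["5m"] += usage.get("ephemeral_5m_input_tokens", 0)
            totals[day]["1h"] += usage.get("ephemeral_1h_input_tokens", 0)
    return dict(totals)
```

Days where the 1h column dominates through early March and the 5m column dominates from March 8 onward would match the four-phase pattern the issue describes.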
The cost math in the same issue is what turned a niche caching complaint into a broader outage-like story. Using Anthropic's published rates, the author estimated 20 to 32 percent higher cache-creation costs after the regression and argued that the same shift would also inflate subscription quota burn because expired context gets rewritten instead of cheaply read.
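The issue's cost argument can be made concrete with back-of-envelope arithmetic. The multipliers below are illustrative assumptions shaped like Anthropic's published rate structure (5-minute cache writes above base input price, 1-hour writes higher still, cache reads at a small fraction); the structural point is what matters: when the TTL is shorter than the gap between calls, cached context expires and gets re-written at write rates instead of re-read at read rates.

```python
def session_cost(context_tokens, n_calls, gap_minutes, ttl_minutes,
                 base=3.0, write_mult=1.25, read_mult=0.1):
    """Dollar cost of repeatedly reusing one cached context across a session.

    If the gap between calls meets or exceeds the cache TTL, every call pays
    the cache-write rate; otherwise only the first call writes and the rest
    read. All rates are illustrative, not Anthropic's exact price sheet.
    """
    per_mtok = context_tokens / 1_000_000
    if gap_minutes >= ttl_minutes:          # cache expires between calls
        writes, reads = n_calls, 0
    else:                                   # first call writes, rest read
        writes, reads = 1, n_calls - 1
    return per_mtok * base * (writes * write_mult + reads * read_mult)

# 100k-token context, 10 calls, 10-minute gaps: under a 1h TTL the cache
# survives each gap; under a 5m TTL it expires and is rebuilt every call.
slow = session_cost(100_000, 10, gap_minutes=10, ttl_minutes=60, write_mult=2.0)
fast = session_cost(100_000, 10, gap_minutes=10, ttl_minutes=5, write_mult=1.25)
```

Even though 1-hour cache writes carry a higher per-write multiplier in this model, the 5-minute scenario comes out far more expensive because every call pays the write rate, which is the mechanism the issue's 20-to-32-percent estimate rests on.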
Quota meter
[BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage · Issue #45756 · anthropics/claude-code
754 upvotes · 656 comments
The original bug report described a Pro Max 5x plan exhausting in 1.5 hours after mostly Q&A and light development, even though the same account had previously sustained five hours of heavier work. The report's internal numbers put one window at an 8.7 million effective-token-per-hour pace and explicitly questioned how cache_read is counted against limits.
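The report's core question, how cache_read counts against quota, can be framed as two competing accounting models. The weights below are hypothetical; the bug report only establishes that the observed pace is hard to explain unless cache reads weigh more heavily against the meter than users assumed.

```python
def effective_tokens(input_toks, cache_creation, cache_read, read_weight):
    """Effective token count under an assumed quota-accounting model.

    read_weight=0.1 mirrors API-style billing, where cache reads cost a small
    fraction of fresh input; read_weight=1.0 counts every cached token at full
    weight. Both are hypothetical models of subscription quota metering.
    """
    return input_toks + cache_creation + cache_read * read_weight

# Long agentic sessions replay a large cached context on every call, so
# cache_read dominates and the chosen weight swings the effective pace.
cheap = effective_tokens(50_000, 200_000, 12_000_000, read_weight=0.1)
harsh = effective_tokens(50_000, 200_000, 12_000_000, read_weight=1.0)
```

Under the harsh model the same session produces an effective-token pace nearly an order of magnitude higher, which is the kind of gap that would make a meter look broken while the accounting is merely undisclosed.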
That same ambiguity shows up in user reports. According to HN discussion excerpts, one commenter saw the usage meter start at 3 to 7 percent before a message was sent, while simple fresh prompts could jump another 15 to 20 percent. Matt Pocock added a more public version of the same frustration, saying Anthropic had not answered a subscription-usage question after a month.
Anthropic's position, as quoted by The Register, was that the TTL reduction should not have increased costs. That did not resolve the narrower question users were asking in GitHub and HN, which was whether quota accounting and the visible usage meter had also changed.
v2.1.108
Fresh discussion on Pro Max 5x quota exhausted in 1.5 hours despite moderate usage
754 upvotes · 656 comments
The cleanest official update arrived in v2.1.108. Its release notes added two explicit controls: ENABLE_PROMPT_CACHING_1H, which opts into a 1-hour prompt-cache TTL across API key, Bedrock, Vertex, and Foundry setups, and FORCE_PROMPT_CACHING_5M, which forces a 5-minute TTL.
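Since the new knobs are plain environment variables, opting in is just a matter of how the process is launched. A minimal Python sketch, assuming the variables are read at startup and that "1" is the on-value (the release notes name the variables but this on-value is an assumption):

```python
import os
import subprocess

def claude_env(one_hour_cache=True):
    """Build an environment that selects a prompt-cache TTL for Claude Code.

    ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M are the controls
    named in the v2.1.108 release notes; treating "1" as the on-value and
    the two flags as mutually exclusive are assumptions.
    """
    env = dict(os.environ)
    if one_hour_cache:
        env["ENABLE_PROMPT_CACHING_1H"] = "1"
        env.pop("FORCE_PROMPT_CACHING_5M", None)  # avoid forcing 5m back on
    else:
        env["FORCE_PROMPT_CACHING_5M"] = "1"
        env.pop("ENABLE_PROMPT_CACHING_1H", None)
    return env

# subprocess.run(["claude"], env=claude_env())  # launch with the 1h opt-in
```

The same pattern works from a shell profile; the point is only that TTL selection is now an explicit launch-time setting rather than inferred server-side behavior.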
That release also added a recap feature for returning to a session and noted that users with telemetry disabled could force it with CLAUDE_CODE_ENABLE_AWAY_SUMMARY. The timing matters because a separate bug report had already claimed disabling telemetry could knock sessions back from 1-hour cache TTL to 5-minute TTL.
There was still documentation lag. The env-var reference listed only the older prompt-caching controls, and a docs issue filed the same day said the generic ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M switches were missing.
2.1.34 rollback
Discussion around Pro Max 5x quota exhausted in 1.5 hours despite moderate usage
754 upvotes · 656 comments
One practical detail surfaced only in the discussion threads: several users said the behavior was at least partly version-specific. One HN commenter said rolling back to Claude Code 2.1.34 fixed rapid quota exhaustion, and a fresher summary added that disabling adaptive behavior was part of the workaround for at least one user.
That does not settle whether the root cause was cache TTL, metering, adaptive behavior, or some combination across releases. It does show that by mid-April the story had moved beyond one March configuration change. Users were now treating Claude Code versions, telemetry settings, away-summary behavior, and prompt-cache TTL as a single entangled reliability surface.