Claude Code users report 5-minute cache TTL and quota-meter regressions after March updates
GitHub issues and Hacker News threads added fresh evidence that Claude Code sessions still burn quota unexpectedly after the cache TTL change, with some users seeing usage before a prompt is sent and others recovering capacity by rolling back to 2.1.34. Watch cache reuse and metering behavior closely if you rely on long-running sessions.

TL;DR
- A GitHub issue analyzing Claude Code session logs says the default prompt-cache TTL flipped from 1 hour to 5 minutes in early March, after a month of 1-hour behavior across two machines and 119,866 API calls.
- A separate quota bug report tied rapid exhaustion on Pro Max 5x to unexpectedly high effective token usage and asked whether cache_read tokens are being counted more aggressively than users assumed.
- According to fresh HN discussion, some users still see the usage meter jump before sending a prompt, while an earlier discussion summary adds reports of simple prompts burning 15 to 20 percent of quota.
- The v2.1.108 release notes, which landed on April 14, added ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M, the clearest official sign yet that TTL selection had become a live user-facing control.
- HN commenters and newer replies both surfaced the same workaround: rolling back to Claude Code 2.1.34, often with adaptive behavior disabled, restored more normal quota burn for at least some users.
You can read the original quota exhaustion bug, the larger TTL regression report, and the later docs complaint saying the env-var reference still omitted the new generic 1-hour and forced 5-minute controls. The weird bit is timing: the official v2.1.108 release added those knobs on April 14, after days of users reverse-engineering TTL behavior from local session logs and quota graphs.
Cache TTL
Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation · Issue #46829 · anthropics/claude-code
549 upvotes · 421 comments
The strongest evidence in this story is not a complaint thread; it is a log analysis. The author of the TTL regression issue says Claude Code's local JSONL session files exposed ephemeral_5m_input_tokens and ephemeral_1h_input_tokens fields, which made the cache tier visible per call.
That issue breaks the behavior into four phases: January with 5-minute cache only, February 1 through March 5 with 1-hour cache only, March 6 to 7 as a transition, and March 8 onward with 5-minute cache dominant. The claim is unusually specific, down to March 6 as the first day 5-minute cache tokens reappeared.
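The per-call tier breakdown described above can be reproduced with a short script. This is a sketch under assumptions: it presumes the session logs are JSONL files whose records carry a `timestamp` field plus a `usage` dict exposing the `ephemeral_5m_input_tokens` and `ephemeral_1h_input_tokens` fields the issue author cites; the exact schema and directory layout are not confirmed here.

```python
import json
from collections import defaultdict
from pathlib import Path

def cache_tiers_by_day(log_dir):
    """Sum 5-minute vs 1-hour cache-write tokens per day from JSONL session logs.

    Assumes each line is a JSON object with an ISO 8601 `timestamp` and a
    `usage` dict exposing `ephemeral_5m_input_tokens` / `ephemeral_1h_input_tokens`,
    as described in the TTL regression issue. The schema is an assumption.
    """
    totals = defaultdict(lambda: {"5m": 0, "1h": 0})
    for path in Path(log_dir).rglob("*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines rather than abort the scan
            usage = rec.get("usage", {})
            day = rec.get("timestamp", "")[:10]  # YYYY-MM-DD
            totals[day]["5m"] += usage.get("ephemeral_5m_input_tokens", 0)
            totals[day]["1h"] += usage.get("ephemeral_1h_input_tokens", 0)
    return dict(totals)
```

Days where the 1h column dominates through early March and the 5m column dominates from March 8 onward would match the four-phase pattern the issue describes.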
The cost math in the same issue is what turned a niche caching complaint into a broader outage-like story. Using Anthropic's published rates, the author estimated 20 to 32 percent higher cache-creation costs after the regression and argued that the same shift would also inflate subscription quota burn because expired context gets rewritten instead of cheaply read.
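The issue's cost argument can be made concrete with back-of-envelope arithmetic. The multipliers below are illustrative assumptions shaped like Anthropic's published rate structure (5-minute cache writes above base input price, 1-hour writes higher still, cache reads at a small fraction); the structural point is what matters: when the TTL is shorter than the gap between calls, cached context expires and gets re-written at write rates instead of re-read at read rates.

```python
def session_cost(context_tokens, n_calls, gap_minutes, ttl_minutes,
                 base=3.0, write_mult=1.25, read_mult=0.1):
    """Dollar cost of repeatedly reusing one cached context across a session.

    If the gap between calls meets or exceeds the cache TTL, every call pays
    the cache-write rate; otherwise only the first call writes and the rest
    read. All rates are illustrative, not Anthropic's exact price sheet.
    """
    per_mtok = context_tokens / 1_000_000
    if gap_minutes >= ttl_minutes:          # cache expires between calls
        writes, reads = n_calls, 0
    else:                                   # first call writes, rest read
        writes, reads = 1, n_calls - 1
    return per_mtok * base * (writes * write_mult + reads * read_mult)

# 100k-token context, 10 calls, 10-minute gaps: under a 1h TTL the cache
# survives each gap; under a 5m TTL it expires and is rebuilt every call.
slow = session_cost(100_000, 10, gap_minutes=10, ttl_minutes=60, write_mult=2.0)
fast = session_cost(100_000, 10, gap_minutes=10, ttl_minutes=5, write_mult=1.25)
```

Even though 1-hour cache writes carry a higher per-write multiplier in this model, the 5-minute scenario comes out far more expensive because every call pays the write rate, which is the mechanism the issue's 20-to-32-percent estimate rests on.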
Quota meter
[BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage · Issue #45756 · anthropics/claude-code
754 upvotes · 656 comments
The original bug report described a Pro Max 5x plan exhausting in 1.5 hours after mostly Q&A and light development, even though the same account had previously sustained five hours of heavier work. The report's internal numbers put one window at an 8.7 million effective-token-per-hour pace and explicitly questioned how cache_read is counted against limits.
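The report's core question, how cache_read counts against quota, can be framed as two competing accounting models. The weights below are hypothetical; the bug report only establishes that the observed pace is hard to explain unless cache reads weigh more heavily against the meter than users assumed.

```python
def effective_tokens(input_toks, cache_creation, cache_read, read_weight):
    """Effective token count under an assumed quota-accounting model.

    read_weight=0.1 mirrors API-style billing, where cache reads cost a small
    fraction of fresh input; read_weight=1.0 counts every cached token at full
    weight. Both are hypothetical models of subscription quota metering.
    """
    return input_toks + cache_creation + cache_read * read_weight

# Long agentic sessions replay a large cached context on every call, so
# cache_read dominates and the chosen weight swings the effective pace.
cheap = effective_tokens(50_000, 200_000, 12_000_000, read_weight=0.1)
harsh = effective_tokens(50_000, 200_000, 12_000_000, read_weight=1.0)
```

Under the harsh model the same session produces an effective-token pace nearly an order of magnitude higher, which is the kind of gap that would make a meter look broken while the accounting is merely undisclosed.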
That same ambiguity shows up in user reports. According to HN discussion excerpts, one commenter saw the usage meter start at 3 to 7 percent before a message was sent, while simple fresh prompts could jump another 15 to 20 percent. Matt Pocock added a more public version of the same frustration, saying Anthropic had not answered a subscription-usage question after a month.
Anthropic's position, as quoted by The Register, was that the TTL reduction should not have increased costs. That did not resolve the narrower question users were asking in GitHub and HN, which was whether quota accounting and the visible usage meter had also changed.
v2.1.108
Fresh discussion on Pro Max 5x quota exhausted in 1.5 hours despite moderate usage
754 upvotes · 656 comments
The cleanest official update arrived in v2.1.108. Its release notes added two explicit controls: ENABLE_PROMPT_CACHING_1H, which opts into a 1-hour prompt-cache TTL across API key, Bedrock, Vertex, and Foundry setups, and FORCE_PROMPT_CACHING_5M, which forces a 5-minute TTL.
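Since the new knobs are plain environment variables, opting in is just a matter of how the process is launched. A minimal Python sketch, assuming the variables are read at startup and that "1" is the on-value (the release notes name the variables but this on-value is an assumption):

```python
import os
import subprocess

def claude_env(one_hour_cache=True):
    """Build an environment that selects a prompt-cache TTL for Claude Code.

    ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M are the controls
    named in the v2.1.108 release notes; treating "1" as the on-value and
    the two flags as mutually exclusive are assumptions.
    """
    env = dict(os.environ)
    if one_hour_cache:
        env["ENABLE_PROMPT_CACHING_1H"] = "1"
        env.pop("FORCE_PROMPT_CACHING_5M", None)  # avoid forcing 5m back on
    else:
        env["FORCE_PROMPT_CACHING_5M"] = "1"
        env.pop("ENABLE_PROMPT_CACHING_1H", None)
    return env

# subprocess.run(["claude"], env=claude_env())  # launch with the 1h opt-in
```

The same pattern works from a shell profile; the point is only that TTL selection is now an explicit launch-time setting rather than inferred server-side behavior.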
That release also added a recap feature for returning to a session and noted that users with telemetry disabled could force it with CLAUDE_CODE_ENABLE_AWAY_SUMMARY. The timing matters because a separate bug report had already claimed disabling telemetry could knock sessions back from 1-hour cache TTL to 5-minute TTL.
There was still documentation lag. The env-var reference listed only the older prompt-caching controls, and a docs issue filed the same day said the generic ENABLE_PROMPT_CACHING_1H and FORCE_PROMPT_CACHING_5M switches were missing.
2.1.34 rollback
Discussion around Pro Max 5x quota exhausted in 1.5 hours despite moderate usage
754 upvotes · 656 comments
One practical detail surfaced only in the discussion threads: several users said the behavior was at least partly version-specific. One HN commenter said rolling back to Claude Code 2.1.34 fixed rapid quota exhaustion, and a fresher summary added that disabling adaptive behavior was part of the workaround for at least one user.
That does not settle whether the root cause was cache TTL, metering, adaptive behavior, or some combination across releases. It does show that by mid-April the story had moved beyond one March configuration change. Users were now treating Claude Code versions, telemetry settings, away-summary behavior, and prompt-cache TTL as a single entangled reliability surface.