Skip to content
AI Primer
release

Claude Code fixes prompt-cache bugs in 2.1.88 after quota-burn reports

Claude Code 2.1.88 added fixes for prompt-cache misses, repeated CLAUDE.md reinjection, and a multi-schema StructuredOutput bug after widespread reports of unexpectedly fast quota consumption. Update if you rely on long sessions, because uncached runs can burn through paid limits much faster than intended.

5 min read
Claude Code fixes prompt-cache bugs in 2.1.88 after quota-burn reports
Claude Code fixes prompt-cache bugs in 2.1.88 after quota-burn reports

TL;DR

  • Claude Code users spent March 30 posting screenshots and complaints about hitting limits after only a handful of prompts, including Pro users who said the same workflows had consumed a fraction of their quota days earlier user report cache-bug thread.
  • The most detailed public explanation pointed to prompt-cache invalidation, with one GitHub issue describing sessions that stopped reusing conversation history and fell back to large cache writes on later turns issue link reverse-engineering thread.
  • Anthropic shipped Claude Code 2.1.88 that night with fixes for prompt cache misses in long sessions, repeated nested CLAUDE.md reinjection, a StructuredOutput schema cache bug, and a --resume crash, according to the GitHub release and changelog release summary changelog thread.
  • The release also changed a misleading Rate limit reached error when the API actually returned an entitlement error, and tweaked /usage so Pro and Enterprise users no longer see a redundant Sonnet-only weekly bar changelog thread usage screenshot.
  • By early March 31, at least some affected users said the quota behavior had normalized again, after Anthropic had publicly said it was investigating official-response tweet user says fixed.

You can read the release tag, inspect the full changelog entry, and dig into the cache invalidation bug report that described conversation history dropping out mid-session. One of the stranger details is that the fix list spans both prompt caching and UI accounting, including /stats undercounting subagent usage and /usage hiding a plan-specific bar that was confusing people in the middle of the blowup.

Quota burn showed up first as broken sessions

The outage pattern was messy, not universal. Some users said they were hitting plan limits after a few prompts, while others running similar workflows were fine.

That matches the shape of the GitHub reports. In issue #40884, one user described active chats dying with API Error: Rate limit reached while the UI showed only 2 percent session usage and 41 percent weekly usage. Opening a new chat worked, but the old one stayed broken.

A separate screenshot circulating the same day showed how model choice changed the burn rate. One Max user reported 38 percent session usage after hours on Sonnet, with only 4 percent of the Sonnet weekly limit used, and contrasted that with Opus sessions that were exhausting the session bar much faster usage screenshot.

The cache invalidation report named the failure mode

The clearest technical diagnosis came from issue #40524, titled "Conversation history invalidated on subsequent turns." The reporter said later turns stopped caching prior history and reverted to reprocessing the system prompt plus large cache writes.

That lines up with the community theory in the main thread: once a session stopped hitting cache, token use jumped hard enough to chew through paid quotas reverse-engineering thread. The thread also linked a Reddit reverse-engineering writeup and claimed two compounding bugs, one involving the packaged Bun binary and another involving --resume, but those specific root-cause claims stayed community analysis rather than release-note language.

Anthropic did at least acknowledge the broader class of problem in public. A follow-up post shared an official response saying the team was investigating this issue and other possible causes official-response tweet.

2.1.88 patched the prompt path in several places

The official release is unusually blunt about cache-related fixes. The release notes and changelog list four changes that sit directly on the path people were complaining about:

  • Fixed prompt cache misses in long sessions caused by tool schema bytes changing mid-session.
  • Fixed nested CLAUDE.md files being re-injected dozens of times in long sessions that read many files.
  • Fixed StructuredOutput schema cache bug causing about a 50 percent failure rate when multiple schemas were used.
  • Fixed a --resume crash when transcripts contained a tool result from an older CLI version or an interrupted write.

There is a fifth fix nearby that matters for expensive failures: an autocompact thrash loop now stops after three consecutive immediate refills instead of continuing to burn API calls. That is a small line item with a very large blast radius for long-running sessions.

The release also cleaned up quota and accounting signals

Part of the confusion on March 30 was that the product looked inconsistent even before you got to the underlying cache bug. The changelog says 2.1.88 now shows the actual error, with actionable hints, when the API returns an entitlement error instead of labeling it Rate limit reached.

It also fixes /stats undercounting tokens by excluding subagent usage, and preserves more than 30 days of historical data when the stats cache format changes. Those are not the same bug as prompt-cache invalidation, but they affect how users reconstruct what actually happened after a bad session.

The release also changes /usage for Pro and Enterprise plans by hiding the redundant "Current week (Sonnet only)" bar usage screenshot. In the middle of a quota scare, even small UI cleanup like that matters because people were comparing screenshots line by line.

Some users said the burn stopped within hours

By early March 31, at least one previously affected user reported that rate-limit behavior seemed fixed again, though they could not tell whether the change came from a harness bug fix, a policy change, or account-level intervention.

That timing leaves an awkward but useful record. The public complaints, the GitHub cache issue, Anthropic's acknowledgement, and the 2.1.88 patch notes all landed inside roughly a day. Even if the exact root cause was more than one bug, the changelog makes clear that Claude Code's prompt path had several ways to waste tokens in long sessions, and multiple of them were patched at once.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 4 threads
TL;DR3 posts
Quota burn showed up first as broken sessions1 post
The cache invalidation report named the failure mode1 post
The release also cleaned up quota and accounting signals1 post
Share on X