Update · March 30, 2026

Claude Code fixes prompt cache misses in 2.1.88 after users report quota spikes

Users tied Claude Code quota spikes to cache misses and schema failures, and 2.1.88 shipped fixes for prompt caching plus multi-schema StructuredOutput bugs. Broken cache reuse can multiply token usage and throttle long coding sessions, so the patch matters for reliability.

Claude Code · Reliability · Rate Limits · Developer Experience

TL;DR

Claude Code users spent March 30 reporting abrupt quota blowups, including sessions that hit limits after a handful of prompts or a lightweight design chat, which pointed to a product regression more than ordinary heavy usage (user report; cache bug thread).
The strongest technical explanation was prompt cache invalidation: users and a GitHub bug report described conversation history dropping out of cache, which can make later turns far more expensive (cache bug thread; GitHub bug report).
Claude Code 2.1.88 shipped the next day with fixes for prompt cache misses in long sessions, a --resume crash, and a StructuredOutput schema cache bug that reportedly caused about 50 percent failures in multi-schema workflows (2.1.88 release; changelog thread).
The release also cleaned up misleading quota messaging. One fix changes a false "Rate limit reached" error into the actual entitlement error, which matters because users were clearly struggling to tell policy limits apart from client bugs (changelog thread; community reaction).
By early March 31, at least one prominent power user said the rate limit issue seemed fixed, suggesting Anthropic either repaired the caching path quickly or adjusted service-side behavior while the CLI patch rolled out (user says fixed; 2.1.88 release).

The interesting part is how fast this story snapped into focus. Users first complained about quotas exploding (user report), then a reverse-engineered theory tied the spike to broken cache reuse (cache bug thread), then Anthropic shipped a release with explicit prompt-cache fixes (official release; changelog entry) that line up with bug threads such as "conversation history invalidated on subsequent turns". For engineers using long sessions, subagents, and resume flows, the real story here is simple: cache correctness is product reliability.

Cache misses were probably the real quota killer

Alex Volkov

@altryne

PSA: If you've been running out of Claude session quotas on Max tier, you're not alone. Read this. Some insane Redditor reverse engineered the Claude binaries with MITM to find 2 bugs that could have caused cache-invalidation. Tokens that aren't cached are 10x-20x more expensive [...]

Alex Volkov

@altryne

My feed is showing me a bunch of folks who tapped out their whole usage limits on Mon/Tue. Is this your experience? Please comment, I want to understand how widespread this is

5:55 PM · Mar 30, 2026

The most credible explanation for the quota spike was not that Anthropic suddenly made Claude Code dramatically stingier. It was that cached context stopped being reused consistently.

That theory lines up with the public bug report in issue #40524, where a user described conversation history getting invalidated on later turns, leaving only the system prompt cached and forcing large cache rewrites. It also lines up with a separate issue #40851, where a Max $100 user said a mostly text-only session with about 15 prompts consumed 93 percent of quota.

In practice, that is the kind of bug power users notice immediately. If your workflow depends on long-running sessions, background agents, or --resume, cache misses do not just make the tool slower. They make the quota meter look insane.
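A rough back-of-the-envelope sketch shows why losing the cached history hurts so much. Everything numeric here is an assumption for illustration: the multipliers (a cached read billed at 0.1x the base input rate, a cache write at 1.25x), the token counts, and the 15-turn session length. How Claude Code's quota meter actually weighs these is not described in the reports.

```python
# Illustrative only: the multipliers and token counts below are assumptions,
# not figures from the bug reports or from Claude Code's quota accounting.
CACHE_READ_MULT = 0.1    # assumed cost of re-reading cached tokens
CACHE_WRITE_MULT = 1.25  # assumed cost of writing tokens to the cache


def session_cost(system_tokens, turn_tokens, turns, history_cached):
    """Return total input cost in base-rate token units for one session.

    Simplification: the one-time write of the system prompt is ignored
    in both modes, so it does not affect the comparison.
    """
    total = 0.0
    history = 0  # conversation tokens accumulated before the current turn
    for _ in range(turns):
        if history_cached:
            # Healthy path: system prompt and history are read from cache,
            # and only the new turn is written.
            read, write = system_tokens + history, turn_tokens
        else:
            # Failure mode described in the bug report: only the system
            # prompt stays cached, so the history is rewritten every turn.
            read, write = system_tokens, history + turn_tokens
        total += read * CACHE_READ_MULT + write * CACHE_WRITE_MULT
        history += turn_tokens
    return total


healthy = session_cost(10_000, 5_000, 15, history_cached=True)
broken = session_cost(10_000, 5_000, 15, history_cached=False)
print(f"healthy: {healthy:,.0f}  broken: {broken:,.0f}  ratio: {broken / healthy:.1f}x")
```

Under these assumed numbers the broken session costs roughly 4.7 times as much as the healthy one, and the gap widens with every extra turn because the rewritten history keeps growing.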

2.1.88 patches the expensive paths

Claude Code Changelog

@ClaudeCodeLog

Claude Code 2.1.88 has been released. 41 CLI changes, 3 system prompt changes Highlights: • Agent guidance adds 'never delegate understanding': agents must verify comprehension to avoid misdelegation • Fixed StructuredOutput schema cache bug causing ~50% failures in [...]

12:19 AM · Mar 31, 2026

Anthropic's 2.1.88 release reads like a direct response to exactly that failure mode. The headline fix is explicit in the release notes: prompt cache misses in long sessions, caused by tool schema bytes changing mid-session, are fixed.
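Prompt caches generally match on an exact prefix, so a tool schema that serializes to different bytes between turns can be enough to cause a miss. The release notes do not say what made the bytes change. The sketch below uses dictionary key order purely as a hypothetical cause, and `prefix_fingerprint` and the `read_file` schema are illustrative names, not part of Claude Code.

```python
import hashlib
import json


def prefix_fingerprint(tools, canonical):
    """Hash the serialized tool schemas, standing in for a cache key."""
    if canonical:
        # Stable bytes: sorted keys and fixed separators.
        payload = json.dumps(tools, sort_keys=True, separators=(",", ":"))
    else:
        # Bytes follow whatever insertion order the dicts happen to have.
        payload = json.dumps(tools)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same hypothetical tool schema, built with a different key order.
turn_1 = [{"name": "read_file", "input_schema": {"type": "object", "required": ["path"]}}]
turn_2 = [{"input_schema": {"required": ["path"], "type": "object"}, "name": "read_file"}]

assert turn_1 == turn_2  # semantically identical
print(prefix_fingerprint(turn_1, False) == prefix_fingerprint(turn_2, False))  # False
print(prefix_fingerprint(turn_1, True) == prefix_fingerprint(turn_2, True))    # True
```

The point of the design is that equality of meaning is not enough: the serialized bytes have to be identical on every turn for the cached prefix to be reused.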

The rest of the changelog matters too:

Fixed prompt cache misses in long sessions (changelog entry).
Fixed a StructuredOutput schema cache bug that reportedly caused about 50 percent failures in multi-schema workflows (changelog thread).
Fixed a --resume crash when transcripts contained tool results from an older CLI version or an interrupted write (changelog thread).
Fixed misleading "Rate limit reached" messages when the API actually returned an entitlement error (changelog thread).
Fixed /stats undercounting tokens because it excluded subagent and fork usage (changelog thread).

That is a bigger deal than it sounds. Broken caching inflates cost. Broken error labels send people chasing the wrong root cause. Broken stats make it harder to prove what happened. 2.1.88 hits all three.

Agent-heavy workflows hit the wall first

Jeffrey Emanuel

@doodlestein

Replying to @thsottiaux

Someone else pointed out that my clankers already did this integration (hard for me to keep up with everything they do across all my projects!): x.com/maksymsherman/…

Maksym Sherman

@MaksymSherman

For 3, look at github.com/Dicklesworthst…

7:36 AM · Mar 30, 2026

Usage started looking normal again

Jeffrey Emanuel

@doodlestein

Claude Code rate limit issues seem to be fixed for me. Not sure if they fixed a bug in the harness, made a policy change decision for everyone, or just tweaked it for me (someone was in touch with me from Anthropic about it). Either way, I’ll take it! Feels good to be back.

3:24 AM · Mar 31, 2026

By the morning of March 31, one of the most vocal affected users said the problem seemed resolved, though he was unsure whether Anthropic had fixed a harness bug, changed policy, or adjusted his account specifically (user says fixed). That uncertainty is annoying, but the timing is hard to miss.

Users spent March 30 reporting blown quotas (user report), the cache invalidation theory gained traction (cache bug thread), and the 2.1.88 release landed with cache and schema fixes (2.1.88 release). For engineers deciding whether to trust long Claude Code sessions again, that sequence matters more than the drama did.
