Claude Code users reported steeper caps and week-long waits while sharing ways to cut usage, including /context audits, /clear, smaller models, and RTK log compression. The posts point to token burn from mounted MCP servers, long chat history, raw logs, and multi-agent concurrency, so teams may need to trim runtime load.

The clearest practitioner takeaway is that token burn often starts before the first real prompt. In aibuilderclub's /context breakdown, the initial audit showed that "35% of my context was already gone" because too many MCP servers and Skills were mounted; pruning the unused ones dropped the preloaded baseline from 35% to 10%, and the full workflow — the audit plus config and prompt cleanup — "cut token usage by 60%." The named culprits are CLAUDE.md instructions, long chat histories, verbose prompts, and raw terminal output.
The same thread says persistent instructions and session carryover are the next two leaks. According to the session post, CLAUDE.md is loaded into every session, so verbose standing instructions compound on every run, while Claude Code "sends the full conversation history with every message," making /clear or a fresh session a direct cost-control tool when work shifts topics.
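The 35%-to-10% drop above is ultimately a budget calculation. A minimal Python sketch makes it concrete — the component token counts here are illustrative assumptions, not measured values (real figures come from running /context):

```python
# Hedged sketch of a /context-style preload audit.
# Token counts below are illustrative assumptions, not measured values.
def preload_share(components: dict[str, int], window: int = 200_000) -> float:
    """Percent of the context window consumed before the first user prompt."""
    return 100 * sum(components.values()) / window

# Before cleanup: many MCP tool schemas and Skills mounted by default.
before = {"mcp_tool_schemas": 45_000, "skills": 12_000,
          "claude_md": 8_000, "system_prompt": 5_000}
# After pruning unused MCP servers and Skills and trimming CLAUDE.md.
after = {"mcp_tool_schemas": 9_000, "claude_md": 4_000, "system_prompt": 5_000}
```

With these assumed numbers, `preload_share(before)` is 35.0 and `preload_share(after)` is 9.0, mirroring the 35%-to-10% drop the thread reports.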
Model choice and output hygiene matter too. Aibuilderclub's model advice says Opus and Ultrathink should be reserved for "complex architecture and logical reasoning," with Sonnet covering routine edits and end-to-end tasks. For shell-heavy workflows, the RTK post recommends compressing raw logs before they enter context; Jason Zhou's shared screenshot shows RTK reporting 1.7M tokens saved, or 64.4%, across 1,227 commands.
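RTK's exact mechanics aren't described in the posts, so the following is a generic head/tail log trimmer in the same spirit — an assumption-laden stand-in, not RTK's actual interface:

```python
# Hedged sketch: trim verbose terminal output before it enters the model's
# context. A generic head/tail stand-in; RTK's real approach may differ.
def compress_log(text: str, keep: int = 20) -> str:
    """Keep the first and last `keep` lines, eliding the middle."""
    lines = text.splitlines()
    if len(lines) <= 2 * keep:
        return text  # short logs pass through untouched
    elided = len(lines) - 2 * keep
    marker = f"... [{elided} lines elided] ..."
    return "\n".join(lines[:keep] + [marker] + lines[-keep:])
```

A 100-line build log comes back as 41 lines with the default `keep=20`. At the scale the screenshot reports (1.7M tokens saved across 1,227 commands), compression averaged roughly 1,385 tokens per command.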
Several users describe a second constraint beyond total token spend: rate limits that break multi-agent patterns. Doodlestein's thread says the new pain is "limits on the number of requests per minute," which "basically penaliz[es] the use of concurrent agents," forcing the author to move most work to Codex despite still preferring Claude Code features like session search, looping check-ins, and pre-tool hooks.
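Per-minute caps hurt fan-out specifically because N agents firing in parallel drain the window at once, then all stall together. A simulated sliding-window limiter (illustrative parameters, not Anthropic's actual policy) shows the cliff:

```python
# Hedged sketch: why concurrent agents collide with a requests-per-minute cap.
# Simulates earliest-allowed send times under a sliding 60-second window.
# The rpm value is illustrative, not any provider's actual limit.
from collections import deque

def dispatch_times(n_requests: int, rpm: int) -> list[float]:
    """Return the time (seconds) each request can be sent under the cap."""
    window: deque[float] = deque()  # send times within the last 60 seconds
    times: list[float] = []
    t = 0.0
    for _ in range(n_requests):
        # Evict sends that have aged out of the 60-second window.
        while window and t - window[0] >= 60.0:
            window.popleft()
        if len(window) >= rpm:
            # Window is full: wait until the oldest send ages out.
            t = window.popleft() + 60.0
        times.append(t)
        window.append(t)
    return times
```

With a 50 RPM cap, five agents each firing 20 requests at once (100 total) get requests 1–50 out immediately, while requests 51–100 all stall until the minute boundary — exactly the pattern that breaks multi-agent concurrency.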
That complaint lines up with other reports of stricter ceilings across coding tools. Kol Tregaskes' screenshot shows a Codex lockout with a reset almost a week away, alongside a separate reliability complaint that desktop app threads "keep disappearing" after restart. A supporting post from dexhorthy frames the broader shift as moving from "token harder" to "token smarter": less brute-force spend, more attention to avoiding "slop architecture" and wasted context in day-to-day agent workflows.
1/ Claude Code users: token-saving tactics that actually work 💰 My Claude Code token usage started climbing fast, and my subscription limit wasn't enough. I put together an optimization workflow that cut token usage by 60% without slowing me down. Here are the core steps …
Since Claude Code is nearly useless to me until these new draconian rate limits go away (note: I’m not talking about usage limits; these are limits on the number of requests per minute, basically penalizing the use of concurrent agents), I thought I’d list the 3 biggest features …
Okay, these are getting more severe. I'll have to wait nearly a week. Also, my threads in the Code Windows desktop app keep disappearing; apparently, they still exist, but I cannot see them after I restart the app. This has happened over and over.