Provider limits, throttling, and capacity constraints.
Claude Code users reported steeper caps and week-long waits while sharing ways to cut usage, including /context audits, /clear, smaller models, and RTK log compression. The posts point to token burn from mounted MCP servers, long chat history, raw logs, and multi-agent concurrency, so teams may need to trim runtime load.
Users report new request-per-minute caps that trigger after three to four concurrent agents, and Boris Cherny says efficiency work is underway. The issue hits the multi-agent workflows Anthropic has been promoting, separate from five-hour usage buckets.
Anthropic confirmed new peak-time metering that burns through 5-hour Claude sessions faster, and multiple power users posted 529 overloaded errors and early exhaustion. If you rely on Max plans for coding, watch for session limits and consider moving daily work to Codex.
Anthropic said free, Pro, and Max users will hit 5-hour Claude session limits faster on weekdays from 5am to 11am PT, while weekly caps stay the same. Shift long Claude Code jobs off-peak and watch prompt-cache misses.
Z.ai released GLM-5-Turbo as a faster GLM-5 variant for OpenClaw-style tool use, with 202K context, OpenRouter access, and higher off-peak limits. Try it as a cheaper speed tier for agent workflows, but benchmark completion quality on your own tasks before wider use.
Anthropic is doubling Claude usage outside peak hours from Mar. 13 to Mar. 27, with the bonus applied automatically across Free, Pro, Max, Team, and Claude Code. Shift long runs and bulk jobs to off-peak windows to stretch limits without changing plans.
xAI released Grok 4.20 Beta in the API with reasoning, non-reasoning, and multi-agent variants, a 2M-token window, and lower pricing than Grok 4. Test it for long-context and speed-sensitive workloads, but compare coding performance against top rivals on your own evals.
Google AI Studio now lets developers set experimental per-project spend caps for Gemini API usage. Use it as a native billing guardrail, but account for roughly 10-minute enforcement lag and possible batch-job overshoot.
OpenAI says Codex capacity is lagging a demand spike, leaving some sessions choppy while the team adds more compute. If you depend on Codex in production workflows, plan for transient instability and keep fallback review or execution paths ready.
OpenAI acknowledged a Codex session hang that left some requests unresponsive, later said the issue had been stable for hours, and promised a rate-limit reset. Teams relying on Codex should re-check long runs and confirm quota restoration after the incident.