Claude Code limits concurrent agents as users report RPM caps
Users report new request-per-minute caps that trigger once three or four agents run concurrently, and Anthropic's Boris Cherny says efficiency work is underway. The issue hits the multi-agent workflows Anthropic has been promoting, and is separate from the familiar five-hour usage buckets.

TL;DR
- Claude Code users are reporting a new class of limits that looks separate from the usual five-hour usage buckets: in one widely shared report, concurrent workflows started failing with "API Error: Rate limit reached" once "3 or 4 agents" were running.
- The clearest practitioner complaint is about throughput, not total allowance. In a follow-up post, the same user said the problem is "rate limits for sending too many requests per minute," which breaks a workflow built around "concurrent agents."
- Anthropic engineer Boris Cherny acknowledged the issue in a reply, saying the team is "working on improving this" with "efficiency wins incoming," but did not give a timeline or detail the new caps.
- Users are also reading the behavior as time-dependent. One developer said in a separate post they were "shifting my working hours to avoid the peak hours penalty," suggesting the limits are already changing when high-throughput coding gets done.
What exactly is failing in Claude Code?
The reported breakage is on request rate, not just overall usage. In the original complaint, a Claude Code user said the service had become "useless" because it now "kicks in with like 3 or 4 agents going at once," and the attached terminal screenshot shows repeated "API Error: Rate limit reached" messages while four tasks are open, with only one in progress [img:0|Rate limit screenshot].
That distinction matters for engineering workflows because Anthropic's coding product has been pushing multi-agent patterns. In a clarification, the same user said they can work around ordinary usage limits by buying more Max accounts, but not RPM-style caps: "my entire workflow is to maximize the number of concurrent agents." In a later reply, they said the change had "cratered my throughput by like 99%" and argued they were now "paying for usage" they could not use effectively.
Timing is part of the complaint. In another reply in the thread, the user said the new behavior would at least make sense during "peak business hours" but was instead hitting on "Saturday night." Separately, another developer described a "peak hours penalty" strongly enough that they were moving their schedule to avoid it.
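The mechanics matter here: a per-minute cap is shared across everything a user runs, so four agents firing independently can trip it even when each agent, alone, would stay well under. One common client-side mitigation (a sketch, not anything Anthropic documents; the cap value below is an assumption for illustration) is to pace all outgoing requests through a shared throttle so parallel agents draw from a single RPM budget:

```python
import threading
import time

class RpmThrottle:
    """Client-side pacer shared by all agents: spaces outgoing calls so the
    combined request rate stays under an assumed per-minute ceiling.

    The requests_per_minute value is illustrative; Anthropic has not
    published the actual limit."""

    def __init__(self, requests_per_minute: int):
        self.interval = 60.0 / requests_per_minute  # minimum seconds between requests
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def acquire(self) -> float:
        """Block until the next free request slot; return seconds waited."""
        with self.lock:
            now = time.monotonic()
            wait = max(0.0, self.next_slot - now)
            # Reserve the next slot for whoever calls after us.
            self.next_slot = max(now, self.next_slot) + self.interval
        if wait:
            time.sleep(wait)
        return wait
```

Each agent thread would call `acquire()` before sending a request. The trade-off is exactly the one the user describes: the workflow still runs, but throughput collapses from "N agents in parallel" to "N agents sharing one request lane."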
What has Anthropic said, and why does this matter now?
The new cap is not publicly documented, but Anthropic has acknowledged the problem. In Boris Cherny's reply, the response was brief: "Working on improving this. A bunch of efficiency wins incoming." That reads more like backend mitigation than a policy clarification, and it leaves open whether the constraint is temporary load shedding, a permanent concurrency guardrail, or both.
The timing matters because multi-agent coding is exactly where request bursts happen. Even outside the Claude Code thread, one user describing sub-agents said "mini-swarms" make it easier to burn through allotted compute, which matches the failure mode being reported here: not a single long task, but many short overlapping requests from parallel agents. That makes RPM ceilings a much bigger operational constraint than five-hour buckets for teams using Claude Code as a high-concurrency coding surface.
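For teams hitting "Rate limit reached" errors today, the standard client-side response is retry with exponential backoff and jitter. A minimal sketch, assuming a hypothetical `send_request` callable and a stand-in error class (neither is from Anthropic's SDK, and the delay values are arbitrary):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the rate-limit failure the screenshots show."""

def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff plus jitter.

    `send_request` is a hypothetical zero-argument callable that raises
    RateLimitError when the provider rejects the request."""
    for attempt in range(max_retries):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # Double the delay each attempt, randomized to avoid
            # synchronized retries from parallel agents.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            time.sleep(delay)
```

Backoff keeps a single task alive through transient limits, but it does not restore parallel throughput, which is why the complaints above center on concurrency rather than reliability.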