Anthropic launches Code Review for Claude Code: parallel PR agents flag bugs at $15–25 per review
Anthropic launched Code Review in research preview for Team and Enterprise, using multiple agents to inspect pull requests, verify findings, and post one summary with inline comments. Teams shipping more AI-written code can try it to increase review depth, but should plan for higher token spend.

TL;DR
- Anthropic launched Code Review for Claude Code as a research preview for Team and Enterprise, with a system that sends “a team of agents” across each new pull request and returns one summary comment plus inline bug flags, according to Anthropic's launch thread.
- Anthropic says the feature is optimized for depth rather than a quick skim: agents search in parallel, “verify each bug to reduce false positives,” and rank findings by severity, as described in the launch thread and echoed by a summary thread.
- The company’s internal test numbers are unusually specific: PRs with substantive review comments rose from 16% to 54%, engineers marked fewer than 1% of findings incorrect, and PRs over 1,000 lines surfaced issues 84% of the time with 7.5 findings on average, per Anthropic's data and an Anthropic engineer.
- Anthropic is also setting expectations on cost: its pricing note says reviews generally average $15–25 in token usage, which is higher than the cost of the lighter-weight open-source GitHub Action referenced in a follow-up post.

What shipped and how it works
Code Review adds an automated PR-review path to Claude Code. When a pull request opens, Anthropic says Claude “dispatches a team of agents to hunt for bugs,” with those agents running independently, cross-checking one another’s findings, and then collapsing the output into a single high-signal summary plus inline comments in the diff, according to Anthropic's launch thread and the product post.
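
Anthropic hasn't published the orchestration code, but the flow it describes (independent parallel search, cross-verification, then a single merged report) maps onto a familiar fan-out/verify/merge pattern. A minimal sketch of that pattern, with every name here (`Finding`, `run_review_agent`, `verify_finding`) invented for illustration rather than taken from Anthropic's implementation:

```python
# Conceptual sketch of a fan-out/verify/merge review flow.
# All names are hypothetical; Anthropic has not published its orchestration code.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: int       # 1 = nitpick ... 5 = likely production breakage
    description: str

def run_review_agent(agent_id: int, diff: str) -> list[Finding]:
    """Stand-in for one independent review agent searching the diff."""
    # In practice this would be an LLM call with a review-focused prompt.
    return []

def verify_finding(finding: Finding, diff: str) -> bool:
    """Stand-in for the verification pass meant to filter false positives."""
    # In practice a second agent would re-derive the bug before it is reported.
    return True

def review_pull_request(diff: str, n_agents: int = 4) -> list[Finding]:
    # Fan out: agents search the diff in parallel and independently.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        per_agent = list(pool.map(lambda i: run_review_agent(i, diff), range(n_agents)))
    # Merge: deduplicate overlapping findings reported by multiple agents.
    unique = {(f.file, f.line, f.description): f for findings in per_agent for f in findings}
    # Verify, then rank by severity so the summary leads with the worst bugs.
    verified = [f for f in unique.values() if verify_finding(f, diff)]
    return sorted(verified, key=lambda f: f.severity, reverse=True)
```
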
This is explicitly a deeper pass, not a replacement for the company's existing lightweight PR tooling. Anthropic says Code Review “optimizes for depth,” while a follow-up note says the older open-source pr-review Skill remains available but the new setup is “far more powerful.”

What Anthropic says it found in testing
Anthropic built the feature against its own workflow first. An Anthropic engineer said code output per engineer is up “200% this year” and that reviews had become the bottleneck, which frames why the company is spending more tokens on deeper review passes rather than on faster, cheaper checks.
The internal metrics are the main technical payload in this launch. Anthropic says PRs with substantive review comments increased from 16% to 54%, and fewer than 1% of review findings were marked incorrect by engineers, per Anthropic's data. On PRs larger than 1,000 lines, its stats say 84% produced findings, with an average of 7.5 issues each. A supporting walkthrough adds that a typical review takes about 20 minutes and cites examples including a one-line auth change that “would have broken production authentication” and a type mismatch that silently cleared an encryption-key cache.
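
The walkthrough doesn't publish the offending code, but the second example describes a recognizable bug class: a comparison between mismatched types that always evaluates as a change, silently invalidating state. A hypothetical Python illustration (the `KeyCache` class and scenario are invented here, not taken from Anthropic's codebase):

```python
# Hypothetical illustration of the bug class only; not Anthropic's code.
class KeyCache:
    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}
        self._generation = "0"  # generation stored as a string

    def rotate_if_stale(self, current_generation: int) -> None:
        # Bug: self._generation is a str and current_generation an int, so the
        # inequality is always True and the cache is wiped on every call,
        # not only when the key generation actually changes.
        if self._generation != current_generation:
            self._keys.clear()
            self._generation = str(current_generation)
```
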

Where it fits in the review stack
Anthropic is positioning Code Review above its existing GitHub Action in both depth and price. The company says reviews generally average $15–25 and scale with PR complexity, while a pricing summary reiterates that this is “more expensive than lighter-weight options” and is currently limited to Team and Enterprise in research preview.
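
Anthropic hasn't broken the $15–25 figure down into tokens, but rough arithmetic shows why a multi-agent pass lands in that range. A back-of-envelope sketch, assuming Sonnet-class list prices of $3 per million input tokens and $15 per million output tokens (the rates and token counts below are illustrative assumptions, not figures from the announcement):

```python
# Back-of-envelope only: prices and token counts are assumptions, not Anthropic's figures.
INPUT_PRICE = 3 / 1_000_000    # assumed $ per input token (Sonnet-class list rate)
OUTPUT_PRICE = 15 / 1_000_000  # assumed $ per output token

def review_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# E.g. several agents each re-reading a large diff plus surrounding repo context,
# followed by verification and summarization passes:
print(f"${review_cost(input_tokens=5_000_000, output_tokens=300_000):.2f}")  # $19.50
```
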
The immediate practitioner pushback is about independence, not capability. One trust critique argues that “creation and verification are different engineering problems” and questions whether the same vendor stack should both write and review code; another, more skeptical reaction puts it bluntly, saying they “wouldn't want the same system reviewing my code that wrote it.” That makes this launch less a generic AI-review feature than a bet that multi-agent verification can raise review depth enough to justify both the extra spend and the governance questions.