Breaking · June 4, 2026

Anthropic reports Claude wrote 80% of merged code

Anthropic published internal metrics showing Claude wrote more than 80% of its merged code, with engineers merging about 8x as much code as in 2024 and Mythos Preview reaching a roughly 52x speedup on a training-code optimization test. The post matters because it gives a rare lab-side look at AI-assisted engineering gains, while still saying research judgment remains a bottleneck and recursive self-improvement is unproven.

TL;DR

Anthropic says Claude now authors more than 80% of the code merged into its own codebase, up from low single digits before Claude Code’s February 2025 research preview, according to AnthropicAI's launch post and When AI builds itself.
The same post says the typical Anthropic engineer now merges 8x as much code as in 2024, while Alex Albert's summary and RLanceMartin's repost both highlight a 50-percentage-point jump in success on the hardest open-ended tasks in six months.
Anthropic’s headline benchmark is a training-code optimization test where Mythos Preview reached a roughly 52x speedup, while AnthropicAI's correction says the earlier Opus 4 comparison point was May 2025, not May 2024.
Anthropic also claims Mythos Preview picked a better next research step than the human actually took 64% of the time, but AnthropicAI's caveat thread says research taste and judgment still block full recursive self-improvement.
The company paired the recursive self-improvement (RSI) post with a policy line that scaling01's quote surfaced, saying it would be good for the world to have the option to slow or temporarily pause frontier AI development, while Mythos access itself remains limited through Project Glasswing.

You can read Anthropic’s full RSI essay, skim the main Hacker News thread, and check Anthropic’s separate Project Glasswing update, which quietly says Mythos Preview is still being expanded to selected organizations rather than broadly shipped.

Claude's internal throughput numbers

Anthropic’s most concrete disclosure is not the recursive-self-improvement framing. It is the lab-side operating metric: Claude now writes most of Anthropic’s merged code.

According to When AI builds itself, more than 80% of code merged into Anthropic’s codebase in May 2026 was authored by Claude. The same post says lines of code merged per engineer per day stayed roughly flat from 2021 through 2024, then inflected upward in 2025 when Claude started running code instead of only suggesting snippets.

The post ties a second jump to longer autonomous runs in 2026. In Anthropic’s telling, the typical engineer is now merging about 8x as much code as in 2024, and many researchers have gone months without hand-writing code at all, as Alex Albert's metric summary puts it.

A separate reaction from Ethan Mollick says the 80% number matches outside signals that AI coding throughput is still climbing, but Anthropic’s own post is explicit that line count is an imperfect proxy for productivity.

The evals Anthropic used

Anthropic did not just publish one productivity chart. It bundled several internal evals that all point in the same direction: longer task horizons, bigger engineering speedups, and better recovery when research goes off track.

The most striking number is the training-code optimization test:

A skilled human typically needs 4 to 8 hours to get a 4x speedup, according to AnthropicAI's speedup benchmark.
Anthropic says Opus 4 averaged about a 3x speedup in May 2025, per AnthropicAI's correction.
Mythos Preview reached about a 52x speedup in April 2026, per AnthropicAI's speedup benchmark.
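To put those three figures on one scale, here is a minimal sketch of the ratio arithmetic. The numbers are the approximate values quoted above; the constant names and the relative_gain helper are illustrative, not anything Anthropic published.

```python
# Figures as quoted in the article; all are approximate ("about", "roughly").
HUMAN_SPEEDUP = 4.0    # skilled human, after 4 to 8 hours of work
OPUS_4_SPEEDUP = 3.0   # Opus 4, May 2025 (per the corrected date)
MYTHOS_SPEEDUP = 52.0  # Mythos Preview, April 2026


def relative_gain(new: float, old: float) -> float:
    """Return how many times larger the `new` speedup is than the `old` one."""
    if old <= 0:
        raise ValueError("baseline speedup must be positive")
    return new / old


vs_human = relative_gain(MYTHOS_SPEEDUP, HUMAN_SPEEDUP)
vs_opus = relative_gain(MYTHOS_SPEEDUP, OPUS_4_SPEEDUP)

print(f"Mythos Preview vs skilled human: {vs_human:.1f}x")  # 13.0x
print(f"Mythos Preview vs Opus 4: {vs_opus:.1f}x")          # 17.3x
```

Because every input is a rounded figure, the ratios are order-of-magnitude guides rather than precise measurements.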

Anthropic’s research-assistance eval is different. It takes a real research session where a human made a bad next-step decision, shows the model the transcript up to that point, and asks what to do next. On that setup, AnthropicAI's next-step benchmark says Mythos Preview beat the human’s actual choice 64% of the time, up from 22% in 2024.

The official essay adds two more useful datapoints: open-ended engineering task success rose from roughly 26% to 76% over six months, and METR measured Mythos Preview at "at least" 16 hours of autonomous work, which Ethan Mollick noted arrived earlier than some superforecasters expected.
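A jump in a success rate can be read two ways: as percentage points or as a relative multiple. The sketch below shows both for the open-ended task figures, assuming the rounded 26% and 76% values quoted above; the function names are illustrative.

```python
def point_change(before_pct: float, after_pct: float) -> float:
    """Absolute change in percentage points."""
    return after_pct - before_pct


def relative_multiple(before_pct: float, after_pct: float) -> float:
    """How many times larger the later rate is than the earlier one."""
    if before_pct <= 0:
        raise ValueError("starting rate must be positive")
    return after_pct / before_pct


# Rounded success rates on open-ended engineering tasks, six months apart.
before, after = 26.0, 76.0

points = point_change(before, after)
multiple = relative_multiple(before, after)

print(f"Change: {points:.0f} percentage points")  # 50 percentage points
print(f"Relative multiple: {multiple:.1f}x")      # 2.9x
```

The "50 point" framing in the summaries is the first reading. The same change is a bit under a tripling of the success rate.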

Research judgment is still the stated bottleneck

Anthropic’s post is unusually blunt about what it does not think Claude can do yet. The missing piece is not code generation or experiment execution. It is deciding which problems are worth pursuing and when a line of work is a dead end.

According to AnthropicAI's caveat thread, recursive self-improvement is not guaranteed, and Anthropic does not yet know whether Claude has strong enough research judgment to drive the whole loop. When AI builds itself frames that gap as the current human comparative advantage.

That caveat matters because Anthropic is still arguing for compounding acceleration even under the conservative case. As scaling01's quote thread excerpted, the company says much of frontier-advancing work already looks automatable even if Claude never develops strong research taste.

The post then sketches three futures: a stall, a world of much smaller teams getting much more done with humans still setting direction, and full recursive self-improvement where progress is limited mostly by compute. Anthropic calls the middle path the likeliest one today, according to kimmonismus's long summary.

The caveats showed up fast

The post landed with exactly the kind of objections you would expect from engineers reading vendor self-measurement: quantity is not quality, smoothing can hide step changes, and one benchmark needed a same-day correction.

Anthropic corrected one of its most widely repeated claims within hours. The earlier model comparison in the training-code speedup benchmark was May 2025, not May 2024, and AnthropicAI's correction adds that backtests on models from May 2024 showed no speedup at all.

The main Hacker News discussion latched onto a separate issue: Anthropic’s own essay says lines of code are an imperfect measure and probably overstate the true productivity gain. eliebakouch's reply also questioned whether the chart presentation makes Mythos adoption look smoother and cleaner than a raw step-function view would.

None of that negates the internal trend line. It does narrow the safest reading: Anthropic has published a rare internal operations snapshot, not an externally audited proof that engineering productivity rose by exactly the same multiple as its line-count chart.

Mythos is still a gated system

The RSI post reads like a frontier capability memo, but Anthropic’s own rollout tells a simpler story about where Mythos actually is: still restricted, still partner-gated, and still being expanded selectively.

Anthropic’s Project Glasswing update says it is extending access to Mythos Preview to about 150 new organizations in more than 15 countries. Wes Roth's Glasswing note surfaced that expansion the day before the RSI post.

That pairing is the most useful context around the whole announcement. Anthropic is publishing internal evidence for recursive-self-improvement pressure while keeping the model that generated the headline numbers inside a controlled access program aimed at cybersecurity and trusted partners, not a general API release.

Even the loudest community reactions kept that split in view. bridgemindai's restricted-access reaction called the productivity curve wild, but also noted Mythos is still gated to a small set of organizations. For now, the most consequential part of Anthropic’s story is not that recursive self-improvement has arrived. It is that one lab is showing what its own engineering organization looks like when a preview model is already writing most of the code.