MIT study reports 300% more files but 30% more releases after AI coding adoption
MIT-linked analysis says AI coding tools sharply raise local code output, but most of the gain disappears by review and release. Teams should watch downstream throughput, since project creation rose without matching demand signals in separate Hugging Face Spaces data.

TL;DR
- rohanpaul_ai's study summary says the new NBER working paper tracked more than 100,000 GitHub developers and found a steep drop from AI-driven coding activity to shipped software.
- According to rohanpaul_ai's breakdown, autocomplete, interactive agents, and autonomous agents raised commits by about 40%, 140%, and 180%, while the biggest gain shrank to roughly 50% for projects and 30% for releases.
- The paper abstract on RePEc, echoed by rohanpaul_ai's FT-linked thread, frames the gap as a weak-link problem: code generation sped up faster than review, integration, testing, and launch.
- petergostev's Hugging Face Spaces check extends the paper's marketplace result with a separate community dataset, arguing that generated supply is rising faster than demand signals.
- Commentary from levie's post and justsisyphus's benchmark critique pushes the same point from opposite ends: distribution and real-world codebase messiness still decide whether extra code turns into product value.
You can read the working paper, skim the full abstract and quantitative summary, and see the FT's pickup of the same result. The most useful detail is the shape of the drop, from commits to projects to releases, rather than any single headline percentage. A separate Lobsters discussion zeroes in on the same choke points, while petergostev's Spaces thread argues the demand side may already look saturated.
Production funnel
The paper's central move is to measure software production as a pipeline instead of a single coding metric. The authors combine GitHub activity with AI usage telemetry, then follow work from local code output through review and all the way to release, according to the RePEc abstract.
The step-down is the story:
- Files created or edited: nearly 300% higher, per rohanpaul_ai's FT-linked thread
- Reviewed work: roughly 150% higher, per the same thread
- Releases: about 30% higher, per rohanpaul_ai's study summary
- For autonomous agents specifically, commits rose 180%, projects rose about 50%, and releases rose about 30%, per the same summary
That makes the current AI coding boom look less like an end-to-end throughput jump and more like a local acceleration inside one stage of the pipeline.
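To make the step-down concrete, here is a minimal Python sketch that converts the reported percentage gains into multipliers and computes how much of each upstream gain survives at the next stage. The numbers are the rounded figures quoted above; the helper names (`multiplier`, `pass_through`) are illustrative, and treating the three figures as directly comparable stages of one funnel is an assumption, since they come from different summaries of the paper.

```python
# Rounded percent increases quoted in the article (assumed comparable).
stage_gains_pct = {
    "files": 300.0,     # "nearly 300%" files created or edited
    "reviewed": 150.0,  # "roughly 150%" reviewed work
    "releases": 30.0,   # "about 30%" releases
}


def multiplier(pct_gain):
    """Convert a percent increase into a level multiplier (300% -> 4.0x)."""
    return 1.0 + pct_gain / 100.0


def pass_through(upstream_pct, downstream_pct):
    """Share of the upstream percent gain that still shows up downstream."""
    return downstream_pct / upstream_pct


stages = list(stage_gains_pct)
for upstream, downstream in zip(stages, stages[1:]):
    share = pass_through(stage_gains_pct[upstream], stage_gains_pct[downstream])
    print(f"{upstream} -> {downstream}: {share:.0%} of the gain survives")

# End-to-end view: releases per unit of file activity, relative to baseline.
releases_per_file = multiplier(stage_gains_pct["releases"]) / multiplier(
    stage_gains_pct["files"]
)
print(f"releases per file touched: {releases_per_file:.3f}x baseline")
```

On these rounded inputs, about half of the file-level gain reaches review and about a tenth reaches release, which is the arithmetic behind the "local acceleration" reading.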
Weak links
The paper's abstract estimates an elasticity of substitution between AI and human effort of about 0.25, per RePEc. A value that far below 1 means the two behave as complements rather than substitutes. In plain English, big improvements in code generation only replace a small amount of the surrounding human work.
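One way to build intuition for an elasticity of about 0.25 is a textbook two-input CES (constant elasticity of substitution) aggregate. The sketch below is only an illustration: the CES functional form, the equal share weights, and the function names `ces_output` and `ces_ceiling` are assumptions made here, not the paper's actual specification.

```python
def ces_output(ai, human, sigma=0.25, share=0.5):
    """Two-input CES aggregate, normalized so ces_output(1, 1) == 1.

    sigma is the elasticity of substitution; share is the weight on the
    AI input. Both defaults are illustrative assumptions.
    """
    if sigma == 1.0:
        # Cobb-Douglas is the sigma -> 1 limit of the CES form.
        return ai ** share * human ** (1.0 - share)
    rho = (sigma - 1.0) / sigma  # sigma = 0.25 gives rho = -3
    return (share * ai ** rho + (1.0 - share) * human ** rho) ** (1.0 / rho)


def ces_ceiling(human, sigma=0.25, share=0.5):
    """Output limit as the AI input grows without bound (needs sigma < 1)."""
    rho = (sigma - 1.0) / sigma
    return ((1.0 - share) * human ** rho) ** (1.0 / rho)


baseline = ces_output(1.0, 1.0)
for ai_boost in (2.0, 4.0, 100.0):
    gain = ces_output(ai_boost, 1.0) / baseline - 1.0
    print(f"AI input x{ai_boost:g}: output gain {gain:.1%}")

print(f"ceiling with human effort fixed: {ces_ceiling(1.0) / baseline - 1.0:.1%}")
```

Under these assumed parameters, quadrupling the AI input with human effort held fixed lifts output by roughly 25%, and no amount of additional AI input pushes it past about 26%. That is the weak-link pattern in miniature: the fixed human input caps the total.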
The bottlenecks named in the paper and the surrounding discussion are concrete:
- Human review
- Integration with the rest of the product
- Testing and edge-case cleanup
- Packaging and release
- Product judgment about what is worth shipping
kilocode's one-line summary compresses the whole argument into a cleaner phrase: the bottleneck moved. levie's post adds a second layer, arguing that for enterprise software the expensive part is often distribution, implementation, and support, not raw code production.
Supply without demand
The paper does not stop at shipped software. According to the RePEc abstract, the authors also looked across four major app marketplaces and found more new apps without a corresponding increase in total usage.
petergostev's Hugging Face Spaces check is not part of the paper, but it points in the same direction: more generated projects, flat likes. That mirrors the paper's claim that software supply is rising faster than user demand.
This is where the study gets sharper than the usual "AI writes more code" chart. More artifacts showed up in repositories and marketplaces, but the evidence cited so far does not show a matching jump in adoption.
Dirty contextual code
One reason this paper will land with engineers is that it matches a complaint that benchmark wins usually sidestep. justsisyphus's benchmark critique argues that agent benchmarks still miss "dirty, contextual code," which is where review, integration, and maintenance costs pile up.
The Lobsters thread around the paper raises the same implementation-level concern from another angle: even when code generation gets cheaper, contextual understanding, ownership, and production hardening still gate what ships. That does not contradict the benchmark gains. It explains why benchmark-like speedups can coexist with a much smaller release bump in production data.