ProgramBench

Benchmark for program synthesis


Meta research benchmark for evaluating program synthesis and code-generation models.

Recent stories

1 linked story

News · Primary · 2026-05-05

ProgramBench reports 0% on ffmpeg, SQLite, and ripgrep rebuilds without internet

The SWE-Bench team released ProgramBench, which asks models to rebuild real software given only the executables. The initial complete-pass score is 0% for every model evaluated. It matters as a harsher long-horizon coding benchmark, but its all-tests-pass metric and simpler harness make it a stress test rather than a direct proxy for production agents.
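To see why an all-tests-pass metric reads as a stress test, the minimal sketch below compares it with a per-test average. It is not ProgramBench's actual scoring code: the function names, the task names, and the pass/fail outcomes are all invented for illustration.

```python
from typing import Dict, List


def complete_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks in which every single test passed."""
    if not results:
        return 0.0
    solved = sum(1 for tests in results.values() if tests and all(tests))
    return solved / len(results)


def mean_test_pass_rate(results: Dict[str, List[bool]]) -> float:
    """Average, over tasks, of the fraction of tests that passed."""
    if not results:
        return 0.0
    per_task = [sum(tests) / len(tests) if tests else 0.0
                for tests in results.values()]
    return sum(per_task) / len(per_task)


# Hypothetical outcomes (not real benchmark data): each task passes
# most of its tests, but none passes all of them.
results = {
    "task_a": [True] * 9 + [False],         # 9 of 10 tests pass
    "task_b": [True] * 4 + [False],         # 4 of 5 tests pass
    "task_c": [True, True, False, False],   # 2 of 4 tests pass
}

print(f"complete-pass rate:  {complete_pass_rate(results):.2f}")
print(f"mean test pass rate: {mean_test_pass_rate(results):.2f}")
```

In this made-up data the complete-pass rate is zero even though most individual tests pass, which is the gap a partial-credit metric would expose and an all-or-nothing metric hides.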