FrontierSWE
Benchmarking software engineering skill at the edge of human ability.
A public benchmark for evaluating coding agents on ultra-long-horizon software engineering tasks across implementation, performance engineering, and ML research.