ProgramBench
Can language models rebuild programs from scratch?
An open-source benchmark that evaluates whether coding agents can rebuild a program from scratch given only its compiled executable and documentation.
