Blueprint-Bench 2
Benchmark for evaluating AI agents
A benchmark product from Andon Labs for evaluating AI agents on realistic task workflows.

