SLEIGHT-Bench
Benchmark suite for model and agent evaluation
A benchmark-style software product for evaluating the behavior of models or agents on the SLEIGHT-Bench task suite.

