SkillsBench
Benchmarking How Well Agent Skills Work
SkillsBench is a benchmark, evaluation framework, and dataset for AI agents. It measures how well agent skills work and how effectively models and agent harnesses use those skills on expert-curated real-work tasks across diverse domains.