ClawMark
A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents
A living-world benchmark for evaluating coworker agents across multi-day, multimodal, cross-service tasks with deterministic rule-based scoring.
