ClawMark
A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents
A living-world benchmark for evaluating coworker agents across multi-day, multimodal tasks in dynamic cross-service environments.