Odysseys
Benchmarking Web Agents on Realistic Long-Horizon Tasks
Odysseys is an open-source benchmark and evaluation toolkit for long-horizon web agents. It comprises 200 realistic web tasks derived from browsing sessions, and it scores agents on the live Internet using rubric-based evaluation.