AppWorld
Benchmark and execution environment for generalist agents
An open-source benchmark and execution environment for evaluating generalist agents across multiple apps and end-to-end tasks.

