AppWorld

Benchmark and execution environment for generalist agents

An open-source benchmark and execution environment for evaluating generalist agents on end-to-end tasks that span multiple apps.
