AA-Briefcase

Agentic Knowledge Work Benchmark

AA-Briefcase is a private Artificial Analysis benchmark for frontier agentic capability in long-horizon knowledge work. It evaluates models on four multi-week professional workflow projects comprising thousands of input files and 91 tasks. The tasks require deliverables such as spreadsheets, presentations, memos, financial models, board presentations, and design mock-ups. Submissions are scored using rubric checks plus pairwise comparisons of analytical quality and presentation.
