AA-Briefcase
Agentic Knowledge Work Benchmark
AA-Briefcase is a private Artificial Analysis benchmark for frontier agentic capability in long-horizon knowledge work. It evaluates models across four multi-week professional workflow projects with thousands of input files and 91 tasks. The tasks require deliverables such as spreadsheets, presentations, memos, financial models, board presentations, and design mock-ups. Submissions are scored using rubric checks plus pairwise comparisons of analytical quality and presentation.