Evals, Benchmarks, Voice Agents, Realtime AI, LLM as Judge

Artificial Analysis

AI model benchmarks and rankings

AI model benchmarking and analysis platform for comparing model performance, rankings, and pricing across providers.

Recent stories

2 linked stories

news | PRIMARY | 2026-06-18

Artificial Analysis launches AA-Briefcase with Claude Fable 5 at 1587 Elo

Artificial Analysis launched AA-Briefcase, a benchmark for multi-week knowledge-work projects with thousands of source files, and Claude Fable 5 leads at 1587 Elo. The first results show a wide cost spread, so teams should compare both quality and task cost before choosing a model.
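
To give a feel for what an Elo gap means, here is a minimal sketch of the standard Elo expected-score formula. It assumes the conventional 400-point logistic scale; the story does not say how AA-Briefcase computes its ratings, so the scale, the function name `elo_expected_score`, and the 1500-rated comparison model are all illustrative assumptions rather than details of the benchmark.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score (roughly, win probability) of A against B under standard Elo.

    Assumption: the conventional 400-point logistic scale. The benchmark's own
    rating method may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# 1587 is the leading rating reported in the story; 1500 is a hypothetical
# comparison rating chosen only for illustration.
leader_vs_hypothetical = elo_expected_score(1587, 1500)
print(f"Expected score vs. a 1500-rated model: {leader_vs_hypothetical:.2f}")
```

Under these assumptions, an 87-point lead corresponds to an expected score of about 0.62, which is a clear but not overwhelming advantage in head-to-head comparisons.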

news | PRIMARY | 2026-05-28

Artificial Analysis launches AA-WER Streaming with Cartesia Ink-2 at 3.7% WER

Artificial Analysis launched AA-WER Streaming to benchmark streaming speech-to-text models on accuracy and latency for voice agents. The first leaderboard puts Cartesia Ink-2 and ElevenLabs Scribe v2 on the price-latency frontier, so teams should compare cost against latency before choosing a model.
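
A price-latency frontier can be read as the set of models that no other model beats on both price and latency at once. The sketch below shows one way to compute such a frontier. The `Model` type, the `pareto_frontier` function, and every number in `candidates` are hypothetical placeholders, not figures from the AA-WER Streaming leaderboard.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    price: float    # hypothetical price units; lower is better
    latency: float  # hypothetical latency in ms; lower is better


def pareto_frontier(models: List[Model]) -> List[Model]:
    """Return models not dominated on (price, latency), sorted by price.

    A model is dominated if another model is no worse on both axes and
    strictly better on at least one.
    """
    frontier = []
    for m in models:
        dominated = any(
            o.price <= m.price
            and o.latency <= m.latency
            and (o.price < m.price or o.latency < m.latency)
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m.price)


# Placeholder data for illustration only.
candidates = [
    Model("model-a", 1.0, 300.0),
    Model("model-b", 2.0, 150.0),
    Model("model-c", 2.5, 320.0),  # dominated: model-a is cheaper and faster
    Model("model-d", 4.0, 100.0),
]

frontier_names = [m.name for m in pareto_frontier(candidates)]
print(frontier_names)
```

Models off the frontier can be dropped from consideration on these two axes alone; choosing among the frontier models still depends on how a team weighs cost against latency, and on accuracy, which this sketch does not model.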