releaseJune 25, 2026

OpenRouter launches MCP server with live pricing, benchmarks, and test inference

OpenRouter released an MCP server that lets agents query live model pricing, benchmark scores, provider data, docs, and run test inference from the CLI. That replaces stale model knowledge with current routing data inside long-running agent workflows.

5 min read

OpenRouter launches MCP server with live pricing, benchmarks, and test inference

TL;DR

OpenRouter shipped an MCP server that lets agents query live model pricing, benchmark scores, latency, popularity, docs, and test inference instead of relying on stale model knowledge, according to OpenRouter's launch post and OpenRouter's feature list.
Setup is intentionally tiny: OpenRouter's install post shows a two-command flow using claude mcp add --transport http openrouter and claude mcp login openrouter.
The benchmark layer is backed by OpenRouter's new Benchmarks API, which OpenRouter's API post says exposes live scores including Artificial Analysis and Design Arena.
Provider choice is part of the product, not background plumbing: OpenRouter's GLM-5.2 routing post points users to a :nitro model slug that automatically selects the fastest provider from live traffic data.
Early reaction landed on the same point. omarsar0's reply framed the tool as a way for long-running agents to pick the right intelligence level, while altryne's throughput post argued OpenRouter is where sustained real-world provider speed is easiest to compare.

You can watch the launch demo, inspect the Benchmarks API docs, and jump straight to the live provider rankings. The weirdly useful detail is that model selection, price checks, and even prompt-side testing now sit inside the same agent loop. OpenRouter also used GLM-5.2 as the first showcase case, with Design Arena feeding the benchmark layer and live routing posts turning provider competition into part of the interface.

OpenRouter MCP

The pitch is straightforward: agents already write code and call tools, but they still guess at model choice from frozen training data. OpenRouter's MCP server turns that missing context into a callable surface inside the CLI, so the agent can ask for current model rankings, pricing, latency, and documentation while it is working.

The install flow in OpenRouter's install post is only two commands:

claude mcp add --transport http openrouter ...
claude mcp login openrouter

OpenRouter also says in the same post that it works with "all your favorite agents," which matters because MCP has become the default interop layer for agent tools faster than most vendor SDK efforts. That showed up immediately in reaction, with omarsar0's post joking that "MCP won."

The data surface

The first useful thing here is the actual inventory. OpenRouter's feature list says connected agents get:

a 400+ model catalog
benchmark scores
per-provider pricing
per-provider latency
test inference
docs search

That list is broader than a pricing plugin or a model directory. The built-in examples in OpenRouter's feature list are also specific enough to hint at usage patterns: asking for "top coding models under $2/M," finding the most popular image model, or running a real prompt to compare answers and cost without leaving the CLI.

OpenRouter's follow-up in its official MCP reminder post adds one more practical angle: avoiding the wrong model slug getting hard-coded into a repo. That is a small detail, but it is exactly the kind of mistake long-running agent loops make when their model map is out of date.

Benchmarks API

The rankings exposed through MCP come from a separate Benchmarks API. According to OpenRouter's API post, that API includes live scores from Artificial Analysis and Design Arena, then makes them queryable by the agent.

OpenRouter used one concrete result to show the idea working: its post about the API said Z.ai's GLM-5.2 was currently the best available model for both coding and design. Design Arena's post confirmed its benchmark now helps power the MCP product.

That matters because the MCP server is not just returning OpenRouter's internal preferences. It is exposing a benchmark aggregation layer that can be queried alongside price and latency, then used in the same workflow as test inference.

Provider routing

The more interesting layer sits below the model name. altryne's post argued that peak tok/s on Artificial Analysis and sustained production throughput are different things, then pointed to OpenRouter as the easiest place to compare how providers actually behave under live usage.

The screenshots attached to that post and Wafer's speed claim show the shape of the data OpenRouter is surfacing: price per million tokens, latency, throughput in tps, and uptime by provider for the same model.

OpenRouter has been building toward this for a while. its earlier cost-reduction thread links the MCP launch to a broader run of tooling around usage analytics, benchmark Pareto curves, subagent orchestration, and cost simulation.

GLM-5.2 Nitro

The launch's most concrete routing example came a few hours later. In OpenRouter's GLM-5.2 update, the company told users to call z-ai/glm-5.2:nitro, a model slug that continuously selects the fastest provider based on live traffic data.

That post also named new fast variants from Wafer and Fireworks. Wafer then pushed harder on the same claim: its launch post advertised 150 to 250 tok/s in production for serverless users, while a later Wafer post said GLM-5.2 Fast had moved to the top of OpenRouter's throughput table.

OpenRouter linked the public rankings page in its live rankings post, which turns provider competition into something agents and humans can inspect directly. That is a more interesting end state than a static model leaderboard, because the same model slug can now route differently as live traffic changes.