Skip to content
AI Primer
breaking

OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic

OpenRouter said four open-weight models now handle real agentic workloads, and a JPMorgan report put Chinese models at about 45% of platform traffic. The shift matters because teams are optimizing for price, hosting, and task fit instead of defaulting to frontier APIs.

5 min read
OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic
OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic

TL;DR

  • OpenRouter's June 2026 post says DeepSeek V4 Flash, GLM 5.2, MiniMax M3, and Nemotron 3 Ultra have crossed from hobbyist curiosity into real agentic use, with each model winning on a different mix of price, planning, multimodality, or deployment story.
  • bridgemindai's market-share chart and rohanpaul_ai's JPMorgan summary both point to the same shift: Chinese models now account for roughly 45% to 47% of OpenRouter traffic, while U.S. model share fell from about 73% to 33% over the last year.
  • kylebrussell's internal benchmark screenshot gives the cleanest workload-level datapoint in the evidence pool, GLM 5.2 beat Opus 4.8 on an internal mortgage-servicing knowledge-base update while costing less than half as much.
  • OpenRouter's MCP launch thread shows where this is heading product-wise: model selection is being pulled into the agent loop itself, with live pricing, benchmarks, provider latency, and test inference exposed over MCP.

You can read OpenRouter's model roundup, open the MCP docs, and check the new OpenWebUI integration. J.P. Morgan's report is the source behind the 45% traffic claim, and OpenRouter's correction reply quietly fixes one table row from "44 (pro)" to "40," which is a good reminder that even the comparison cards were still moving.

Four open-weight models

OpenRouter's shortlist is unusually specific. It is not "open models are catching up." It is four named models, each attached to a concrete use case.

  • DeepSeek V4 Flash: the cheapest frontier-class coding option in OpenRouter's table, at $0.054 input and $0.242 output per million tokens, with roughly 84 tok/s in the attached card from OpenRouter's post.
  • GLM 5.2: the "top open" planning pick in the same table, priced at $0.447 input and $3.31 output per million, according to OpenRouter's comparison card.
  • MiniMax M3: the multimodal long-context entry for screenshot, UI, diagram, and document-heavy work, per OpenRouter's comparison card.
  • Nemotron 3 Ultra: the U.S.-built, fully open option on NVIDIA's stack, again from OpenRouter's comparison card.

The interesting footnote is that OpenRouter corrected the DeepSeek row after publication. In OpenRouter's reply, the company says one row should have read "40" instead of "44 (pro)."

Chinese token share

The biggest market signal in the evidence is not a benchmark. It is token flow.

According to bridgemindai's repost of the OpenRouter chart, U.S. models went from about 73% of OpenRouter token share in June 2025 to 33% in June 2026, while Chinese models rose from about 17% to 47%. rohanpaul_ai's Bloomberg-linked post gives the same directional read in plainer language: routine work is moving toward models that are cheaper, easier to customize, and less dependent on frontier-lab permissioning.

rohanpaul_ai's JPMorgan thread adds the price claim that gives this shift teeth, Chinese models can be up to 50 times cheaper on a per-token basis, while a follow-up post in the same thread says late-February OpenRouter traffic was already 5.3 trillion of 8.7 trillion tokens for Chinese models among the top 10. That same post names MiniMax, Moonshot AI, and Zhipu AI as the top three by token volume.

GLM as the Opus alternative

The most concrete "would you actually swap this into production" datapoint comes from kylebrussell's internal benchmark screenshot. On an internal mortgage-servicing process knowledge-base update, GLM passed 49 of 69 cases versus Opus at 47 of 68 under one judge, then 45 of 69 versus 42 of 68 under GPT-5.5, while costing $13.19 versus $28.08.

Speed is the other half of the pitch. wafer_ai's OpenRouter launch post advertised GLM 5.2 Fast at 150 to 250 tok/s in production, and wafer_ai's provider screenshot showed Wafer Fast at 203 tok/s with 100% uptime in the displayed period.

That does not settle every workload question. In altryne's reply about 1M context, altryne argues long-context quality for open models is still an open question, especially near the advertised maximum window.

Model choice inside the agent

OpenRouter's MCP is the product move that matches the traffic shift. Instead of asking developers to remember stale model lore, OpenRouter's launch thread exposes a live model catalog, benchmarks, provider pricing and latency, test inference, and docs search directly inside an agent.

The command surface is short in OpenRouter's setup example:

  • claude mcp add --transport http openrouter ...
  • claude mcp login openrouter

The most useful detail is in OpenRouter's follow-up: this is partly about avoiding bad model slugs and outdated defaults getting baked into codebases. OpenRouter's Benchmarks API post says the rankings can pull from sources including Artificial Analysis and Design Arena, so the agent can query live scores rather than rely on its frozen training cutoff.

OpenRouter becomes the interface layer

The MCP launch was not an isolated feature ship. OpenRouter's OpenWebUI announcement positions OpenRouter as both inference layer and chat surface, with one UI, one bill, and access to 400-plus models through a single API.

The surrounding ecosystem points the same way. SakanaAILabs on Fugu-Ultra brought Sakana's compound model onto OpenRouter, while Teknium's OpenRouter token milestone said Hermes Agent alone had already pushed past 1 trillion tokens in a day on the platform. The platform story is getting less about a single best model, and more about routing, packaging, and stitching many models into one operational layer.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 4 threads
Chinese token share2 posts
GLM as the Opus alternative2 posts
Model choice inside the agent1 post
OpenRouter becomes the interface layer1 post
Share on X