An AI model routing platform and API that provides a single OpenAI-compatible interface to many models from multiple providers.

Recent stories

47 linked stories

newsPRIMARY2026-07-07

OpenRouter benchmarks 1,730 visual-reasoning questions on low-detail image costs

OpenRouter tested 1,730 visual-reasoning questions across five models and found low-detail images often reduced accuracy while increasing reasoning-token spend. Caps on reasoning effort had the biggest billing impact.

workflowPRIMARY2026-07-05

OpenRouter claims 24x inference-cost savings with MCP model routing

OpenRouter published an MCP workflow that it says cut inference costs 24x at comparable quality. The MCP lets the model choose providers using codebase context plus OpenRouter benchmark, aggregate-usage, and live-performance data.

newsPRIMARY2026-06-27

OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic

OpenRouter said four open-weight models now handle real agentic workloads, and a JPMorgan report put Chinese models at about 45% of platform traffic. The shift matters because teams are optimizing for price, hosting, and task fit instead of defaulting to frontier APIs.

releasePRIMARY2026-06-25

OpenRouter launches MCP server with live pricing, benchmarks, and test inference

OpenRouter released an MCP server that lets agents query live model pricing, benchmark scores, provider data, docs, and run test inference from the CLI. That replaces stale model knowledge with current routing data inside long-running agent workflows.

releasePRIMARY2026-06-24

OpenRouter launches Image API with typed capabilities and exact USD cost

OpenRouter released a dedicated Image API that normalizes request shapes across 30-plus models from eight providers. Agents can inspect limits, passthrough options, streaming, and exact per-call cost without hardcoding vendor quirks.

releasePRIMARY2026-06-14

OpenRouter launches Fusion API with model panels and judge routing

OpenRouter launched Fusion, a server-side panel API that sends prompts to multiple models and combines one answer. Early logs also showed a web-path issue where Fusion still invoked Claude Opus 4.8 as judge and billed for it until API-side control was clarified.

newsSECONDARY2026-06-14

Fable users compare GLM-5.2, GPT-5.5, and model panels on one-shot UI work

Two days after Fable 5 went offline, developers started testing GLM-5.2, GPT-5.5, and multi-model panels against the kinds of one-shot frontend and greenfield builds Fable handled well. The early pattern is that replacements cover much of the work, but Fable still leads on UI taste and first-pass product completion.

releasePRIMARY2026-06-13

OpenRouter launches Fusion API with DRACO panel tests at 1% of Fable

OpenRouter launched Fusion, a server-side panel API that fans prompts to multiple models, judges the outputs, and returns one synthesized answer. The company said DRACO landed within 1% of Fable at roughly half the price, but the published evals do not cover long-horizon tasks.

newsSECONDARY2026-06-10

Fable 5 users report 90-minute Max caps and June 23 plan cutoff

One day after Fable 5 launched, users reported burning through Max quotas in about 90 minutes while Anthropic told subscribers the model will leave Claude plans on June 23 until capacity improves. If you depend on Fable, plan for quota pressure and route critical jobs elsewhere.

newsSECONDARY2026-06-09

OpenRouter, OpenCode, and 5 others add Claude Fable 5 on launch day

OpenRouter, OpenCode, Lovable, Cline, Browser Use Terminal, Nous Portal, and Venice all added Fable 5 within hours of launch. The rollouts put the model into gateways, coding agents, browser agents, and chat clients on day one.

newsPRIMARY2026-06-06

OpenRouter adds cache-hit pricing telemetry as Devin exposes adaptive routing

Vendors pushed routing and spend controls closer to the default app layer, including OpenRouter's cache-hit pricing telemetry and Devin's adaptive routing. The discussion frames model choice more as a budget-control problem than a pure quality setting.

releasePRIMARY2026-06-03

OpenRouter launches Pareto Code with min_coding_score and 1B routed tokens per day

OpenRouter launched Pareto Code, a free experimental coding router that filters by min_coding_score and says it is already handling about 1 billion tokens a day. The release adds a tunable routing path for coding workloads where cost and model quality need to be balanced.

releaseSECONDARY2026-05-31

MiniMax M3 launches with 1M context and 59.0 SWE-Bench Pro

MiniMax shipped M3 with a 1M-token context window, native multimodal input, and frontier coding claims across SWE-Bench Pro, Terminal Bench, and MCP Atlas. It also appeared on OpenRouter, Ollama Cloud, Venice, Hermes, Cline, Together, and Arena on day one.

releasePRIMARY2026-05-30

OpenRouter launches Guardrails with budget caps, ZDR, and prompt-injection filters

OpenRouter released Guardrails to apply budget limits, provider restrictions, zero-data-retention rules, prompt-injection defense, and DLP checks across routed traffic. Google Model Armor and Lakera Guard connectors are in beta, so plan around limited availability.

newsSECONDARY2026-05-28

Agent tools add Claude Opus 4.8 to Cursor, Warp, OpenRouter, and Perplexity on day one

Independent IDEs, gateways, and agent runtimes rolled out Claude Opus 4.8 within hours of launch, including Cursor, Warp, OpenRouter, and Perplexity. That matters because teams can benchmark or swap the model into existing workflows without waiting for connector lag.

releaseSECONDARY2026-05-26

Warp Agent adds OpenRouter URLs and /model aliases for custom endpoints

Warp now lets agents connect directly to an OpenRouter endpoint and switch providers through remembered model aliases. The change reduces endpoint setup friction for teams routing across hosted models inside Warp Agent.

newsPRIMARY2026-05-26

OpenRouter raises $113M Series B as weekly volume hits 25T tokens

OpenRouter announced a $113M Series B led by CapitalG and said weekly routed volume grew from 5T to 25T tokens in six months. The funding matters because the company is pitching itself as production infrastructure for multi-model deployments, not just an API convenience layer.

releaseSECONDARY2026-05-22

Warp adds BYOK to Warp Agent with OpenAI-compatible endpoints

Warp Agent now accepts user-supplied OpenAI, Anthropic, and Gemini keys plus OpenAI-compatible endpoints such as OpenRouter and DeepSeek. The change removes the paid-plan requirement for inference access and gives terminal users more routing options.

releaseSECONDARY2026-05-21

Qwen3.7 Max launches with 1M context, 35-hour autonomy, and 56.6 AA Index

Alibaba launched Qwen3.7 Max as its new flagship agent model with 1M context, stronger coding and reasoning scores, and cross-harness benchmarks. OpenRouter, Together, AI Gateway, and Kilo support it on day one, making it ready for immediate deployment.

releasePRIMARY2026-05-19

OpenRouter adds openrouter:web_search and Parallel results at $0.005 per request

OpenRouter replaced its old web plugin path with agentic web search and fetch tools that use a common schema across models. Migrate to the new tools if you need multi-search turns, domain filtering, or Parallel/exa-native routing.

releasePRIMARY2026-05-15

OpenRouter adds multi-key BYOK routing with fallback tiers

OpenRouter updated BYOK workspaces so teams can attach multiple provider keys, scope them to specific models or users, and choose prioritized versus fallback use. It changes how rate-limit isolation, dev and prod separation, and failover routing are handled inside one workspace.

releaseSECONDARY2026-05-12

Perceptron releases Mk1 with 2 FPS video reasoning, 32K context, and $0.15 per 1M input

Perceptron launched Mk1, a multimodal model for video and embodied reasoning with native 2 FPS video, 32K context, and structured spatial outputs. OpenRouter access and the low input price make it usable for deployment, not just demos.

newsSECONDARY2026-05-12

Claude Opus 4.7 opens fast mode with ~2.5x speed as Cursor, v0, Droid, and OpenRouter add support

Anthropic rolled fast mode for Opus 4.7 into Claude Code and tools including Cursor, v0, Droid, Conductor, and OpenRouter. Use it where latency matters, but watch pricing: Cursor disclosed a 6x multiplier and others treat it as premium.

releaseSECONDARY2026-05-10

Hermes Agent adds LINE gateway with `hermes update` support

Hermes Agent added an official LINE gateway and OpenRouter published Pareto Code setup docs while users shared Discord and mobile SSH/TUI workflows. The change matters because Hermes is moving from ranking chatter into more concrete distribution channels and repeatable operator setups.

releasePRIMARY2026-05-09

OpenRouter launches Pareto Code with min_coding_score tiers and Nitro routing

OpenRouter released Pareto Code, which routes requests to the cheapest coding model above a chosen score threshold and can re-rank for speed with Nitro. Use the API to trade cost against latency with benchmark-based routing controls.

newsSECONDARY2026-05-09

Hermes Agent reports No. 1 OpenRouter rank after v0.13.0

Nous said Hermes Agent hit No. 1 among AI apps on OpenRouter after v0.13.0 shipped and added credential pools for rotating provider keys. Independent posts also tracked migrations from OpenClaw and early routing support in the same stack.

releaseSECONDARY2026-05-07

Google releases Gemini 3.1 Flash Lite GA with 1M context and $0.25 input pricing

Google moved Gemini 3.1 Flash Lite from preview to GA, and OpenRouter added the model with 1 million context and low-cost multimodal pricing. The preview endpoint now has a shutdown schedule, and users should verify whether the GA model differs from the March preview.

releasePRIMARY2026-05-02

OpenRouter launches Response Caching with X-OpenRouter-Cache and 80-300 ms hits

OpenRouter added response caching across chat, responses, messages, and embeddings with per-key isolation, TTL controls, and cached stream replay. The beta matters because identical retries and test runs can return in milliseconds without provider charges or rate-limit hits.

releaseSECONDARY2026-04-30

Grok 4.3 drops to $1.25/$2.50 with 1M context

Provider and benchmark trackers listed Grok 4.3 with 1M context and lower token pricing, and OpenRouter and Venice exposed it through their APIs. The model undercuts Opus 4.7 and GPT-5.5 on price while independent evaluations show stronger legal and finance performance than general coding.

newsSECONDARY2026-04-29

Stripe Projects adds OpenRouter, Daytona, Vercel, and Render provisioning commands

Stripe Projects added agent-friendly provisioning commands for OpenRouter, Daytona, Vercel, Render, and related tools. That lets agents buy model access, sandboxes, and hosting from the terminal instead of dashboard-driven setup.

releaseSECONDARY2026-04-28

Poolside releases Laguna M.1 and XS.2 coding models with 225B/23B and 33B/3B MoEs

Poolside opened Laguna M.1 and Laguna XS.2 as its first public coding models, with Apache 2.0 weights and same-day provider support. That gives teams open coding models that can run locally or through standard serving stacks.

releaseSECONDARY2026-04-28

Nemotron 3 Nano Omni launches 30B-A3B multimodal model with 256K context

NVIDIA opened Nemotron 3 Nano Omni, a 30B-A3B model for text, image, audio, and video, with day-one serving support. That lets teams run one open model for perception-heavy agents instead of stitching separate components.

workflowPRIMARY2026-04-26

OpenRouter launches `create-headless-agent` for Bun-based multi-model CLIs

OpenRouter released a new skill and guide that scaffold a headless agent CLI on top of its Agent SDK. The template packages multi-model inference, tool calling, and Bun-based CLI setup into a reusable starting point.

releaseSECONDARY2026-04-26

Hermes Agent updates model lists via hosted JSON for Nous Portal and OpenRouter

Hermes now pulls provider model lists from hosted JSON so new releases appear without client updates. The same update batch also auto-switches to a local browser when an agent needs localhost access.

releaseSECONDARY2026-04-24

OpenAI opens GPT-5.5 API with 1M context and Responses support

OpenAI added GPT-5.5 and GPT-5.5 Pro to the API and Playground with 1M context and Responses support. Partners including OpenRouter, Perplexity, GitHub Copilot, Vercel, Warp, and Devin rolled it out the same day, widening access beyond Codex.

releaseSECONDARY2026-04-23

Tencent launches Hy3 preview with 295B/21B, 256K context, and day-one OpenRouter, vLLM, and SGLang support

Tencent open-sourced Hy3 preview, a 295B MoE with 21B active parameters and 256K context, then pushed it into OpenRouter, OpenCode, OpenClaw, vLLM, and SGLang immediately. That matters because engineers can test and deploy a new reasoning-agent model on day one instead of waiting for the runtime ecosystem to catch up.

releasePRIMARY2026-04-23

OpenRouter launches Workspaces with BYOK and per-project routing controls

OpenRouter introduced Workspaces to separate API keys, BYOK, routing, plugins, and observability by environment or team. Billing stays unified at the account level while staging and production settings split cleanly.

releaseSECONDARY2026-04-23

DeepSeek releases V4-Pro and V4-Flash with 1M context and $0.14/M input

DeepSeek open-sourced V4-Pro and V4-Flash under MIT, with 1M context and aggressive Flash pricing. Day-one support in SGLang, vLLM, and OpenRouter pushes open-weight agentic coding closer to closed frontier models.

releaseSECONDARY2026-04-22

Xiaomi MiMo-V2.5-Pro releases with 57.2 SWE-Bench Pro, 1M context, and OpenRouter access

Xiaomi’s MiMo-V2.5-Pro and MiMo-V2.5 arrived with million-token context windows, stronger coding and agentic claims, and immediate access through OpenRouter plus agent harnesses. The rollout adds another low-cost Chinese frontier model that engineers can route into coding workflows without waiting for a proprietary IDE deal.

newsSECONDARY2026-04-22

GitHub Copilot adds bring-your-own keys across Free, Pro, Business, and Enterprise

GitHub added bring-your-own-model keys to Copilot in VS Code, letting users connect local or cloud providers instead of only bundled models. Teams can keep the Copilot harness while routing prompts through approved backends such as LM Studio or OpenRouter.

newsPRIMARY2026-04-21

OpenRouter adds Firecrawl web search with full-page markdown grounding

OpenRouter added Firecrawl as a search provider, letting models ground responses in scraped full web pages instead of snippet-only search. The launch folds crawling into the existing plugin settings flow and includes a capped free plan on the Firecrawl side.

newsSECONDARY2026-04-20

Kimi K2.6 adds day-one support across vLLM, SGLang, Ollama, and OpenRouter

Kimi K2.6 shipped across vLLM, SGLang, OpenRouter, Baseten, Ollama, OpenCode, Hermes Agent, and Droid within hours of launch. That cuts the usual lag between model release and production trials, so mixed-provider agent stacks can test it sooner.

newsSECONDARY2026-04-11

Hermes Agent ranks #1 on OpenRouter for coding apps

Nous said Hermes became the top coding app on OpenRouter while shipping an OpenClaw migration patch, Telegram agent-to-agent messaging, and new memory controls. If you run long-lived agents, watch the migration path and memory settings before moving chats or skills hubs.

releaseSECONDARY2026-04-07

Z.ai releases GLM-5.1, a 744B open model with 58.4 SWE-Bench Pro and 8-hour agent runs

Z.ai released GLM-5.1, a 744B open model built for long-horizon agentic coding and ranked first among open systems on SWE-Bench Pro. Day-0 support in OpenRouter, Ollama, SGLang, vLLM, OpenCode, and local quantization paths makes it ready to test in existing stacks.

newsPRIMARY2026-04-03

OpenRouter says Qwen3.6-Plus hits 1.4T tokens in a day

OpenRouter said Qwen3.6-Plus became its first model to exceed about 1.4 trillion tokens in a day, and Qwen said the model also moved to No. 1 on the service. The milestone adds a concrete deployment signal beyond benchmark scores and preview availability, so track usage data alongside evals.

releaseSECONDARY2026-03-15

Z.ai releases GLM-5-Turbo with 202K context for OpenClaw-style agent workflows

Z.ai released GLM-5-Turbo as a faster GLM-5 variant for OpenClaw-style tool use, with 202K context, OpenRouter access, and higher off-peak limits. Try it as a cheaper speed tier for agent workflows, but benchmark completion quality on your own tasks before wider use.

releaseSECONDARY2026-03-12

NVIDIA releases Nemotron 3 Super on OpenRouter with 1M context and free access

NVIDIA released Nemotron 3 Super, a 120B open model with 12B active parameters and a 1M-token window, on OpenRouter with free access. Evaluate it for low-cost agent backends, especially if you need local or self-hosted deployment options.