AI Primer
TOPIC (48 stories)

Model Routing

Choosing, composing, or switching models inside applications.

NEWS 27th June
OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic

OpenRouter said four open-weight models now handle real agentic workloads, and a JPMorgan report put Chinese models at about 45% of platform traffic. The shift matters because teams are optimizing for price, hosting, and task fit instead of defaulting to frontier APIs.

RELEASE 27th June
Sakana Fugu Ultra opens on Vercel AI Gateway

Sakana made Fugu Ultra available through Vercel AI Gateway, while new technical writeups described the trained routing head and multi-step orchestration behind it. The integration matters because teams can invoke Fugu’s model-selection workflow through existing gateway plumbing instead of standing up custom routing.

RELEASE 25th June
OpenRouter launches MCP server with live pricing, benchmarks, and test inference

OpenRouter released an MCP server that lets agents query live model pricing, benchmark scores, provider data, docs, and run test inference from the CLI. That replaces stale model knowledge with current routing data inside long-running agent workflows.

RELEASE 24th June
OpenRouter launches Image API with typed capabilities and exact USD cost

OpenRouter released a dedicated Image API that normalizes request shapes across 30-plus models from eight providers. Agents can inspect limits, passthrough options, streaming, and exact per-call cost without hardcoding vendor quirks.

RELEASE 23rd June
Kilo Code launches Auto Efficient routing with KiloBench model selection

Kilo Code added an Auto Efficient mode that routes each request to the cheapest model that clears its benchmark bar using public KiloBench results. The router stays session-aware and falls back to stronger paid models when confidence is low.
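
The selection rule described above can be sketched in a few lines. This is a minimal illustration, not Kilo Code's implementation: the `Candidate` fields, the model names, the prices, the scores, and the `min_confidence` cutoff are all hypothetical, and the session-aware behavior is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_mtok: float  # hypothetical blended USD price per million tokens
    bench_score: float    # hypothetical benchmark score on a 0-100 scale


def pick_model(candidates: List[Candidate], min_score: float,
               confidence: float, fallback: Candidate,
               min_confidence: float = 0.6) -> Candidate:
    """Return the cheapest candidate that clears min_score, or the fallback."""
    # Low router confidence: skip the cheap tier and use the stronger model.
    if confidence < min_confidence:
        return fallback
    eligible = [c for c in candidates if c.bench_score >= min_score]
    # Nothing clears the bar: also fall back to the stronger model.
    if not eligible:
        return fallback
    return min(eligible, key=lambda c: c.cost_per_mtok)


# Illustrative catalog; none of these entries are real models or prices.
MODELS = [
    Candidate("small-open", 0.20, 58.0),
    Candidate("mid-open", 0.60, 71.0),
    Candidate("large-open", 1.50, 78.0),
]
STRONG = Candidate("strong-paid", 9.00, 90.0)
```

With a bar of 70 and a confident router, `pick_model(MODELS, 70.0, 0.9, STRONG)` returns the mid-priced entry; dropping confidence below the cutoff returns `STRONG` instead.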

NEWS 22nd June
Fugu Ultra testers report 30-minute runs and 17x GLM-5.2 cost after launch

Sakana launched Fugu Ultra on Vercel AI Gateway and published a technical report, with early testers sharing mixed results. Reports mention polished outputs on some tasks, but also 30-minute runs, uneven coding quality, and much higher cost than GLM-5.2.

RELEASE 21st June
Morph supports Qwen, GLM-5.2, MiniMax M3, DeepSeek v4 with 20-35% higher code acceptance

Morph said its code-serving stack now exposes Qwen, GLM-5.2, MiniMax M3, and DeepSeek v4 with code-tuned speculative decoding. It claims 20-35% higher acceptance than Eagle 3.1 or DFlash, plus kernels for cheaper hardware.

WORKFLOW 1w ago
Codex supports open-weight models via Ollama, vLLM, and Responses-compatible endpoints

Codex workflows can now run against open-weight models served through compatible Responses API endpoints, with Ollama and vLLM publishing direct paths for GLM-5.2 and Kimi K2.7 Code. That matters because teams can keep the Codex interface while swapping to self-hosted or lower-cost inference backends.

RELEASE 2w ago
OpenRouter launches Fusion API with model panels and judge routing

OpenRouter launched Fusion, a server-side panel API that sends each prompt to multiple models and combines their outputs into one answer. Early logs also showed a web-path issue where Fusion still invoked Claude Opus 4.8 as judge and billed for it until API-side control was clarified.

RELEASE 2w ago
OpenRouter launches Fusion API with DRACO panel tests within 1% of Fable

OpenRouter launched Fusion, a server-side panel API that fans prompts to multiple models, judges the outputs, and returns one synthesized answer. The company said DRACO landed within 1% of Fable at roughly half the price, but the published evals do not cover long-horizon tasks.
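
The fan-out, judge, and return flow described above can be sketched as follows. This is a toy illustration under stated assumptions, not the Fusion API: panel members are plain Python callables, calls run sequentially rather than concurrently, and the hypothetical `majority_judge` simply picks the most common answer instead of synthesizing one with a judge model.

```python
from collections import Counter
from typing import Callable, Dict

# A "model" here is any callable mapping a prompt to a text answer (hypothetical stand-in).
Model = Callable[[str], str]
Judge = Callable[[str, Dict[str, str]], str]


def majority_judge(prompt: str, drafts: Dict[str, str]) -> str:
    """Toy judge: return the answer that the most panel members agree on."""
    counts = Counter(drafts.values())
    answer, _votes = counts.most_common(1)[0]
    return answer


def run_panel(prompt: str, panel: Dict[str, Model], judge: Judge) -> str:
    """Send the prompt to every panel member, then let the judge return one answer."""
    drafts = {name: model(prompt) for name, model in panel.items()}
    return judge(prompt, drafts)


# Illustrative panel of stub models; the names are made up.
PANEL: Dict[str, Model] = {
    "model-a": lambda prompt: "4",
    "model-b": lambda prompt: "4",
    "model-c": lambda prompt: "5",
}
```

Because the judge is just another callable, a cost-conscious caller could swap in a cheaper judge or skip judging for low-stakes prompts, which is the kind of control the billing note in the previous story is about.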

NEWS 2w ago
Fable 5 users report 90-minute Max caps and June 23 plan cutoff

One day after Fable 5 launched, users reported burning through Max quotas in about 90 minutes while Anthropic told subscribers the model will leave Claude plans on June 23 until capacity improves. If you depend on Fable, plan for quota pressure and route critical jobs elsewhere.

NEWS 2w ago
OpenRouter, OpenCode, and 5 others add Claude Fable 5 on launch day

OpenRouter, OpenCode, Lovable, Cline, Browser Use Terminal, Nous Portal, and Venice all added Fable 5 within hours of launch. The rollouts put the model into gateways, coding agents, browser agents, and chat clients on day one.

NEWS 3w ago
OpenRouter adds cache-hit pricing telemetry as Devin exposes adaptive routing

Vendors pushed routing and spend controls closer to the default app layer, including OpenRouter's cache-hit pricing telemetry and Devin's adaptive routing. The discussion frames model choice more as a budget-control problem than a pure quality setting.

RELEASE 3w ago
OpenRouter launches Pareto Code with min_coding_score and 1B routed tokens per day

OpenRouter launched Pareto Code, a free experimental coding router that filters by min_coding_score and says it is already handling about 1 billion tokens a day. The release adds a tunable routing path for coding workloads where cost and model quality need to be balanced.

RELEASE 3w ago
Perplexity Computer adds hybrid agentic inference with local-cloud model splits

Perplexity said Computer will split tasks between on-device models and frontier cloud models, keeping some data on the local machine while escalating harder work remotely. That matters for privacy-sensitive workflows and for reducing token-heavy cloud usage on laptop-class hardware.

RELEASE 3w ago
Factory introduces Router with 20-25% lower AI spend and 99% of Opus 4.7 on Terminal-Bench 2

Factory put Router into private preview in its CLI and desktop app to route coding tasks across models, claiming 20-25% lower spend. The launch targets rising agent costs, though session continuity and routing behavior remain active points of debate.

NEWS 4w ago
Coding-agent builders add shared memory, provider routing, and app launchers

Independent developers shipped sidecars that let Claude Code, Cursor, and Codex share memory, hot-swap model providers, package local projects as apps, and automate browser QA. Try these reusable tools if you want memory, routing, QA automation, and app packaging outside editor-specific features.

RELEASE 4w ago
OpenRouter launches Guardrails with budget caps, ZDR, and prompt-injection filters

OpenRouter released Guardrails to apply budget limits, provider restrictions, zero-data-retention rules, prompt-injection defense, and DLP checks across routed traffic. Google Model Armor and Lakera Guard connectors are in beta, so plan around limited availability.

NEWS 4w ago
Agent tools add Claude Opus 4.8 to Cursor, Warp, OpenRouter, and Perplexity on day one

Independent IDEs, gateways, and agent runtimes rolled out Claude Opus 4.8 within hours of launch, including Cursor, Warp, OpenRouter, and Perplexity. That matters because teams can benchmark or swap the model into existing workflows without waiting for connector lag.

NEWS 4w ago
Hermes Agent integrates MCP Catalog, Qwen3.7 Max, Venice, and Krea 2 in one window

Hermes Agent added a built-in MCP Catalog while separate builders shipped Qwen3.7 Max support, Venice private-model workflows, and Krea 2 image generation. The cluster shows Hermes moving beyond a single-model assistant toward a broader agent shell with tool, model, and media providers.

NEWS 4w ago
Ramp reports business AI token spend at 13x January 2025 levels

Ramp data and operator reports said enterprise AI token spending is rising far faster than budget controls and procurement cycles. Teams should plan for routing, cheaper defaults, and spend caps to become core engineering infrastructure.

NEWS 4w ago
OpenRouter raises $113M Series B as weekly volume hits 25T tokens

OpenRouter announced a $113M Series B led by CapitalG and said weekly routed volume grew from 5T to 25T tokens in six months. The funding matters because the company is pitching itself as production infrastructure for multi-model deployments, not just an API convenience layer.

RELEASE 4w ago
Warp Agent adds OpenRouter URLs and /model aliases for custom endpoints

Warp now lets agents connect directly to an OpenRouter endpoint and switch providers through remembered model aliases. The change reduces endpoint setup friction for teams routing across hosted models inside Warp Agent.

RELEASE 1mo ago
Warp adds BYOK to Warp Agent with OpenAI-compatible endpoints

Warp Agent now accepts user-supplied OpenAI, Anthropic, and Gemini keys plus OpenAI-compatible endpoints such as OpenRouter and DeepSeek. The change removes the paid-plan requirement for inference access and gives terminal users more routing options.

NEWS 1mo ago
OpenCode, Kilo, Replicate, and Mastra support Gemini 3.5 Flash on day one

OpenCode, Kilo, Replicate, and Mastra exposed Gemini 3.5 Flash on launch day across coding agents, routers, and hosted APIs. The fast uptake gives engineers multiple harnesses to test Google's 1M-context model despite mixed first-party app reports.

RELEASE 1mo ago
OpenRouter adds openrouter:web_search and Parallel results at $0.005 per request

OpenRouter replaced its old web plugin path with agentic web search and fetch tools that use a common schema across models. Migrate to the new tools if you need multi-search turns, domain filtering, or Parallel- or Exa-native routing.

NEWS 1mo ago
Hermes Agent adds SuperGrok subscription support for xAI workflows

Nous Research added SuperGrok support to Hermes Agent, letting users plug a Grok subscription directly into the framework. It broadens Hermes beyond OpenAI runtimes and local setups into another mainstream agent model path.

RELEASE 1mo ago
OpenRouter adds multi-key BYOK routing with fallback tiers

OpenRouter updated BYOK workspaces so teams can attach multiple provider keys, scope them to specific models or users, and choose prioritized versus fallback use. It changes how rate-limit isolation, dev and prod separation, and failover routing are handled inside one workspace.
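
One way to picture prioritized versus fallback keys is a small selector like the one below. It is a sketch under assumptions, not OpenRouter's logic: the `ProviderKey` fields, the tier names, the `rate_limited` flag, and the provider and model names are hypothetical, and per-user scoping is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ProviderKey:
    label: str
    provider: str
    tier: str                          # "prioritized" or "fallback" (hypothetical names)
    models: Optional[Set[str]] = None  # None means the key may serve any model
    rate_limited: bool = False         # set by the caller after a rate-limit error


def select_key(keys: List[ProviderKey], provider: str, model: str) -> Optional[ProviderKey]:
    """Pick a usable key for the request, trying prioritized keys before fallback ones."""
    usable = [
        k for k in keys
        if k.provider == provider
        and not k.rate_limited
        and (k.models is None or model in k.models)
    ]
    # sorted() is stable, so keys within a tier keep their declared order.
    ranked = sorted(usable, key=lambda k: 0 if k.tier == "prioritized" else 1)
    return ranked[0] if ranked else None


# Illustrative workspace: a model-scoped production key plus a shared fallback key.
KEYS = [
    ProviderKey("shared", "provider-x", "fallback"),
    ProviderKey("prod", "provider-x", "prioritized", models={"model-a"}),
]
```

Marking the `prod` key as rate limited makes the same request fall through to `shared`, which is the failover behavior the update describes; requests for models outside the scoped set never touch `prod` at all.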

RELEASE 1mo ago
Nous Research integrates Codex app-server into Hermes Agent for OpenAI tool runs

Hermes Agent can now route core tool calls through the Codex app-server when it is using OpenAI models. The integration gives Hermes users access to Codex runtime behavior by running `hermes update`, without changing the rest of their agent stack.

NEWS 1mo ago
Pi community ships pi-treebase, Miko voice mode, and OpenCode Go guides

Builders shipped pi-treebase, a Miko voice mode for pi-listens, devrage support, and a Japanese OpenCode Go guide after the first Pi extension burst. The releases arrive as Pi’s provider abstraction gets stress-tested by OpenClaw-scale multi-provider use.

NEWS 1mo ago
OpenCode adds Ring 2.6 1T with 256K context and free limited-time access

OpenCode made Ring 2.6 1T available in the editor with reasoning enabled and free access for a limited period. Follow-on posts from Kilo and others claim frontier-level results on AIME 26, ClawEval, Gaia2-search, and Tau2-Bench Telecom.

NEWS 1mo ago
Hermes Agent reports No. 1 OpenRouter rank after v0.13.0

Nous said Hermes Agent hit No. 1 among AI apps on OpenRouter after v0.13.0 shipped and added credential pools for rotating provider keys. Independent posts also tracked migrations from OpenClaw and early routing support in the same stack.

RELEASE 1mo ago
OpenRouter launches Pareto Code with min_coding_score tiers and Nitro routing

OpenRouter released Pareto Code, which routes requests to the cheapest coding model above a chosen score threshold and can re-rank for speed with Nitro. Use the API to trade cost against latency with benchmark-based routing controls.

RELEASE 1mo ago
OpenRouter launches Response Caching with X-OpenRouter-Cache and 80-300 ms hits

OpenRouter added response caching across chat, responses, messages, and embeddings with per-key isolation, TTL controls, and cached stream replay. The beta matters because identical retries and test runs can return in milliseconds without provider charges or rate-limit hits.
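
The per-key isolation and TTL behavior can be illustrated with a small in-memory cache. This is a sketch of the general technique only: the `ResponseCache` class, its method names, and the fingerprinting scheme are assumptions made for illustration, and nothing here reflects how OpenRouter implements caching or the `X-OpenRouter-Cache` header.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class ResponseCache:
    """In-memory cache keyed by (API key, request body) with a fixed TTL."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def _fingerprint(api_key: str, request: dict) -> str:
        # Including the API key in the digest gives per-key isolation:
        # identical requests made with different keys never share an entry.
        payload = json.dumps(request, sort_keys=True)
        return hashlib.sha256(f"{api_key}\n{payload}".encode("utf-8")).hexdigest()

    def get_or_call(self, api_key: str, request: dict,
                    call: Callable[[dict], Any]) -> Tuple[Any, bool]:
        """Return (response, was_cache_hit); invoke `call` only on a miss or expiry."""
        key = self._fingerprint(api_key, request)
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], True
        response = call(request)
        self._store[key] = (now, response)
        return response, False
```

An identical retry within the TTL returns the stored response without invoking the provider callable, which is why repeated test runs avoid both provider charges and rate-limit pressure.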

RELEASE 1mo ago
OpenClaw 2026.4.29 adds agent-native group chats, follow-up commitments, and NVIDIA model catalogs

OpenClaw 2026.4.29 shipped a new group-chat flow, opt-in follow-up commitments, tighter exec controls, and first-class NVIDIA provider catalogs. The release matters because it pushes OpenClaw toward safer multi-user agent workflows instead of single-session chat hacks.

RELEASE 1mo ago
Grok 4.3 drops to $1.25/$2.50 with 1M context

Provider and benchmark trackers listed Grok 4.3 with 1M context and lower token pricing, and OpenRouter and Venice exposed it through their APIs. The model undercuts Opus 4.7 and GPT-5.5 on price while independent evaluations show stronger legal and finance performance than general coding.

RELEASE 2mo ago
OpenClaw 2026.4.27 adds DeepInfra support and forward-proxy routing

OpenClaw 2026.4.27 bundles DeepInfra support, better non-image attachments, explicit forward-proxy routing, and stricter model selection. The update broadens provider access while hardening operator-run deployments against routing and session failures.

WORKFLOW 2mo ago
DeepSeek V4 supports Anthropic-compatible routing into Claude Code and Cowork for ~90% lower cost

Independent guides showed DeepSeek V4 running inside Claude Cowork and Claude Code via Anthropic-compatible endpoints, and Ollama added launch commands for Claude-style wrappers. The workflow matters because teams can keep Claude-centered agent UX while sharply lowering model spend, with provider compatibility and setup still the main caveats.

RELEASE 2mo ago
Hermes Agent updates model lists via hosted JSON for Nous Portal and OpenRouter

Hermes now pulls provider model lists from hosted JSON so new releases appear without client updates. The same update batch also auto-switches to a local browser when an agent needs localhost access.

NEWS 2mo ago
DeepSeek V4 adds day-1 support from vLLM, SGLang, Ollama, OpenCode, Venice, and Together

Within a day of launch, vLLM, SGLang, Ollama cloud, OpenCode, Venice, Together, and Baseten added support or hosted access for DeepSeek V4. That makes Flash and Pro easier to test across local, routed, and managed agent stacks.

RELEASE 2mo ago
OpenRouter launches Workspaces with BYOK and per-project routing controls

OpenRouter introduced Workspaces to separate API keys, BYOK, routing, plugins, and observability by environment or team. Billing stays unified at the account level while staging and production settings split cleanly.

NEWS 2mo ago
Kimi K2.6 adds free Hermes and Cline access plus Replicate, Perplexity, and Together support

A day after Kimi K2.6’s launch, providers and tools opened new access paths including temporary free use in Hermes and Cline plus availability on Replicate, Together, Perplexity, and Tinker. Engineers can test the open model across agent harnesses and hosted runtimes without standing up their own stack first.

NEWS 2mo ago
GitHub Copilot adds bring-your-own keys across Free, Pro, Business, and Enterprise

GitHub added bring-your-own-model keys to Copilot in VS Code, letting users connect local or cloud providers instead of only bundled models. Teams can keep the Copilot harness while routing prompts through approved backends such as LM Studio or OpenRouter.

NEWS 2mo ago
OpenRouter adds Firecrawl web search with full-page markdown grounding

OpenRouter added Firecrawl as a search provider, letting models ground responses in scraped full web pages instead of snippet-only search. The launch folds crawling into the existing plugin settings flow and includes a capped free plan on the Firecrawl side.

NEWS 2mo ago
Kimi K2.6 adds day-one support across vLLM, SGLang, Ollama, and OpenRouter

Kimi K2.6 shipped across vLLM, SGLang, OpenRouter, Baseten, Ollama, OpenCode, Hermes Agent, and Droid within hours of launch. That cuts the usual lag between model release and production trials, so mixed-provider agent stacks can test it sooner.

RELEASE 2mo ago
Hermes Agent launches Tool Gateway with 300+ models and bundled tools

Hermes Agent added Tool Gateway, bundling 300+ models with web, browser, image, terminal, and TTS tools behind one subscription. Firecrawl, Browser Use, Fal image models, and Gemini Voice shipped at launch.

RELEASE 2mo ago
Anthropic adds beta advisor tool to Messages API for Opus calls

Anthropic added a beta advisor tool to the Messages API so Sonnet or Haiku can call Opus mid-run inside one request. Anthropic says Sonnet plus Opus scored 2.7 points higher on SWE-bench Multilingual while cutting per-task cost by 11.9%.

RELEASE 3mo ago
Hermes Agent adds Hugging Face provider with 28 curated models

Hermes Agent now treats Hugging Face as a first-class inference provider and surfaces 28 curated models in its picker, plus a custom path to the broader catalog. That broadens model choice for a persistent local agent workflow without requiring users to wire a provider manually.

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.