Model Routing
Choosing, composing, or switching models inside applications.
Stories
Builders shipped pi-treebase, a Miko voice mode for pi-listens, devrage support, and a Japanese OpenCode Go guide after the first Pi extension burst. The releases arrive as Pi’s provider abstraction gets stress-tested by OpenClaw-scale multi-provider use.
OpenCode made Ring 2.6 1T available in the editor with reasoning enabled and free access for a limited period. Follow-on posts from Kilo and others claim frontier-level results on AIME 26, ClawEval, Gaia2-search, and Tau2-Bench Telecom.
OpenRouter released Pareto Code, which routes requests to the cheapest coding model above a chosen score threshold and can re-rank for speed with Nitro. Use the API to trade cost against latency with benchmark-based routing controls.
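Since OpenRouter's API is OpenAI-compatible, a benchmark-routed request is just an ordinary chat-completions call with a routing slug. A minimal sketch of the request body, assuming a placeholder slug `openrouter/pareto-code` (check OpenRouter's docs for the real name) and OpenRouter's existing `:nitro` suffix for throughput-sorted routing:

```python
# Hedged sketch: builds an OpenRouter chat-completions body that targets a
# Pareto-style routing slug. "openrouter/pareto-code" is a PLACEHOLDER slug,
# not a confirmed model ID; ":nitro" is OpenRouter's existing shortcut for
# re-ranking candidate providers by speed.
import json

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def pareto_request(prompt: str, prefer_speed: bool = False) -> dict:
    """Build a request body that trades cost against latency via routing."""
    model = "openrouter/pareto-code"      # hypothetical routing slug
    if prefer_speed:
        model += ":nitro"                 # re-rank the candidate set by throughput
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = pareto_request("Refactor this function", prefer_speed=True)
print(json.dumps(body, indent=2))
```

POSTing that body to `OPENROUTER_URL` with a bearer key is unchanged from any other OpenRouter call; only the slug moves the routing decision server-side.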
Nous said Hermes Agent hit No. 1 among AI apps on OpenRouter after v0.13.0 shipped and added credential pools for rotating provider keys. Independent posts also tracked migrations from OpenClaw and early routing support in the same stack.
OpenRouter added response caching across chat, responses, messages, and embeddings with per-key isolation, TTL controls, and cached stream replay. The beta matters because identical retries and test runs can return in milliseconds without provider charges or rate-limit hits.
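The behavior described, identical requests hitting a per-key cache until a TTL expires, can be modeled locally. This is a toy illustration of the semantics, not OpenRouter's implementation; the real cache lives server-side and its exact controls may differ:

```python
# Conceptual model of per-key response caching with TTL. Illustrates the
# semantics only -- the real cache is server-side at OpenRouter.
import hashlib
import json
import time

class ResponseCache:
    """Toy cache keyed on (api_key, canonical request hash)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # (api_key, request_hash) -> (expires_at, response)

    def _key(self, api_key: str, request: dict):
        digest = hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()
        return (api_key, digest)  # per-key isolation: entries never cross keys

    def get(self, api_key: str, request: dict):
        entry = self.store.get(self._key(api_key, request))
        if entry and entry[0] > time.monotonic():
            return entry[1]  # hit: no provider call, charge, or rate-limit cost
        return None          # miss, or TTL expired

    def put(self, api_key: str, request: dict, response: dict):
        self.store[self._key(api_key, request)] = (
            time.monotonic() + self.ttl,
            response,
        )

cache = ResponseCache(ttl_seconds=60)
req = {"model": "m", "messages": [{"role": "user", "content": "hi"}]}
cache.put("key-A", req, {"text": "hello"})
print(cache.get("key-A", req))  # identical retry returns the cached response
print(cache.get("key-B", req))  # a different key sees nothing (isolation)
```

The hashing of a canonicalized request is what makes "identical retries" well-defined; any change to the messages or parameters produces a different cache key.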
OpenClaw 2026.4.29 shipped a new group-chat flow, opt-in follow-up commitments, tighter exec controls, and first-class NVIDIA provider catalogs. The release matters because it pushes OpenClaw toward safer multi-user agent workflows instead of single-session chat hacks.
Provider and benchmark trackers listed Grok 4.3 with 1M context and lower token pricing, and OpenRouter and Venice exposed it through their APIs. The model undercuts Opus 4.7 and GPT-5.5 on price, while independent evaluations show stronger performance in legal and finance work than in general coding.
OpenClaw 2026.4.27 bundles DeepInfra support, better non-image attachments, explicit forward-proxy routing, and stricter model selection. The update broadens provider access while hardening operator-run deployments against routing and session failures.
Independent guides showed DeepSeek V4 running inside Claude Cowork and Claude Code via Anthropic-compatible endpoints, and Ollama added launch commands for Claude-style wrappers. The workflow matters because teams can keep Claude-centered agent UX while sharply lowering model spend, with provider compatibility and setup still the main caveats.
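The endpoint swap works because Claude Code reads its backend from `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN`. A minimal sketch, with the DeepSeek URL as a placeholder; use whatever Anthropic-compatible base URL the provider actually documents:

```python
# Sketch of pointing Claude Code (or any Anthropic-SDK client) at an
# Anthropic-compatible third-party endpoint via environment variables.
# The URL is a PLACEHOLDER -- substitute the base URL your provider documents.
import os

os.environ["ANTHROPIC_BASE_URL"] = "https://api.example-provider.com/anthropic"
os.environ["ANTHROPIC_AUTH_TOKEN"] = "sk-..."  # provider key, not an Anthropic key

# Any Claude Code session launched from this environment now sends its
# Messages-API traffic to the compatible endpoint instead of Anthropic.
print(os.environ["ANTHROPIC_BASE_URL"])
```

The main caveats from the guides apply here: the provider must faithfully implement the Messages API surface Claude Code depends on, including tool use and streaming.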
Hermes now pulls provider model lists from hosted JSON so new releases appear without client updates. The same update batch also auto-switches to a local browser when an agent needs localhost access.
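The hosted-JSON pattern is simple: the client fetches a manifest at startup instead of hard-coding a model list, so new releases appear in the picker without a client update. A generic sketch with an illustrative schema, not Hermes's actual manifest format:

```python
# Generic "model list from hosted JSON" pattern. The manifest URL and schema
# here are illustrative; Hermes's real format is not documented in this story.
import json

# In practice this string would come from urllib.request.urlopen(manifest_url).
MANIFEST = '{"models": [{"id": "provider/model-a"}, {"id": "provider/model-b"}]}'

def load_models(manifest_text: str) -> list[str]:
    """Parse a hosted manifest into model-picker entries."""
    return [m["id"] for m in json.loads(manifest_text)["models"]]

print(load_models(MANIFEST))  # ['provider/model-a', 'provider/model-b']
```

Shipping the list as data rather than code is what lets the picker track new releases with zero client churn.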
Within a day of launch, vLLM, SGLang, Ollama cloud, OpenCode, Venice, Together, and Baseten added support or hosted access for DeepSeek V4. That makes Flash and Pro easier to test across local, routed, and managed agent stacks.
OpenRouter introduced Workspaces to separate API keys, BYOK, routing, plugins, and observability by environment or team. Billing stays unified at the account level while staging and production settings split cleanly.
A day after Kimi K2.6’s launch, providers and tools opened new access paths including temporary free use in Hermes and Cline plus availability on Replicate, Together, Perplexity, and Tinker. Engineers can test the open model across agent harnesses and hosted runtimes without standing up their own stack first.
GitHub added bring-your-own-model keys to Copilot in VS Code, letting users connect local or cloud providers instead of only bundled models. Teams can keep the Copilot harness while routing prompts through approved backends such as LM Studio or OpenRouter.
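What makes BYOK routing workable is that the backends speak the same OpenAI-compatible wire format, so one request shape serves a local LM Studio server and OpenRouter alike. A sketch using LM Studio's documented default local address and OpenRouter's public base URL; the model names are placeholders:

```python
# Sketch: one OpenAI-compatible request shape, two interchangeable backends.
# The LM Studio address is its documented local default; model names are
# placeholders for whatever each backend actually serves.
BACKENDS = {
    "lmstudio": {"base_url": "http://localhost:1234/v1", "key": "lm-studio"},
    "openrouter": {"base_url": "https://openrouter.ai/api/v1", "key": "sk-or-..."},
}

def chat_request(backend: str, model: str, prompt: str):
    """Build the URL, headers, and body for an OpenAI-style chat call."""
    cfg = BACKENDS[backend]
    url = cfg["base_url"] + "/chat/completions"
    headers = {"Authorization": f"Bearer {cfg['key']}"}
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return url, headers, body

url, headers, body = chat_request("lmstudio", "local-coder-model", "hello")
print(url)  # http://localhost:1234/v1/chat/completions
```

Swapping the backend is a one-word change, which is exactly the property a BYOK harness like Copilot's relies on to route prompts through approved endpoints.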
OpenRouter added Firecrawl as a search provider, letting models ground responses in scraped full web pages instead of snippet-only search. The launch folds crawling into the existing plugin settings flow and includes a capped free plan on the Firecrawl side.
Kimi K2.6 shipped across vLLM, SGLang, OpenRouter, Baseten, Ollama, OpenCode, Hermes Agent, and Droid within hours of launch. That cuts the usual lag between model release and production trials, so mixed-provider agent stacks can test it sooner.
Hermes Agent added Tool Gateway, bundling 300+ models with web, browser, image, terminal, and TTS tools behind one subscription. Firecrawl, Browser Use, Fal image models, and Gemini Voice shipped at launch.
Anthropic added a beta advisor tool to the Messages API so Sonnet or Haiku can call Opus mid-run inside one request. Anthropic says Sonnet plus Opus scored 2.7 points higher on SWE-bench Multilingual while cutting per-task cost 11.9%.
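Structurally, the advisor is exposed as a tool entry inside a standard Messages API request. The sketch below keeps to the documented request shape (`model`, `max_tokens`, `messages`, `tools`); the advisor tool entry itself is a HYPOTHETICAL placeholder, since the story doesn't give the beta's field names:

```python
# Sketch of a Messages API body where a smaller model can consult a larger
# one mid-run. Request shape is Anthropic's standard Messages API; the tool
# entry below is a HYPOTHETICAL placeholder -- consult Anthropic's beta docs
# for the real type and parameters.
def advisor_request(prompt: str) -> dict:
    """Build a single-request body with a cross-model advisor tool enabled."""
    return {
        "model": "claude-sonnet-latest",  # the smaller model driving the run
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
        # Placeholder tool entry; the beta presumably names the advisor model here.
        "tools": [{"type": "advisor", "name": "consult_opus"}],
    }

req = advisor_request("Plan this migration step by step")
print(req["tools"][0]["name"])
```

The point of the single-request design is that escalation to the stronger model happens inside one API call, which is how the quoted cost and SWE-bench deltas are measured per task rather than per round trip.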
Hermes Agent now treats Hugging Face as a first-class inference provider and surfaces 28 curated models in its picker, plus a custom path to the broader catalog. That broadens model choice for a persistent local agent workflow without requiring users to wire a provider manually.