Breaking | April 20, 2026

Kimi K2.6 adds day-one support across vLLM, SGLang, Ollama, and OpenRouter

Kimi K2.6 shipped across vLLM, SGLang, OpenRouter, Baseten, Ollama, OpenCode, Hermes Agent, and Droid within hours of launch. That cuts the usual lag between model release and production trials, so mixed-provider agent stacks can test it sooner.

TL;DR

Kimi_Moonshot's launch thread positioned Kimi K2.6 as an open-weight coding and agent model with 4,000-plus tool calls over 12-plus hours, 300 parallel sub-agents, and multimodal benchmarks spanning coding, browsing, and visual tasks.
The deployment lag was basically zero: vllm_project's day-0 post announced vLLM 0.19.1 support, while lmsysorg's day-0 post put the model in SGLang with a live SGLang cookbook.
Hosted access showed up just as fast, with OpenRouter's listing post and baseten's launch post both going live on April 20, and bridgemindai's pricing screenshot showing OpenRouter at $0.95 per million input tokens and $4 per million output tokens for a 262K context model.
Agent shells piled on within hours: ollama's integration thread wired K2.6 into OpenClaw, Hermes, and Claude Code on Ollama Cloud, while NousResearch's Hermes post and opencode's OpenCode post added more ways to drop it into existing coding harnesses.

You can already hit the OpenRouter model page, browse the Ollama model page, and inspect the Ollama integrations docs. The weirder part is how many agent-specific wrappers showed up on day zero, from OpenClaw and Hermes in ollama's thread to Droid in FactoryAI's launch post.

Runtime support

The fastest signal here was not the benchmark chart. It was infra vendors shipping parser-level support immediately.

According to vllm_project's day-0 post, vLLM support landed in 0.19.1 with explicit --tool-call-parser kimi_k2, --enable-auto-tool-choice, and --reasoning-parser kimi_k2 flags. That is a stronger claim than generic compatibility, because it points to custom handling for K2.6's tool and reasoning formats rather than raw text completion only.
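As a rough illustration of how those flags fit together, the sketch below assembles a vllm serve command line in Python. The three parser flags are the ones quoted from the day-0 post; the model identifier is a hypothetical placeholder, since the exact repository name is not given here, and the helper function is made up for this example.

```python
import shlex

# Hypothetical model identifier: the exact repository name is an assumption.
MODEL_ID = "moonshotai/Kimi-K2.6"


def build_vllm_serve_command(model_id: str) -> list[str]:
    """Return an argv list for `vllm serve` using the flags quoted in the day-0 post."""
    return [
        "vllm", "serve", model_id,
        # Parse K2.6 tool calls into structured tool_calls instead of raw text.
        "--tool-call-parser", "kimi_k2",
        # Let the model decide when to call a tool.
        "--enable-auto-tool-choice",
        # Separate reasoning output from the final answer.
        "--reasoning-parser", "kimi_k2",
    ]


if __name__ == "__main__":
    # Print a shell-safe command line; nothing is launched here.
    print(shlex.join(build_vllm_serve_command(MODEL_ID)))
```

The script only prints the command, so it can be checked or pasted into a deployment script without a GPU or a vLLM install on hand.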

The same story played out on the SGLang side. lmsysorg's day-0 post repeated Moonshot's headline numbers, but the useful bit is simpler: it linked a runnable SGLang cookbook, so there was no wait-for-support period.

Hosted endpoints

The second wave was immediate hosted access, which matters because most teams will test a new open model through a provider long before they self-host it.

OpenRouter's model page exposed the model with a 262,144 token context window, and bridgemindai's pricing screenshot captured day-one pricing at $0.95 per million input tokens and $4 per million output tokens. OpenRouter's listing post framed K2.6 as a long-horizon coding model tuned for sustained agentic work.
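To make those numbers concrete, here is a minimal cost sketch using the day-one prices and context window quoted above. It assumes the 262,144 token window covers prompt and completion together, and it ignores any caching discounts or provider-specific fees; the function names are illustrative only.

```python
from decimal import Decimal

# Day-one figures quoted from the pricing screenshot and the model page.
INPUT_USD_PER_MILLION = Decimal("0.95")
OUTPUT_USD_PER_MILLION = Decimal("4")
CONTEXT_WINDOW_TOKENS = 262_144


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in USD at the quoted per-million rates."""
    million = Decimal(1_000_000)
    return (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    ) / million


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: prompt and completion tokens share the same context window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Example: a long agent turn with a 200K-token prompt and an 8K-token reply.
    print(estimate_cost_usd(200_000, 8_000), fits_context(200_000, 8_000))
```

Decimal is used instead of float so that per-request estimates add up cleanly across thousands of tool calls.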

Baseten used its launch post to surface stack details that normally stay buried in provider marketing. The post said its K2.6 deployment uses KV-aware routing, NVFP4 weights on Blackwell, multimodal hierarchical caching, and prefill-decode disaggregation. OpenRouter's Cloudflare repost added one more hosted surface, showing K2.6 running on Cloudflare Workers AI through OpenRouter.

Agent shells

This is where the rollout starts to look like an ecosystem event rather than a single model launch.

Ollama's integration thread did not just add a model slug. It showed ollama launch targets for OpenClaw, Hermes, and Claude Code, which turns K2.6 into a drop-in backend for several existing agent workflows.

The rest of the shell layer filled in fast:

NousResearch's Hermes post added K2.6 to Hermes Agent via hermes update and provider selection.
opencode's OpenCode post put K2.6 in OpenCode.
FactoryAI's launch post brought it to Droid with Fireworks as the hosting partner.
opencode's limits poll hinted that demand spiked fast enough for OpenCode to ask publicly about tripling K2.6 limits.

Kimi's own surfaces

Moonshot's own UI shipped more than one model toggle.

Kimi_Moonshot's launch thread bundled four product modes into one release:

K2.6 Instant
K2.6 Thinking
K2.6 Agent
K2.6 Agent Swarm

testingcatalog's Kimi Chat screenshot shows those modes live in Kimi Chat, with Agent Swarm labeled beta and ordinary Agent mode aimed at research, slides, websites, docs, and sheets. That makes the public product framing unusually explicit: Moonshot is not selling one model personality; it is selling a stack of runtimes around the same base release.

Quiet rollout

K2.6 appears to have leaked into Moonshot's own site before the official thread went live.

On April 18, both AiBattle_'s early sighting and koltregaskes' early sighting included chat screenshots in which the assistant identified itself as Kimi K2.6, two days before Kimi_Moonshot's launch thread formalized the release. That early exposure may help explain why the support wave felt so compressed: some of the ecosystem was already watching the model surface before Moonshot finished the announcement cycle.