AI Primer

Kimi K2.6 adds day-one support across vLLM, SGLang, Ollama, and OpenRouter

Kimi K2.6 shipped across vLLM, SGLang, OpenRouter, Baseten, Ollama, OpenCode, Hermes Agent, and Droid within hours of launch. That cuts the usual lag between model release and production trials, so mixed-provider agent stacks can test it sooner.


TL;DR

You can already hit the OpenRouter model page, browse the Ollama model page, and inspect the Ollama integrations docs. The weirder part is how many agent-specific wrappers showed up on day zero, from OpenClaw and Hermes in ollama's thread to Droid in FactoryAI's launch post.

Runtime support

The fastest signal here was not the benchmark chart. It was infra vendors shipping parser-level support immediately.

According to vllm_project's day-0 post, vLLM support landed in 0.19.1 with explicit --tool-call-parser kimi_k2, --enable-auto-tool-choice, and --reasoning-parser kimi_k2 flags. That is a stronger claim than generic compatibility, because it points to custom handling for K2.6's tool and reasoning formats rather than raw text completion only.
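As a rough illustration of how those three flags fit together, the sketch below assembles a `vllm serve` command line in Python. The flag names and `kimi_k2` values are the ones quoted from the day-0 post; the model identifier is a placeholder assumption, since the post excerpt here does not give the exact repository name, and a real deployment would likely need further options (parallelism, context length) that are not shown.

```python
import shlex

# Hypothetical model identifier, used only for illustration; the article
# does not state the exact repository name for K2.6.
MODEL_ID = "moonshotai/Kimi-K2.6"


def build_vllm_serve_command(model_id: str = MODEL_ID) -> list[str]:
    """Return an argv list for `vllm serve` using the flags quoted in the post."""
    return [
        "vllm", "serve", model_id,
        # Parse the model's tool-call output into structured tool calls.
        "--tool-call-parser", "kimi_k2",
        # Let the server decide when a tool call is being made.
        "--enable-auto-tool-choice",
        # Separate reasoning content from the final answer.
        "--reasoning-parser", "kimi_k2",
    ]


if __name__ == "__main__":
    # Print a shell-safe command line that could be pasted into a terminal.
    print(shlex.join(build_vllm_serve_command()))
```

Building the command as a list rather than a single string keeps it safe to hand to `subprocess.run` without shell quoting problems.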

lmsysorg's cookbook link told the same story on the SGLang side. The day-0 post repeated Moonshot's headline numbers, but the useful part is simpler: a runnable cookbook already existed, so there was no wait-for-support period.

Hosted endpoints

The second wave was immediate hosted access, which matters because most teams will test a new open model through a provider long before they self-host it.

OpenRouter's model page listed the model with a 262,144-token context window, and bridgemindai's pricing screenshot captured day-one pricing at $0.95 per million input tokens and $4 per million output tokens. OpenRouter's listing post framed K2.6 as a long-horizon coding model tuned for sustained agentic work.
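To make those numbers concrete, here is a minimal cost sketch in Python using the day-one figures from the screenshot. The helper names are made up for this example, prices may have changed since launch, and the context check assumes that prompt and completion tokens share the same 262,144-token window, which is the usual convention but is not stated in the listing.

```python
from decimal import Decimal

# Figures reported for OpenRouter on launch day (may no longer be current).
CONTEXT_WINDOW_TOKENS = 262_144
INPUT_USD_PER_MILLION = Decimal("0.95")
OUTPUT_USD_PER_MILLION = Decimal("4")


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request in US dollars."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / Decimal(1_000_000)


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Assumption: prompt and completion tokens share one context window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # A long agentic turn: 200,000 prompt tokens and 8,000 completion tokens.
    print(estimate_cost_usd(200_000, 8_000))  # 0.222
    print(fits_context(200_000, 8_000))       # True
```

At these rates a single near-full-context turn costs roughly 22 cents, most of it input, which is why long-horizon agent loops that resend large prompts dominate the bill.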

Baseten used its launch post to surface stack details that normally stay buried in provider marketing: its K2.6 deployment uses KV-aware routing, NVFP4 weights on Blackwell, multimodal hierarchical caching, and prefill-decode disaggregation. OpenRouter's Cloudflare repost added one more hosted surface, showing K2.6 running on Cloudflare Workers AI through OpenRouter.

Agent shells

This is where the rollout starts to look like an ecosystem event rather than a single model launch.

Ollama's integration thread did not just add a model slug. It showed ollama launch targets for OpenClaw, Hermes, and Claude Code, which turns K2.6 into a drop-in backend for several existing agent workflows.

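The rest of the shell layer filled in fast: the launch-day list also included OpenCode, Hermes Agent, and Droid, the last announced in FactoryAI's launch post.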

Kimi's own surfaces

Moonshot's own UI shipped more than one model toggle.

Kimi_Moonshot's launch thread bundled four concrete product claims into one release:

  • K2.6 Instant
  • K2.6 Thinking
  • K2.6 Agent
  • K2.6 Agent Swarm

testingcatalog's Kimi Chat screenshot shows those modes live in Kimi Chat, with Agent Swarm labeled beta and ordinary Agent mode aimed at research, slides, websites, docs, and sheets. That makes the public product framing unusually explicit: Moonshot is not selling one model personality; it is selling a stack of runtimes around the same base release.

Quiet rollout

K2.6 appears to have leaked into Moonshot's own site before the official thread went live.

On April 18, both AiBattle_'s early sighting and koltregaskes' early sighting posted chat screenshots where the assistant identified itself as Kimi K2.6, two days before Kimi_Moonshot's launch thread formalized the release. That early exposure explains why the support wave felt so compressed: some of the ecosystem was already watching the model surface before Moonshot finished the announcement cycle.
