releaseJune 12, 2026

Moonshot releases Kimi K2.7 Code: +21.8% on Kimi Code Bench v2, 30% fewer reasoning tokens

Moonshot open-sourced Kimi K2.7 Code and says it outperforms K2.6 by 21.8% on Kimi Code Bench v2 while using 30% fewer reasoning tokens. The release includes open weights and API access, so teams can test the 180 tok/s HighSpeed rollout and early Cline/OpenCode support.

4 min read

Moonshot releases Kimi K2.7 Code: +21.8% on Kimi Code Bench v2, 30% fewer reasoning tokens

TL;DR

Moonshot open-sourced Kimi K2.7 Code, and Kimi_Moonshot's launch post says it ships today in both Kimi Code and the Kimi API, with weights on Hugging Face.
According to Kimi_Moonshot's benchmark summary, K2.7 Code improves over K2.6 by 21.8% on Kimi Code Bench v2, 11.0% on Program Bench, and 31.5% on MLS Bench Lite.
Kimi_Moonshot's launch post also frames the release around efficiency, claiming 30% lower reasoning-token usage than K2.6, while the Hugging Face model card adds full benchmark tables and deployment details.
Moonshot is treating this as a specialized coding branch, not a new default model, because Kimi_Moonshot's follow-up reply says K2.7 Code is for coding while K2.6 remains the recommendation for general-purpose work in Kimi Work and the Kimi app.
The rollout already spread into agent surfaces, with cline's announcement adding K2.7 Code support, opencode's post listing it in Go, and Kimi_Moonshot's HighSpeed thread previewing a limited 6x faster mode.

You can inspect the full model card, check the live Kimi Code product page, and browse the API pricing page. Moonshot also buried two practical rollout details outside the headline post: a HighSpeed mode announcement with 180 tok/s to 260 tok/s claims, and a launch promotion page offering extra quota through July 2.

Kimi Code and API

Moonshot shipped K2.7 Code across three surfaces on day one: the Kimi Code coding agent, the Kimi API platform, and open weights on Hugging Face.

That launch also split the K2 family more explicitly. In Kimi_Moonshot's reply to a user, the company says K2.7 Code is built specifically for coding, while K2.6 stays the recommended option for general-purpose and non-coding tasks.

Benchmark deltas

Moonshot's headline table is simple:

Kimi Code Bench v2: 50.9 to 62.0, per the Hugging Face model card
Program Bench: 48.3 to 53.6, per the Hugging Face model card
MLS Bench Lite: 26.7 to 35.1, per the Hugging Face model card
Kimi Claw 24/7 Bench: 42.9 to 46.9, per the Hugging Face model card
MCP Atlas: 69.4 to 76.0, per the Hugging Face model card
MCPMark Verified: 72.8 to 81.1, per the Hugging Face model card

The more interesting number is token use. Kimi_Moonshot's launch post says reasoning-token usage drops by 30%, while bridgemindai's breakdown points to a concrete Program Bench example, 176k tokens per task on K2.6 versus 102k on K2.7 Code.

Model card details

The model card fills in the parts the tweet skipped. K2.7 Code is a 1T-parameter MoE model with 32B active parameters, 384 experts, an 8-expert routing scheme, 160K vocabulary, and a 256K context window.

It also keeps Kimi's multimodal shape. The same card lists a 400M-parameter MoonViT vision encoder, tags the repo as image-text-to-text, and says the official API supports OpenAI-compatible and Anthropic-compatible calls.

Deployment guidance is unusually concrete for a launch post. The card recommends vLLM, SGLang, and KTransformers, requires transformers >=4.57.1, says the architecture matches K2.5 and K2.6 closely enough to reuse deployment methods, and notes that thinking is forced on for the official API.

HighSpeed and integrations

Moonshot's follow-on thread claims a new HighSpeed mode can run around 180 tok/s on coding tasks with median-length inputs and up to 260 tok/s on shorter-context work. The same thread says access is rolling out first to Beta Program members, API developers, and Kimi Business users because capacity is still limited.

Ecosystem support showed up immediately. cline's announcement says K2.7 Code is usable in Cline, and opencode's post says the model is available in Go with text and image support at similar pricing to 2.6.

Launch promotion

Moonshot paired the model launch with a quota incentive that only appeared in a later post and the promotion docs. Top-ups from $100 to $299 get a 20% bonus, $300 to $999 gets 25%, and $1,000 or more gets 30%.

The API platform page also puts hard numbers on the base rate: $0.95 per million input tokens, $0.19 per million cache-hit tokens, and $4.00 per million output tokens. The promotion runs from June 11 to July 2, and the docs limit each organization ID to one reward.