Skip to content
AI Primer
release

Moonshot releases Kimi K2.7 Code: +21.8% on Kimi Code Bench v2, 30% fewer reasoning tokens

Moonshot open-sourced Kimi K2.7 Code and says it outperforms K2.6 by 21.8% on Kimi Code Bench v2 while using 30% fewer reasoning tokens. The release includes open weights and API access, so teams can test the 180 tok/s HighSpeed rollout and early Cline/OpenCode support.

4 min read
Moonshot releases Kimi K2.7 Code: +21.8% on Kimi Code Bench v2, 30% fewer reasoning tokens
Moonshot releases Kimi K2.7 Code: +21.8% on Kimi Code Bench v2, 30% fewer reasoning tokens

TL;DR

You can inspect the full model card, check the live Kimi Code product page, and browse the API pricing page. Moonshot also buried two practical rollout details outside the headline post: a HighSpeed mode announcement with 180 tok/s to 260 tok/s claims, and a launch promotion page offering extra quota through July 2.

Kimi Code and API

Moonshot shipped K2.7 Code across three surfaces on day one: the Kimi Code coding agent, the Kimi API platform, and open weights on Hugging Face.

That launch also split the K2 family more explicitly. In Kimi_Moonshot's reply to a user, the company says K2.7 Code is built specifically for coding, while K2.6 stays the recommended option for general-purpose and non-coding tasks.

Benchmark deltas

Moonshot's headline table is simple:

The more interesting number is token use. Kimi_Moonshot's launch post says reasoning-token usage drops by 30%, while bridgemindai's breakdown points to a concrete Program Bench example, 176k tokens per task on K2.6 versus 102k on K2.7 Code.

Model card details

The model card fills in the parts the tweet skipped. K2.7 Code is a 1T-parameter MoE model with 32B active parameters, 384 experts, an 8-expert routing scheme, 160K vocabulary, and a 256K context window.

It also keeps Kimi's multimodal shape. The same card lists a 400M-parameter MoonViT vision encoder, tags the repo as image-text-to-text, and says the official API supports OpenAI-compatible and Anthropic-compatible calls.

Deployment guidance is unusually concrete for a launch post. The card recommends vLLM, SGLang, and KTransformers, requires transformers >=4.57.1, says the architecture matches K2.5 and K2.6 closely enough to reuse deployment methods, and notes that thinking is forced on for the official API.

HighSpeed and integrations

Moonshot's follow-on thread claims a new HighSpeed mode can run around 180 tok/s on coding tasks with median-length inputs and up to 260 tok/s on shorter-context work. The same thread says access is rolling out first to Beta Program members, API developers, and Kimi Business users because capacity is still limited.

Ecosystem support showed up immediately. cline's announcement says K2.7 Code is usable in Cline, and opencode's post says the model is available in Go with text and image support at similar pricing to 2.6.

Launch promotion

Moonshot paired the model launch with a quota incentive that only appeared in a later post and the promotion docs. Top-ups from $100 to $299 get a 20% bonus, $300 to $999 gets 25%, and $1,000 or more gets 30%.

The API platform page also puts hard numbers on the base rate: $0.95 per million input tokens, $0.19 per million cache-hit tokens, and $4.00 per million output tokens. The promotion runs from June 11 to July 2, and the docs limit each organization ID to one reward.

Share on X