Kimi K2.6 adds free Hermes and Cline access plus Replicate, Perplexity, and Together support

A day after Kimi K2.6’s launch, providers and tools opened new access paths including temporary free use in Hermes and Cline plus availability on Replicate, Together, Perplexity, and Tinker. Engineers can test the open model across agent harnesses and hosted runtimes without standing up their own stack first.


TL;DR

You can open the OpenRouter model page, check Together AI's K2.6 listing, and browse Unsloth's K2.6 deployment guide. The weirdly practical part is how many agent shells wired it up immediately: NousResearch's Hermes post, CosineAI's availability post, warpdotdev's rollout, and even ollama's cloud command post all made K2.6 something engineers could poke from tools they already use.

Free access windows

Moonshot launched K2.6 on April 20, but the story a day later was distribution. NousResearch's free-access post offered 24 hours of free use through Nous Portal inside Hermes, and cline's free-access post followed with a three-day free window in Cline.

The mechanics were short and concrete:

  • Hermes: 24 hours of free K2.6 access through Nous Portal.
  • Cline: a three-day free window inside Cline itself.

That matters because K2.6 is not cheap to self-host at full size. These promos turned a 1T-parameter open model into a casual afternoon test instead of a deployment project.

Hosted runtimes

The hosted rollout hit nearly every familiar lane at once: API platforms, consumer chat surfaces, and inference clouds. replicate's launch post, togethercompute's availability post, perplexity_ai's subscriber rollout, OpenRouter's availability post, and warpdotdev's rollout all landed within roughly two days of the main launch.

A quick inventory of the access paths:

  • Replicate: hosted endpoint announced in its launch post.
  • Together AI: API availability, with benchmark and SLA claims attached.
  • Perplexity: rolled out as a model option for subscribers.
  • OpenRouter: listed as a routable model for API users.
  • Warp: available inside the terminal's agent features.

Together also attached fresh benchmark and ops claims to the hosted version. In togethercompute's availability post, the company highlighted 80.2% SWE-Bench Verified, 89.6% LiveCodeBench v6, 79.4% MMMU-Pro, up to 300 sub-agents, and a 99.9% SLA on its cloud.
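For engineers who want to poke the hosted version, the API platforms above expose OpenAI-compatible chat completions. The sketch below targets OpenRouter's endpoint; the model slug and the exact request shape are assumptions to verify against the model page, and `OPENROUTER_API_KEY` is just the conventional env var name.

```python
import json
import os
import urllib.request

# Assumption: OpenRouter's OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL_SLUG = "moonshotai/kimi-k2.6"  # hypothetical slug -- check the model page

def build_chat_request(prompt: str, model: str = MODEL_SLUG) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }

def send(payload: dict) -> dict:
    """POST the payload; requires OPENROUTER_API_KEY in the environment."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Summarize this diff in one sentence.")
print(payload["model"])
```

Swapping the base URL and slug should be enough to point the same payload at Together or any other OpenAI-compatible host.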

Agent harnesses

K2.6's real adoption story is less "new model added" and more "new model slotted into every agent shell people already have open." CosineAI's availability post plugged it into Cosine Swarm, NousResearch's Hermes availability post wired it into Hermes Agent, opencode's availability post added it to OpenCode, and warpdotdev's rollout put it into Warp.

The pitch across those tools was unusually consistent:

  • No new workflow: K2.6 shows up as another model option in a harness you already run.
  • Long-horizon agent work: sustained multi-hour, multi-thousand-step tool-calling sessions.
  • Open weights behind a familiar interface, priced well below closed frontier models.

Moonshot's own launch thread made the same bet. In Kimi_Moonshot's partner post, the company described 4,000-plus tool calls over 12-plus hours, 300 parallel sub-agents with 4,000 steps per run, and named Hermes and OpenClaw as examples of proactive agent deployments.

Benchmarks and price cards

The benchmark story here is not one clean leaderboard. It is a stack of provider-specific receipts that all point in the same direction: K2.6 is close enough to frontier closed models that the price delta becomes the headline.

Some of the more concrete numbers:

  • cline's benchmark card put K2.6 at 66.7 on Terminal-Bench 2.0, ahead of Claude Opus 4.6 and GPT-5.4 at 65.4, while listing K2.6 at $0.80 in and $3.50 out on that surface.
  • Kimi_Moonshot's OpenRouter ranking post said K2.6 was already No. 1 on OpenRouter's programming leaderboard.
  • arena's leaderboard summary placed it as the No. 2 open model in Code Arena and No. 1 open model in Vision and Document Arena.
  • arena's vision and document update added a 9-point gain over K2.5-Thinking in Vision Arena and a 14-point gain in Document Arena.
  • replicate's launch post highlighted a 13-hour autonomous refactor of an eight-year-old trading engine, with 4,000-plus lines changed, 1,000-plus tool calls, and a claimed 185% throughput gain.

There are also caveats inside the same wave. scaling01's PencilPuzzleBench post showed K2.6 improving over K2.5 on that benchmark without putting it near the very top, and petergostev's BullshitBench update said pushback improved in non-reasoning mode while heavier reasoning settings hurt that specific eval.

Local and cloud footprints

The last useful twist is that K2.6 is spreading downward into weird hardware as fast as it is spreading upward into inference partners. UnslothAI's GGUF release said the 1T model could be packed into a 340 GB Dynamic GGUF with selective upcasting, run across CPU, GPU, and SSD setups, and exceed 40 tok/s on systems with 350 GB of combined RAM or VRAM.
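A back-of-envelope fit check makes those numbers concrete. The 340 GB figure comes from the Unsloth release above; the overhead factor is a loose assumption for KV cache and buffers, not a measured figure, and the function is a hypothetical helper.

```python
# Rough sizing check for Unsloth's ~340 GB Dynamic GGUF.
GGUF_SIZE_GB = 340          # reported Dynamic GGUF size
OVERHEAD_FACTOR = 1.1       # assumed headroom for KV cache and buffers

def fits_in_memory(ram_gb: float, vram_gb: float = 0.0,
                   ssd_offload: bool = False) -> bool:
    """Can combined RAM+VRAM hold the model, or is SSD offload needed?"""
    budget = ram_gb + vram_gb
    if budget >= GGUF_SIZE_GB * OVERHEAD_FACTOR:
        return True
    # llama.cpp-style mmap from SSD keeps it runnable, just slower
    return ssd_offload

print(fits_in_memory(256, 96))   # 352 GB total, short of the ~374 GB budget
print(fits_in_memory(384))       # clears the threshold on RAM alone
```

The point of the sketch is the shape of the decision, not the exact constant: below the threshold you are in mmap-from-SSD territory, above it you are in the 40 tok/s regime the release describes.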

That splits the deployment map into three very different options:

  • Free promo windows: the Hermes and Cline dropdowns, zero setup, time-limited.
  • Managed endpoints: OpenRouter, Together, Replicate, and Perplexity handle the serving.
  • Local GGUF: roughly 340 GB of weights for machines with enough RAM, VRAM, or fast SSD.

That is a nice little Christmas-come-early moment for coding agent nerds. The same model now shows up as a free dropdown in Hermes and Cline, a managed endpoint on OpenRouter and Together, and a giant but runnable GGUF for people with absurd boxes and good excuses.
