Kimi K2.6 adds free Hermes and Cline access plus Replicate, Perplexity, and Together support
A day after Kimi K2.6’s launch, providers and tools opened new access paths: temporary free use in Hermes and Cline, plus availability on Replicate, Together, Perplexity, and Tinker. Engineers can test the open model across agent harnesses and hosted runtimes without standing up their own stack first.

TL;DR
- NousResearch's free-access post and cline's free-access post turned Kimi K2.6 into a zero-setup test drive this week, with 24 hours free in Hermes via Nous Portal and three days free in Cline.
- The access wave spread fast across hosted surfaces: replicate's launch post put K2.6 on Replicate, togethercompute's availability post added it to Together AI, and perplexity_ai's subscriber rollout surfaced it inside Perplexity's model picker.
- Moonshot is pitching the model as a long-run coding system, not a chat demo. In Kimi_Moonshot's API announcement, the company listed thinking and non-thinking modes, tool calls, JSON mode, web search, multimodal input, and a 256K context window.
- The ecosystem rollout centered on agent harnesses. CosineAI's Swarm post tied K2.6 to 4,000-plus sequential tool calls, while warpdotdev's early testing post said Warp saw faster tool calls at lower cost.
- Access is widening in both directions at once: OpenRouter's availability post and togethercompute's model page link post cover hosted inference, while UnslothAI's GGUF release shrank the model to a 340 GB Dynamic GGUF for CPU, GPU, and SSD setups.
You can open the OpenRouter model page, check Together AI's K2.6 listing, and browse Unsloth's K2.6 deployment guide. The weirdly practical part is how many agent shells wired it up immediately: NousResearch's Hermes post, CosineAI's availability post, warpdotdev's rollout, and even ollama's cloud command post all made K2.6 something engineers could poke from tools they already use.
Free access windows
Moonshot launched K2.6 on April 20, but the story a day later was distribution. NousResearch's free-access post offered 24 hours of free use through Nous Portal inside Hermes, and cline's free-access post followed with a three-day free window in Cline.
The mechanics were short and concrete:
- Hermes: run `hermes update`, then `hermes model`, per Teknium's setup note and NousResearch's Hermes availability post.
- Cline: K2.6 showed up as a temporary free option backed by Vercel AI Gateway in cline's free-access post.
- cto.new also joined the promo pile, according to ctodotnew's short announcement.
That matters because K2.6 is not cheap to self-host at full size. These promos turned a 1T-parameter open model into a casual afternoon test instead of a deployment project.
Hosted runtimes
The hosted rollout hit nearly every familiar lane at once: API platforms, consumer chat surfaces, and inference clouds. replicate's launch post, togethercompute's availability post, perplexity_ai's subscriber rollout, OpenRouter's availability post, and warpdotdev's rollout all landed within roughly two days of the main launch.
A quick inventory of the access paths:
- API first party: Kimi_Moonshot's API announcement priced K2.6 at $0.16 per million cache-hit input tokens, $0.95 cache-miss input, and $4 output (a minimal call sketch follows this list).
- Inference routers: OpenRouter's availability post put it behind a standard OpenRouter endpoint, and togethercompute's model page link post pointed to Together's hosted listing.
- App surfaces: perplexity_ai's subscriber rollout added K2.6 for Pro and Max users, while _akhaliq's HuggingChat post showed it available in HuggingChat.
- Model clouds: replicate's launch post, baseten's production post, and genspark_ai's day-zero post all positioned K2.6 as ready for production inference.
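Since the first-party API and the routers all speak the OpenAI-compatible dialect, a few lines are enough to poke it. Here is a minimal sketch, assuming the Moonshot base URL and a `kimi-k2.6` model slug; both are guesses for illustration, so check the provider's model page for the exact identifier.

```python
# Minimal OpenAI-compatible call against K2.6. The base URL and model slug
# are assumptions; routers like OpenRouter typically use a prefixed slug
# (e.g. "moonshotai/..."), so confirm before running.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",  # swap in a router URL to route instead
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="kimi-k2.6",  # hypothetical slug
    messages=[{"role": "user", "content": "List this repo's build steps as JSON."}],
    # JSON mode is one of the features in Kimi_Moonshot's API announcement;
    # response_format is the standard OpenAI-compatible way to request it.
    response_format={"type": "json_object"},
)
print(resp.choices[0].message.content)
```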
Together also attached fresh benchmark and ops claims to the hosted version. In togethercompute's availability post, the company highlighted 80.2% SWE-Bench Verified, 89.6% LiveCodeBench v6, 79.4% MMMU-Pro, up to 300 sub-agents, and a 99.9% SLA on its cloud.
Agent harnesses
K2.6's real adoption story is less "new model added" and more "new model slotted into every agent shell people already have open." CosineAI's availability post plugged it into Cosine Swarm, NousResearch's Hermes availability post wired it into Hermes Agent, opencode's availability post added it to OpenCode, and warpdotdev's rollout put it into Warp.
The pitch across those tools was unusually consistent:
- Long-horizon runs: CosineAI's Swarm modes post and CosineAI's availability post both emphasized 4,000-plus sequential tool calls.
- Fast tool use: warpdotdev's rollout said early tests showed significantly faster tool calls.
- Cheap enough to leave running: cline's free-access post compared K2.6 with Opus 4.6 and GPT-5.4 on Terminal-Bench pricing, and Kimi_Moonshot's API announcement kept output pricing at $4 per million.
- Multi-agent framing: testingcatalog's rollout screenshot showed Moonshot exposing four surfaces in Kimi Chat itself (Instant, Thinking, Agent, and Agent Swarm Beta).
Moonshot's own launch thread made the same bet. In Kimi_Moonshot's partner post, the company described 4,000-plus tool calls over 12-plus hours, 300 parallel sub-agents with 4,000 steps per run, and named Hermes and OpenClaw as examples of proactive agent deployments.
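None of those harnesses publish their internals, but the loop behind a "4,000-plus sequential tool calls" claim has a standard shape: request a completion with tools attached, execute whatever the model calls, feed the result back, repeat until it answers in prose. A minimal sketch, with the endpoint, slug, and `run_shell` tool all illustrative assumptions rather than any vendor's actual code:

```python
# Bare-bones sequential tool-call loop; real harnesses add sandboxing,
# logging, retries, and run for thousands of iterations instead of ten.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.moonshot.ai/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell",  # hypothetical tool for the sketch
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def run_tool(name: str, args: dict) -> str:
    # Stand-in executor; a real harness would sandbox and log this.
    return f"(pretend output of {name} with {args})"

messages = [{"role": "user", "content": "Refactor the build script."}]
for _ in range(10):  # the posts describe runs past 4,000 calls
    resp = client.chat.completions.create(
        model="kimi-k2.6", messages=messages, tools=tools
    )
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        break  # the model answered in prose; the run is done
    for call in msg.tool_calls:
        result = run_tool(call.function.name, json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```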
Benchmarks and price cards
The benchmark story here is not one clean leaderboard. It is a stack of provider-specific receipts that all point in the same direction: K2.6 is close enough to frontier closed models that the price delta becomes the headline.
Some of the more concrete numbers:
- cline's benchmark card put K2.6 at 66.7 on Terminal-Bench 2.0, ahead of Claude Opus 4.6 and GPT-5.4 at 65.4, while listing K2.6 at $0.80 in and $3.50 out on that surface.
- Kimi_Moonshot's OpenRouter ranking post said K2.6 was already No. 1 on OpenRouter's programming leaderboard.
- arena's leaderboard summary placed it as the No. 2 open model in Code Arena and No. 1 open model in Vision and Document Arena.
- arena's vision and document update added a 9-point gain over K2.5-Thinking in Vision Arena and a 14-point gain in Document Arena.
- replicate's launch post highlighted a 13-hour autonomous refactor of an eight-year-old trading engine, with 4,000-plus lines changed, 1,000-plus tool calls, and a claimed 185% throughput gain.
There are also caveats inside the same wave. scaling01's PencilPuzzleBench post showed K2.6 improving over K2.5 on that benchmark without putting it near the very top, and petergostev's BullshitBench update said pushback improved in non-reasoning mode while heavier reasoning settings hurt that specific eval.
Local and cloud footprints
The last useful twist is that K2.6 is spreading downward into weird hardware as fast as it is spreading upward into inference partners. UnslothAI's GGUF release said the 1T model could be packed into a 340 GB Dynamic GGUF with selective upcasting, run on CPU, GPU, and SSD setups, and hit more than 40 tok/s on systems with 350 GB of RAM or VRAM.
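For the local path, the usual way to load a GGUF from Python is llama-cpp-python. This is a smoke-test sketch, not a performance recipe: the shard filename is a guess at Unsloth's naming convention, and the hardware floor is the roughly 350 GB of RAM/VRAM Unsloth quotes for the 40+ tok/s figure.

```python
# Loading a multi-shard Dynamic GGUF with llama-cpp-python; pointing at the
# first shard is enough, llama.cpp picks up the rest. Filename is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="Kimi-K2.6-UD-Q2_K_XL-00001-of-00008.gguf",  # assumed shard name
    n_ctx=8192,       # small window for a smoke test; the model supports far more
    n_gpu_layers=-1,  # offload everything that fits; llama.cpp spills the rest
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```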
That splits the deployment map into three very different options:
- Local-ish heavy iron: UnslothAI's GGUF release listed 340 GB for Dynamic 2-bit and about 595 GB for its lossless Q8 path.
- Cloud shells with almost no friction: ollama's cloud command post exposed `ollama run kimi-k2.6:cloud`, and AskVenice's Venice launch thread offered a browser chat path (a minimal Python sketch follows this list).
- Production inference stacks: baseten's production post called out KV-aware routing, NVFP4 on Blackwell, multimodal hierarchical caching, and prefill-decode disaggregation for K2.6 specifically.
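The same model behind that CLI one-liner is reachable from Python through the ollama package. A minimal sketch, treating the `kimi-k2.6:cloud` tag as given in ollama's post and assuming you have already signed in for cloud models:

```python
# Chat with the cloud-hosted K2.6 via Ollama's Python client; the model tag
# comes from ollama's cloud command post, everything else is a smoke test.
import ollama

resp = ollama.chat(
    model="kimi-k2.6:cloud",
    messages=[{"role": "user", "content": "What can you do, in one sentence?"}],
)
print(resp["message"]["content"])
```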
That is a nice little Christmas-come-early moment for coding agent nerds. The same model now shows up as a free dropdown in Hermes and Cline, a managed endpoint on OpenRouter and Together, and a giant but runnable GGUF for people with absurd boxes and good excuses.