Kilo Code launches Auto Efficient routing with KiloBench model selection
Kilo Code added an Auto Efficient mode that uses public KiloBench results to route each request to the cheapest model that clears its benchmark bar. The router stays session-aware and falls back to stronger paid models when confidence is low.

TL;DR
- Kilo Code shipped an Auto Efficient mode that routes each coding request to the cheapest model that still clears its quality bar, according to kilocode's launch thread and kilocode's availability post.
- The router uses public KiloBench rankings built from real Kilo usage tasks, with kilocode's benchmark explainer saying the same benchmark data is exposed on the leaderboard.
- kilocode's session-aware note says the router avoids swapping models mid-thread, while kilocode's fallback note says uncertain calls drop to Balanced so cheap models do not become the floor.
- Teams can tune routing per project, with kilocode's settings post showing Best accuracy per dollar and Best accuracy modes on top of the default router.
You can read the public benchmark writeup from kilocode's follow-up, inspect the KiloBench leaderboard via kilocode's reply about public data, and even watch a two-minute demo from kilocode's demo post. The interesting bit is not model routing by itself. It is that Kilo is exposing the scorecard it routes on, then letting users override the policy when they want a different cost-quality tradeoff.
Auto Efficient
Kilo's pitch is brutally simple: stop paying top-tier model prices for low-stakes edits. kilocode's launch thread frames the split as cheap models for tasks like variable renames and stronger models for harder work like migration planning.
The product is live now. kilocode's availability post says Auto Efficient appears directly in the model picker, where the screenshot marks it as the recommended default.
KiloBench
The routing logic is tied to KiloBench, not a hidden heuristic. kilocode's benchmark explainer says KiloBench runs continuously across the model catalog on tasks pulled from real Kilo usage, and kilocode's leaderboard note says users can inspect the same rankings themselves.
That makes this more legible than the usual black-box router. The evidence points to a loop with three parts:
- Benchmark models on real coding tasks, per kilocode's benchmark explainer.
- Publish the rankings on the Kilo leaderboard, per kilocode's leaderboard note.
- Route each request using that leaderboard lookup, per kilocode's leaderboard note.
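The lookup step in that loop can be sketched in a few lines. This is only an illustration of "cheapest model that clears the bar", not Kilo's implementation: the `ModelEntry` fields, the model names, the scores, the prices, and the `bar` threshold are all hypothetical placeholders, since the launch thread does not publish the threshold math.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelEntry:
    name: str             # hypothetical model identifier
    score: float          # hypothetical benchmark score in [0, 1]
    cost_per_mtok: float  # hypothetical price per million tokens, USD


def cheapest_clearing_bar(
    leaderboard: List[ModelEntry], bar: float
) -> Optional[ModelEntry]:
    """Return the lowest-cost entry whose score meets the bar, or None."""
    eligible = [m for m in leaderboard if m.score >= bar]
    if not eligible:
        return None
    # Sort key: cost first; on a cost tie, prefer the higher score.
    return min(eligible, key=lambda m: (m.cost_per_mtok, -m.score))


# Made-up leaderboard rows, for illustration only.
LEADERBOARD = [
    ModelEntry("model-small", 0.62, 0.20),
    ModelEntry("model-medium", 0.78, 1.50),
    ModelEntry("model-large", 0.91, 9.00),
]

easy = cheapest_clearing_bar(LEADERBOARD, bar=0.50)  # low-stakes edit
hard = cheapest_clearing_bar(LEADERBOARD, bar=0.85)  # harder task
print(easy.name, hard.name)
```

Returning `None` when nothing clears the bar leaves the caller to decide what happens next, which is where a fallback tier would come in.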
A week earlier, kilocode's eight-model test also argued for the premise behind the feature: the cheapest model in one controlled code review run was the only one that caught every bug.
Session-aware routing
Kilo is not claiming per-turn roulette. kilocode's session-aware note says the router stays on a model that is already working and only switches when a cheaper option clearly fits, which is meant to avoid context loss and inconsistent output inside a thread.
The fallback policy matters as much as the cheap path:
- If the routing call is clear, Auto Efficient can pick a cheaper model, per kilocode's session-aware note.
- If the call is unclear, it falls back to Balanced, per kilocode's fallback note.
- Balanced is described in kilocode's fallback note as a capable paid model, which sets a quality floor.
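The two rules above, stickiness and fallback, can be combined in a small sketch. Everything here is an assumption made for illustration: the `confidence` score, the `threshold` value, the cost table, and the model names are hypothetical. The posts also do not say which rule wins when an unclear call arrives mid-thread, so this sketch assumes the fallback takes precedence.

```python
from dataclasses import dataclass
from typing import Dict, Optional

BALANCED = "balanced"  # stand-in name for the fallback tier


@dataclass
class Session:
    current_model: Optional[str] = None  # model the thread is currently on


def route(
    session: Session,
    candidate: str,          # cheaper model proposed for this request
    confidence: float,       # hypothetical routing confidence in [0, 1]
    cost: Dict[str, float],  # hypothetical per-model cost table
    threshold: float = 0.8,  # hypothetical cutoff for a "clear" call
) -> str:
    if confidence < threshold:
        # Unclear call: drop to the fallback tier (assumed to win over stickiness).
        chosen = BALANCED
    elif session.current_model is None:
        # First request in the thread: take the candidate.
        chosen = candidate
    elif cost[candidate] < cost[session.current_model]:
        # Clear call and strictly cheaper: switch.
        chosen = candidate
    else:
        # Otherwise stay on the model that is already working.
        chosen = session.current_model
    session.current_model = chosen
    return chosen


COST = {"cheap": 0.2, "mid": 0.8, BALANCED: 1.5}
s = Session()
first = route(s, "cheap", 0.95, COST)   # clear call, no model yet
second = route(s, "mid", 0.95, COST)    # clear, but not cheaper than current
third = route(s, "cheap", 0.40, COST)   # unclear call
print(first, second, third)
```

Keeping the session state in one small object makes the no-swap behavior easy to see: the only paths that change `current_model` are a clear, cheaper candidate or an unclear call.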
Routing modes
Kilo also exposed a policy knob instead of hard-coding one cost target. kilocode's settings post shows three routing choices in settings: use the default, optimize for best accuracy per dollar, or optimize for best accuracy.
That means the router is not just choosing a model. It is choosing against a user-selected objective. The launch thread does not spell out the exact threshold math, but the UI evidence shows Kilo treating routing policy as a project-level setting rather than a global account switch.
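One way to picture "choosing against a user-selected objective" is to treat each mode as a different scoring function over the same candidates. The sketch below is hypothetical throughout: the enum names mirror the two labeled options from the settings post, but the model tuples, the numbers, and the per-project table are invented, and the behavior of the default option is not modeled because the source does not describe it.

```python
from enum import Enum
from typing import Dict, List, Tuple

Model = Tuple[str, float, float]  # (name, score, cost); all values hypothetical


class RoutingMode(Enum):
    BEST_ACCURACY_PER_DOLLAR = "best_accuracy_per_dollar"
    BEST_ACCURACY = "best_accuracy"


def pick(models: List[Model], mode: RoutingMode) -> str:
    """Return the name of the model that maximizes the selected objective."""
    if not models:
        raise ValueError("no candidate models")
    if mode is RoutingMode.BEST_ACCURACY:
        best = max(models, key=lambda m: m[1])
    else:
        # Accuracy per dollar; assumes every cost is strictly positive.
        best = max(models, key=lambda m: m[1] / m[2])
    return best[0]


MODELS: List[Model] = [
    ("model-small", 0.62, 0.20),
    ("model-medium", 0.78, 1.50),
    ("model-large", 0.91, 9.00),
]

# Hypothetical per-project policy table, echoing the project-level setting.
PROJECT_MODES: Dict[str, RoutingMode] = {
    "docs-site": RoutingMode.BEST_ACCURACY_PER_DOLLAR,
    "billing-migration": RoutingMode.BEST_ACCURACY,
}

for project, mode in PROJECT_MODES.items():
    print(project, "->", pick(MODELS, mode))
```

With these made-up numbers the two objectives select different models from the same list, which is the practical effect of exposing the policy as a setting.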
Visibility and limits
One useful wrinkle surfaced in follow-up replies. In kilocode's reply about public data, Kilo says there is no API endpoint that exposes routing choices directly, but the benchmark inputs to Auto Efficient are public and users can inspect usage history to see which models were used and at what volume.
So the transparency story is partial, not total. The benchmark is public and the usage log is visible, but individual routing decisions can only be inferred from product behavior, not read from a dedicated API.