Release: June 29, 2026

Cognition launches Devin Fusion with mid-session routing and 35% lower Fable-class cost

Cognition launched Devin Fusion, a hybrid coding harness that can reroute work mid-task; the company says it cuts the cost of Fable-class performance by 35%. It targets tasks where upfront routing misses complexity that only surfaces later, because the router can re-evaluate after investigation starts.

TL;DR

Cognition shipped Devin Fusion as a new routing layer inside Devin, and cognition's launch thread says it is live now.
According to cognition's benchmark thread, Fusion cuts the cost of Fable-class performance by 35%, while Scott Wu's follow-up broadens that claim to a 30 to 40% range.
cognition's architecture explainer says Fusion keeps a frontier model in charge of planning and review, while a cheaper sidekick agent handles exploration, edits, tests, and bug fixes.
The routing decision is not fixed at the first prompt: cognition's mid-session routing post says Fusion can switch back toward the smarter model after the task turns out to be harder than it looked.
Community reactions from imjaredz's read and apoorv03's comparison both frame Fusion as part of a broader coding-agent shift toward planning on expensive models and executing on cheaper ones.

The launch materials include three visuals: the benchmark chart, where Fusion plus Fable 5 slightly clears plain Fable 5 on score; the sidekick diagram, where the main agent keeps review authority; and the routing example, where Cognition hands code reading and implementation steps to the cheaper model before pulling the frontier model back in for planning and final review.

FrontierCode

Cognition is pitching Fusion against a failure mode it thinks ordinary routers miss. In cognition's benchmark claim, the company says conventional routing can pass benchmarks while still producing code you would not merge.

That is why the launch leans on FrontierCode, which cognition's FrontierCode post describes as an eval for whether a PR looks production-worthy, not just whether a task technically completes. Scott Wu's thread makes the same point more bluntly: coding models can all pass the task, but their behavior and style are not interchangeable.

The chart attached to cognition's benchmark claim gives the concrete numbers:

Fusion + Fable 5: score 57.6 at $3.00 per task
Fable 5 medium: score 57.0 at $5.12 per task
Fusion: score 47.9 at $2.38 per task
Opus 4.8 high: score 48.8 at $3.24 per task
GPT-5.5 high: score 44.8 at $3.64 per task

That leaves the Fusion + Fable 5 configuration in an interesting spot. By Cognition's own chart, it is not only cheaper than Fable 5 medium ($3.00 versus $5.12 per task), it also edges it out on score (57.6 versus 57.0).
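The chart rows can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary transcribes the five rows listed above, and the helper names `cost_saving` and `score_delta` are made up for this example.

```python
# (score, cost per task in USD) as listed from Cognition's chart above.
chart = {
    "Fusion + Fable 5": (57.6, 3.00),
    "Fable 5 medium": (57.0, 5.12),
    "Fusion": (47.9, 2.38),
    "Opus 4.8 high": (48.8, 3.24),
    "GPT-5.5 high": (44.8, 3.64),
}


def cost_saving(baseline: str, candidate: str) -> float:
    """Fractional per-task cost saving of candidate relative to baseline."""
    base_cost = chart[baseline][1]
    cand_cost = chart[candidate][1]
    return (base_cost - cand_cost) / base_cost


def score_delta(baseline: str, candidate: str) -> float:
    """Score difference, candidate minus baseline."""
    return chart[candidate][0] - chart[baseline][0]


saving = cost_saving("Fable 5 medium", "Fusion + Fable 5")
delta = score_delta("Fable 5 medium", "Fusion + Fable 5")
print(f"cost saving: {saving:.1%}, score delta: {delta:+.1f}")
# prints: cost saving: 41.4%, score delta: +0.6
```

On these two rows alone, the per-task saving works out to about 41%, which is not the same figure as the 35% headline claim; the headline may be computed on a different basis than this single comparison.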

Sidekick

Fusion's first core mechanic is a two-agent split. cognition's sidekick explainer says a smaller agent runs in parallel with the frontier agent, while the frontier agent keeps ownership of planning, ambiguity, and final review.

The attached architecture diagram breaks the split into distinct responsibilities:

Sidekick starts with code exploration
Main agent turns that into file snippets and a plan
Sidekick writes code, tests, and lint fixes
Main agent reviews the result and requests edits
Sidekick fixes bugs
Main agent delivers final code

Matt Lam's read in mattlam_'s commentary surfaces the important implementation detail: this is not just task-level dispatch to separate subagents. It can split one user task across turns, with the main and sidekick agents alternating inside the same run.
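The alternation described above can be pictured as one run whose turns switch between two roles. The following is a minimal sketch, not Cognition's implementation: the `Run` class, the `HANDOFFS` list, and the `execute` and `count_switches` functions are hypothetical names, and the step strings paraphrase the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One user task with a single shared transcript (hypothetical structure)."""
    task: str
    log: list = field(default_factory=list)


# Handoff order paraphrased from the architecture diagram.
HANDOFFS = [
    ("sidekick", "explore code"),
    ("main", "turn findings into file snippets and a plan"),
    ("sidekick", "write code, tests, and lint fixes"),
    ("main", "review result and request edits"),
    ("sidekick", "fix bugs"),
    ("main", "deliver final code"),
]


def execute(run: Run) -> Run:
    """Append each turn to the same run, so both roles share one transcript."""
    for role, step in HANDOFFS:
        run.log.append((role, step))
    return run


def count_switches(run: Run) -> int:
    """Number of times control passes from one role to the other."""
    roles = [role for role, _ in run.log]
    return sum(1 for a, b in zip(roles, roles[1:]) if a != b)


run = execute(Run(task="fix xyz bug"))
print(count_switches(run))  # prints: 5
```

The point of the sketch is the shape: a single task produces one log with several role switches, rather than separate subagent jobs with separate transcripts.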

Mid-session routing

The second core mechanic is when routing happens. cognition's routing post says engineering tasks often reveal their real difficulty late, after the agent has already started reading code and building context.

Cognition's example image shows the sequence explicitly. The frontier model initializes the session, loads notes, and locates the code. The cheaper model then checks out the branch, reads relevant files, and analyzes external APIs. After that, the frontier model returns to make the plan, delegates implementation and steps back out, then returns again for review and PR creation. At the end, a cheaper model also handles review-bot comments and CI monitoring.

Scott Wu, Cognition's CEO, frames the same problem in his thread as a routing blind spot at prompt time: "fix xyz bug" might be a one-line patch or a repo-wide redesign, and you do not know which until investigation is already underway.
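The difference between prompt-time routing and mid-session routing can be shown with a toy router. This is a sketch under stated assumptions: the source does not describe how Fusion decides when to switch, so the `Observation` fields, the `FILE_LIMIT` threshold, and the `pick_model` heuristic are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the agent knows at a given step (hypothetical signals)."""
    files_involved: int
    needs_redesign: bool


FILE_LIMIT = 3  # hypothetical threshold, chosen only for illustration


def pick_model(obs: Observation) -> str:
    """Toy heuristic: escalate when the task looks larger than a small patch."""
    if obs.needs_redesign or obs.files_involved > FILE_LIMIT:
        return "frontier"
    return "cheap"


# A "fix xyz bug" task that looks small at first and grows during investigation.
session = [
    Observation(files_involved=1, needs_redesign=False),
    Observation(files_involved=2, needs_redesign=False),
    Observation(files_involved=9, needs_redesign=True),
    Observation(files_involved=9, needs_redesign=True),
]

# Prompt-time routing: decide once from the first observation and never revisit.
upfront = [pick_model(session[0])] * len(session)

# Mid-session routing: re-evaluate at every step as context accumulates.
mid_session = [pick_model(obs) for obs in session]

print(upfront)      # prints: ['cheap', 'cheap', 'cheap', 'cheap']
print(mid_session)  # prints: ['cheap', 'cheap', 'frontier', 'frontier']
```

In this toy run, the upfront router stays on the cheap model for the whole task, while the re-evaluating router escalates once the investigation reveals a redesign.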

Harness engineering

Several reactions treated the launch less as a one-off feature and more as a pattern. apoorv03's comparison explicitly pairs Fusion with Cursor's Auto mode and describes the emerging coding setup as frontier planning plus cheaper execution.

That commentary adds two concrete ideas to the launch:

Multi-model harnesses may outperform a single frontier model on some tasks because they introduce different working styles into the run, according to imjaredz's thread.
The current system appears to use one sidekick today, but mattlam_'s commentary and imjaredz's thread both point toward more opinionated sidekicks as the obvious next extension.

Even eliebakouch's reaction, a praise post, focuses on the release framing rather than on a new base model. The interesting thing shipped here is the harness.

Model-agnostic positioning

Fusion also gives Cognition a clean way to lean into model-agnosticity. dabit3's thread says the point of being model-agnostic is being able to offer "the best" models rather than "our best" models, and presents Fusion as the most advanced version of that strategy so far.

That framing matches the cost story running through adjacent commentary. In the enterprise-cost thread, imjaredz argues that enterprise buyers have been token-sensitive from day one and that model routing, spend controls, and coding-specific evals are now part of the product surface, not back-office plumbing. Fusion is the first launch from Cognition that puts all three on the front page at once.