Release · June 2, 2026

Factory introduces Router, claiming up to 25% lower AI spend and 99% of Opus 4.7's Terminal-Bench 2 pass rate

Factory put Router into private preview in its CLI and desktop app to route coding tasks across models, claiming 20-25% lower spend. The launch targets rising agent costs, though session continuity and routing behavior remain active points of debate.

TL;DR

Factory put Router into private research preview in its CLI and Desktop App. In FactoryAI's launch post, the company says Router cuts token spend by 20 to 25 percent while preserving frontier-level coding performance. The underlying Factory blog post says Router appears in the model picker once it is enabled for an org.
On Factory's own benchmarks, FactoryAI's announcement and the official writeup claim Router reaches 99 percent of Claude Opus 4.7's pass rate on Terminal-Bench 2 at 20 percent lower cost per session, and 96 percent on Legacy-Bench at 25 percent lower cost.
According to FactoryAI's follow-up post, admins can feed Router org-specific rules and context, and the launch post says the same policy surface can allow or block Router org-wide.
In FactoryAI's reply on switching, Factory says Router picks the model at session start and escalates only when the benefit of switching outweighs the cost. That is exactly where critics, in glennko's response and sqs's thread, argue production routing gets tricky.

You can read the full launch post, check the Terminal-Bench 2 leaderboard, and skim a separate Red Hat routing writeup that lands on a very different caveat: routing accuracy gets ugly fast once the classifier meets realistic production traffic.

Session routing

Factory is pitching Router as a per-session model selector for coding agents, not a static default model. The official announcement says each Droid session starts on the model Factory thinks fits the work, with cheaper models taking routine work and stronger frontier models staying available for harder runs.

According to FactoryAI's admin-rules post, orgs can shape that automatic selection with their own rules and context. The blog post adds two concrete knobs:

org-wide model policy, including whether Router is allowed at all
routing rules and context, including workflow patterns, codebase areas, toolchains, and model preferences

That makes the product less like a blind auto-switcher and more like a policy-driven broker sitting inside Factory's existing control plane.
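The ordering described above can be sketched as a selector that runs once at session start. It checks org policy first, then org rules, then a default. Everything in this sketch is hypothetical: the model labels, the `OrgPolicy` and `Session` types, the path-prefix rule, and the keyword heuristic are illustrations only. They are not Factory's configuration format or routing logic.

```python
from dataclasses import dataclass, field

# Hypothetical model labels; Factory has not published its model list or routing logic.
CHEAP_MODEL = "cheap-model"
FRONTIER_MODEL = "frontier-model"


@dataclass
class OrgPolicy:
    """Hypothetical stand-in for the org-wide policy and routing rules."""
    router_allowed: bool = True
    default_model: str = FRONTIER_MODEL
    # Codebase areas the org wants to keep on the frontier model.
    frontier_paths: list[str] = field(default_factory=list)


@dataclass
class Session:
    task: str
    touched_paths: list[str]


# Crude keyword heuristic standing in for whatever difficulty signal a real router uses.
HARD_TASK_HINTS = ("refactor", "migrate", "race condition", "flaky")


def pick_session_model(session: Session, policy: OrgPolicy) -> str:
    """Choose one model at session start: org policy first, then rules, then a heuristic."""
    if not policy.router_allowed:
        return policy.default_model
    for path in session.touched_paths:
        if any(path.startswith(prefix) for prefix in policy.frontier_paths):
            return FRONTIER_MODEL
    if any(hint in session.task.lower() for hint in HARD_TASK_HINTS):
        return FRONTIER_MODEL
    return CHEAP_MODEL


policy = OrgPolicy(frontier_paths=["payments/"])
print(pick_session_model(Session("Fix a typo in the README", ["README.md"]), policy))
print(pick_session_model(Session("Fix a typo", ["payments/api.py"]), policy))
print(pick_session_model(Session("Fix a typo", ["README.md"]), OrgPolicy(router_allowed=False)))
```

Because the choice is made once per session rather than per request, the session's context stays on one model. The cost is that a wrong initial pick persists until something triggers escalation.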

Benchmarks and Pareto frontier

The headline numbers come from Factory's own engineering evals, summarized in the launch post:

Terminal-Bench 2: 99% of Claude Opus 4.7's pass rate at 20% lower cost per session
Legacy-Bench: 96% of Claude Opus 4.7's pass rate at 25% lower cost per session
cost per successful run: 80.5% of Opus on Terminal-Bench 2, 78.0% on Legacy-Bench
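The third bullet follows from the first two. Here is a short worked check. It assumes that cost per successful run is simply cost per session divided by pass rate, so the Opus baseline cancels out.

```python
def cost_per_success_ratio(cost_ratio: float, pass_ratio: float) -> float:
    """Router's cost per successful run relative to Opus.

    Cost per success is cost per session divided by pass rate, so the
    Opus baseline cancels out and only the two ratios remain.
    """
    return cost_ratio / pass_ratio


# Headline ratios from the launch post (rounded there, so results are approximate).
terminal_bench = cost_per_success_ratio(cost_ratio=0.80, pass_ratio=0.99)
legacy_bench = cost_per_success_ratio(cost_ratio=0.75, pass_ratio=0.96)

print(f"Terminal-Bench 2: {terminal_bench:.1%} of Opus cost per success")  # 80.8%
print(f"Legacy-Bench: {legacy_bench:.1%} of Opus cost per success")        # 78.1%
```

The results land close to Factory's 80.5 and 78.0 percent. The small gaps presumably come from the headline cost reductions being rounded to whole percentages.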

The more interesting detail is the curve behind those numbers. Factory says Router operates on the flat part of the cost-performance Pareto frontier, where cost drops before pass rate bends downward sharply. In the same post, the company says its most aggressive routing cut Terminal-Bench 2 cost to 56 percent of Opus, but pass rate fell to 81 percent; on Legacy-Bench, 30 percent of Opus cost came with only 49 percent of Opus pass rate.

That framing also answers a common cheap-model objection. As rohanpaul_ai's summary puts it, the idea is not to replace frontier models with weaker ones but to peel off the easy sessions and leave the hard work on Opus-class models.
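Factory's "flat part of the frontier" claim can be checked against the three points per benchmark that the post gives: Opus itself, the shipped Router setting, and the most aggressive setting. The sketch below computes the slope between neighbouring points. It assumes straight lines between the published points, which is a simplification, since the real curve is only sampled at those three places.

```python
# (relative cost, relative pass rate) versus Claude Opus 4.7, as reported in the launch post.
FRONTIER_POINTS = {
    "Terminal-Bench 2": [(1.00, 1.00), (0.80, 0.99), (0.56, 0.81)],
    "Legacy-Bench": [(1.00, 1.00), (0.75, 0.96), (0.30, 0.49)],
}


def pass_lost_per_cost_saved(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Slope between two points: pass-rate fraction lost per fraction of cost saved."""
    (cost_a, pass_a), (cost_b, pass_b) = a, b
    return (pass_a - pass_b) / (cost_a - cost_b)


def segment_slopes(points: list[tuple[float, float]]) -> list[float]:
    """Slopes between consecutive points, ordered from most to least expensive."""
    return [pass_lost_per_cost_saved(p, q) for p, q in zip(points, points[1:])]


for bench, points in FRONTIER_POINTS.items():
    slopes = ", ".join(f"{s:.2f}" for s in segment_slopes(points))
    print(f"{bench}: {slopes}")
```

On both benchmarks, the first segment gives up only a few hundredths of pass rate per unit of cost saved (0.05 and 0.16). The second segment gives up roughly 6 to 15 times as much (0.75 and about 1.04), which is the bend Factory describes.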

Reliability and admin rules

Factory's blog is doing two jobs at once: one is cost optimization, the other is uptime. The post says Router can fail over across providers when an endpoint degrades, and it claims 99.9%+ request reliability through routing across models, providers, and capacity sources.

The same writeup lists four reliability layers:

provider failover when one path is unavailable
dedicated TPM for enterprise throughput
continued access to frontier model classes for harder work
US-hosted open-source models for teams that want cheaper or more controlled options

That is broader than the launch tweet. Router is being sold as a cost-control feature, but the official page also positions it as a reliability layer for long-running Droid and Mission Worker sessions.
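The failover layer is the most conventional piece. Here is a minimal sketch of it. It assumes an ordered list of interchangeable endpoints for the same model class, plus a hypothetical `ProviderUnavailable` error that signals a degraded endpoint. Factory has not described its actual health checks or retry policy.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Raised by a (hypothetical) provider client when its endpoint is degraded."""


Provider = tuple[str, Callable[[str], str]]


def call_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in priority order; return (provider name, response) from the first that answers."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def degraded_endpoint(prompt: str) -> str:
    raise ProviderUnavailable("HTTP 503")


def healthy_endpoint(prompt: str) -> str:
    return f"completion for {prompt!r}"


print(call_with_failover("run tests", [("primary", degraded_endpoint), ("secondary", healthy_endpoint)]))
```

Only the hypothetical "unavailable" error triggers failover. Any other exception propagates, so a bad request is not silently retried against every provider.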

Switching costs

Factory's clearest implementation detail showed up in a reply, not the launch page. According to FactoryAI's switching reply, Router picks a model at the start of the session and only escalates when the benefit of switching outweighs the cost.

That is the exact fault line in the reaction. glennko's response argued that moving between models can break production behavior and that only the customer can really evaluate those tradeoffs. In a longer thread, sqs from Sourcegraph said Amp has tested cheaper models on real coding-agent tasks and often found the frontier model was still the fastest and cheapest end to end, because weaker models burned extra tokens and time recovering from mistakes.
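Neither Factory nor its critics give a formula, but the disagreement can be framed as an expected-cost comparison. The sketch below is hypothetical throughout. It assumes failed attempts are retried independently, so expected spend is cost per attempt divided by success probability. It also folds the price of switching into a single `switch_cost` number. Within those assumptions, it captures both sqs's point that a cheaper model can cost more end to end and Factory's rule of escalating only when the saving beats the switching cost.

```python
from dataclasses import dataclass


@dataclass
class ModelEstimate:
    """Hypothetical per-model estimates for the remaining work in a session."""
    cost_per_attempt: float  # spend for one attempt at the remaining work
    success_prob: float      # chance a single attempt finishes it


def expected_cost_to_success(m: ModelEstimate) -> float:
    """Expected spend if failed attempts are retried independently (geometric trials)."""
    return m.cost_per_attempt / m.success_prob


def should_escalate(current: ModelEstimate, stronger: ModelEstimate, switch_cost: float) -> bool:
    """Escalate only when the expected saving outweighs the one-off cost of switching."""
    saving = expected_cost_to_success(current) - expected_cost_to_success(stronger)
    return saving > switch_cost


cheap = ModelEstimate(cost_per_attempt=0.20, success_prob=0.25)
frontier = ModelEstimate(cost_per_attempt=0.60, success_prob=0.90)

# Cheaper per attempt, but more expensive end to end: 0.80 vs about 0.67.
print(expected_cost_to_success(cheap), expected_cost_to_success(frontier))
print(should_escalate(cheap, frontier, switch_cost=0.10))  # True: saving of about 0.13 beats 0.10
print(should_escalate(cheap, frontier, switch_cost=0.20))  # False: saving of about 0.13 does not
```

The hard part in practice is estimating `success_prob` and `switch_cost` for a live session, which is where glennko's point about customer-specific tradeoffs applies.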

The broader routing debate is older than this launch. In a same-day Red Hat article, the company said a pretrained semantic router hit only 80 percent accuracy on a four-tier classification task, which meant one in five requests was misrouted under realistic workloads. Factory's bet is narrower and more opinionated: session-level routing inside a coding-agent product, with benchmark tuning and org-specific rules, can do better than generic prompt classification.

Box CEO Aaron Levie made the bullish version of that case in his thread, calling routing an inevitable response to rising token budgets and a likely point of differentiation for the applied AI layer.
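Red Hat's 80 percent is a per-request accuracy figure. A back-of-envelope sketch shows why the granularity of the routing decision might matter. It rests on two loud simplifications that neither Red Hat nor Factory states: routing errors are independent, and the only question is whether any decision in a session goes wrong. Under those assumptions, a session-level router makes one decision, while a per-request router with the same accuracy makes many.

```python
def prob_all_routed_correctly(accuracy: float, decisions: int) -> float:
    """Chance that every routing decision in a session is correct, assuming independent errors."""
    return accuracy ** decisions


# 80% is Red Hat's reported per-request accuracy; the decision counts are illustrative.
for decisions in (1, 5, 20):
    print(decisions, round(prob_all_routed_correctly(0.80, decisions), 3))
```

With 5 decisions the chance of an error-free session drops to about 0.33, and with 20 it drops to about 0.01. Real sessions are less fragile than this, because a single misroute is often recoverable. Treat the numbers as an illustration of how per-request errors can compound, not as a measured result.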