Skip to content
AI Primer
breaking

OpenRouter adds cache-hit pricing telemetry as Devin exposes adaptive routing

Vendors pushed routing and spend controls closer to the default app layer, including OpenRouter's cache-hit pricing telemetry and Devin's adaptive routing. The discussion frames model choice more as a budget-control problem than a pure quality setting.

6 min read
OpenRouter adds cache-hit pricing telemetry as Devin exposes adaptive routing
OpenRouter adds cache-hit pricing telemetry as Devin exposes adaptive routing

TL;DR

  • OpenRouter added cache-hit pricing telemetry to model pricing pages, so teams can now see real-time cache hit rate, historical traffic, and a rolling "effective price" instead of just list price; its pricing docs say repeated context can make effective input cost 60 to 80 percent cheaper through prompt caching (OpenRouter pricing page).
  • dabit3's reply about Devin confirmed Adaptive routing is already live in Devin CLI and Desktop, and Devin's Adaptive docs describe it as an automatic model router that picks the best model for each task.
  • Scott Wu and cognition framed the next budget fight as output, not tokens, then paired that with an AI Productivity Guarantee that refunds usage credits if measured engineering value falls short, up to $10 million.
  • FactoryAI pitched routing as a reliability feature as much as a savings feature, while Factory's Router announcement claims 20 to 25 percent lower token spend with provider failover and dedicated throughput.
  • Even the gateway layer is moving from rate limits to cost controls: threepointone's Cloudflare retweet pointed to new Cloudflare AI Gateway spend limits that block requests when a dollar budget is exhausted.

You can browse OpenRouter's pricing telemetry, check Devin's Adaptive router docs, and read Cognition's measurement writeup. Factory published both a Router launch post and docs page, while Cloudflare quietly turned AI Gateway into a budget enforcer with spend limits and dynamic routing.

OpenRouter pricing telemetry

OpenRouter's new move is small but unusually concrete: it exposes cache hit rate and historical traffic directly on the pricing tab for a model page, not in a separate observability product. The Claude Opus 4.8 pricing page shows provider-level price tables, weighted average input and output pricing, and a 30 day rolling effective price chart.

The important distinction is between list price and effective price. OpenRouter's own page says list price is the headline rate, while effective price is what you actually pay after prompt caching, and repeated context can make that 60 to 80 percent cheaper.

The routing docs add one useful implementation detail that OpenRouter surfaced in reply form: once a conversation gets cached by a provider, OpenRouter pins both model and provider until the cache expires. The Auto Router docs say that stickiness lasts until 5 minutes of inactivity, resets on each successful request, and can be forced earlier with session_id.

Devin Adaptive

Cognition is packaging routing one layer higher, inside the coding agent itself. The Adaptive docs say Devin can route each task to a different underlying model, with /model adaptive, --model adaptive, or a config default selecting it in the CLI, and a model picker exposing it in Desktop.

dabit3's productivity thread bundled that router into a broader cost-control pitch:

  1. Adaptive Routing: automatically choose the best model for each task.
  2. Spend Attribution: classify sessions by outcome, like feature work, bug fixing, migrations, or tests.
  3. Advanced automations: trigger agent work from incidents, deploy failures, and alerts.
  4. AI Productivity Guarantee: fund usage if measured value comes in below spend.

That guarantee is only interesting because Cognition published the measuring stick. The technical writeup says the company built its estimator from 258,258 Devin sessions across 126,126 users, using live interviews and surveys to estimate how long the same work would have taken without Devin. The guarantee page limits the offer to enterprise Devin Cloud deployments that meet technical and engagement requirements.

This is the sharpest shift in the story: routing is no longer being sold only as a way to shave model bills. It is being bundled with accounting, task classification, and an explicit claim that the unit of measurement should be engineering value, not raw token volume.

Factory Router and model-agnostic harnesses

Factory is making the most direct enterprise router pitch. Factory's launch post claims Router cuts token spend by 20 to 25 percent while keeping 99 percent of Claude Opus 4.7's Terminal-Bench 2 pass rate and 96 percent of its Legacy-Bench pass rate, then routes across providers when an endpoint degrades.

The Factory Router docs put it in Research Preview for the Factory CLI and App. The company says it mixes session-level and per-request routing, preserves prompt cache efficiency where possible, and applies the same org, project, and user controls as other models.

kilocode's benchmark comparison supplied the cleanest outside cost spread in the evidence set: MiMo-V2.5-Pro at 47.6 percent on Terminal Bench 2.0 for $4.92 a run, versus GPT-5.5 at 74.2 percent for $72.63. That gap helps explain why kilocode's harness post and kilocode's install page link keep stressing model swapability under a stable agent surface.

matanSF made the other half of the router argument in one line: reliability. Factory says Router includes auto-failover across providers, while FactoryAI's repost adds the stronger claim that customers can get more nines of reliability from the routed layer than from any single upstream model provider.

AI Gateway spend limits

The final piece of the stack is below the app layer. Cloudflare's AI Gateway spend-limits changelog says requests can now be blocked when cumulative dollar spend crosses a configured budget, with limits scoped by model, provider, or custom metadata like user or app.

That is materially different from ordinary rate limiting. The spend-limits docs say the system tracks actual cost from token usage and model pricing, then enforces fixed or sliding budget windows. A team can cap a user at $200 per day, a model at $50 per day, or a whole gateway at $10,000 per day.

Cloudflare's dynamic routing docs describe the same architectural direction showing up elsewhere in this story: routing logic, quotas, and fallbacks move into a configurable control plane, and the app no longer has to hard-code a single model. Once spend limits, sticky routing, and per-task model choice all land in that layer, model selection stops looking like a prompt setting and starts looking like infrastructure.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 2 threads
Devin Adaptive1 post
Factory Router and model-agnostic harnesses5 posts
Share on X