Cursor releases Composer 2 at $0.50/M input tokens and 61.7 on Terminal-Bench 2.0
Cursor shipped Composer 2 with gains on CursorBench, Terminal-Bench 2.0, and SWE-bench Multilingual, plus a fast tier and an early alpha of the Glass interface. The release resets the price-performance baseline for coding agents and shows Cursor is now a model company as much as an IDE maker.

TL;DR
- Cursor launched Composer 2 inside the Cursor editor, with standard pricing at $0.50 per million input tokens and $2.50 per million output tokens; its launch thread also introduced a faster tier at $1.50/$7.50 per million.
- Cursor says the model improved to 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual, with the launch post attributing the jump to its first continued pretraining run plus scaled reinforcement learning.
- The pricing chart in the release graphic positions Composer 2 near GPT-5.4-level CursorBench performance at materially lower median cost per task, shifting the price-performance baseline for IDE-native coding agents.
- Cursor also shipped an early alpha of Glass, a new interface for working with agents, and one early practitioner report says Composer 2 is already useful for “targeted fixes” and quick refactors in large codebases.
What exactly shipped
Cursor released Composer 2 as an in-house coding model available directly in the editor, with two serving tiers. The launch thread lists the standard tier at $0.50/M input and $2.50/M output tokens and the fast tier at $1.50/M input and $7.50/M output tokens; Cursor's blog post says the model is included in usage pools for individual plans and paired with an “early alpha” interface.
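
Those per-token rates make per-call cost easy to estimate. Here is a minimal sketch, assuming hypothetical token counts for a single agent turn; the tier prices come from the launch thread, and everything else is illustrative:

```python
# Published per-million-token prices for Composer 2's two tiers (launch thread).
PRICING = {
    "standard": {"input": 0.50, "output": 2.50},  # $ per million tokens
    "fast": {"input": 1.50, "output": 7.50},
}

def call_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single model call at the given tier."""
    p = PRICING[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical agent turn: 40k tokens of context in, 2k tokens of edits out.
print(f"standard: ${call_cost('standard', 40_000, 2_000):.4f}")  # $0.0250
print(f"fast:     ${call_cost('fast', 40_000, 2_000):.4f}")      # $0.0750
```

At these rates the fast tier is exactly 3x the standard tier on both input and output, so choosing between them is a pure latency-versus-cost decision for any given call.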
That interface is Glass, which Cursor's Glass page describes as a simpler environment for working with AI agents. Cursor framed the release as both a model update and a product-surface update: the model ships now, while Glass is being shared as an early alpha rather than a finished default UI.
How strong is the model, and why did it improve
Cursor's published numbers show a large jump over prior Composer versions: 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual, versus Composer 1.5 at 44.2, 47.9, and 65.9 respectively (benchmark table). The same release material attributes the gains to “our first continued pretraining run” and reinforcement learning on “long-horizon coding tasks” that require “hundreds of actions” (launch post).
On Terminal-Bench 2.0, the comparison chart places Composer 2 above Opus 4.6 at 61.7 versus 58.0, though still behind GPT-5.4 at 75.1. That makes the claim narrower than “best coding model”: Cursor is showing frontier-adjacent scores, not category leadership across every benchmark, but it is closing the quality gap while pulling the cost curve down sharply.
Why the price-performance story matters in practice
The key engineering story is the economics. Cursor's price-performance chart places Composer 2 around GPT-5.4's CursorBench range while cutting median cost per task to roughly the low end of GPT-5.4's and far below Opus 4.6 at high settings; the fast tier also appears much cheaper than competing fast modes in the speed and price chart.
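
Note the chart's unit is median cost per task, not per call: a long-horizon agent task chains many model calls, each re-sending context. Here is a hedged sketch of how such a figure could be aggregated, reusing the standard-tier prices above; the call counts and token sizes are invented for illustration and are not Cursor's methodology:

```python
from statistics import median

INPUT_PRICE, OUTPUT_PRICE = 0.50, 2.50  # Composer 2 standard tier, $ per million tokens

def task_cost(calls: list[tuple[int, int]]) -> float:
    """Total dollar cost of one agent task, given (input, output) token counts per call."""
    return sum(i * INPUT_PRICE + o * OUTPUT_PRICE for i, o in calls) / 1_000_000

# Three invented tasks; context is re-sent and grows as the agent iterates.
tasks = [
    [(30_000, 1_500), (45_000, 2_000)],                    # quick targeted fix
    [(30_000, 2_000), (60_000, 3_000), (90_000, 2_500)],   # medium refactor
    [(40_000, 2_000)] * 12,                                # long-horizon task
]
print(f"median cost per task: ${median(task_cost(t) for t in tasks):.3f}")  # ~$0.109
```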
Early usage reports suggest that tradeoff is already useful even when teams still prefer a stronger model for the hardest tasks. One developer wrote that for “a large codebase,” Composer 2 works well for “targeted fixes, quick refactors, and getting specific questions answered” without the long waits, while also saying “it doesn't reach the quality of GPT-5.4” (practitioner feedback). Another head-to-head shared by Dan Shipper said Composer 2 beat GPT-5.4 on a production-QA optimization prompt, as judged by GPT-5.4 and Opus 4.6; that is anecdotal but consistent with Cursor's pitch that the model is now good enough for real workflow slices rather than just cheap fallback usage.