AI Primer
update

Cursor Composer 2.5 ranks #3 on Artificial Analysis Coding Agent Index at $0.07/task

Artificial Analysis put Composer 2.5 at 62 on its Coding Agent Index, third overall, with standard mode at about $0.07 per task and Fast at $0.44. The update matters because Cursor's own model now benchmarks as a low-cost agent option, not just a bundled fallback model.


TL;DR

  • According to Artificial Analysis' benchmark thread, Cursor's Composer 2.5 scored 62 on the Coding Agent Index, up from 48 for Composer 2 and good for third place behind higher-effort Claude Opus 4.7 and GPT-5.5 variants.
  • Artificial Analysis' cost comparison put Composer 2.5 standard at about $0.07 per task and Fast at $0.44, versus $4.10 for Opus 4.7 max in Claude Code and $4.82 for GPT-5.5 xhigh in Codex.
  • The big benchmark jump came on SWE-Bench-Pro-Hard-AA, where Artificial Analysis measured Composer 2.5 at 47 percent versus 12 percent for Composer 2, a 35-point gain.
  • Speed is part of the pitch too: Artificial Analysis' Fast-vs-standard breakdown measured Fast at 6.7 minutes per task versus 9.3 minutes for standard, with the faster mode costing about 6x more.
  • Cursor's launch thread framed Composer 2.5 as better on long-running tasks and complex instructions, while altryne's repost of Cursor's compute chart highlighted that 85 percent of its compute came from Cursor's additional training and RL rather than the Kimi base.

You can browse Artificial Analysis' coding-agent index, compare it with Cursor's launch chart, and see Cursor quietly telegraph the next step in testingcatalog's screenshot of the SpaceXAI note. The strange bit is that the headline here is not just a new bundled model; it is a bundled model suddenly showing up as the cheap, frontier-ish option.

Artificial Analysis ranking

Artificial Analysis put Composer 2.5 in third place on its Coding Agent Index at 62. Only Claude Opus 4.7 max in Claude Code, at 66, and GPT-5.5 xhigh in Codex, at 65, ranked higher in that test setup.

The more interesting number is price. In Artificial Analysis' cost comparison, the standard variant landed at $0.07 per task and Fast at $0.44, which is why the firm called it the cheapest agent scoring above 60 on the index.
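The "cheapest agent scoring above 60" claim boils down to a filter-then-minimize over (score, cost) pairs. The Python sketch below is illustrative only: it includes just the three agents whose index scores and per-task costs this article quotes, not the full index, and the `AgentResult` type and `cheapest_above` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentResult:
    name: str
    index_score: int      # Coding Agent Index score
    cost_per_task: float  # USD per task


# Only the figures quoted in this article; the real index lists more agents.
RESULTS = [
    AgentResult("Claude Opus 4.7 max (Claude Code)", 66, 4.10),
    AgentResult("GPT-5.5 xhigh (Codex)", 65, 4.82),
    AgentResult("Composer 2.5 standard (Cursor)", 62, 0.07),
]


def cheapest_above(results, threshold):
    """Return the lowest-cost agent whose score exceeds threshold, or None if none qualify."""
    eligible = [r for r in results if r.index_score > threshold]
    return min(eligible, key=lambda r: r.cost_per_task, default=None)


best = cheapest_above(RESULTS, 60)
print(best.name)
for r in RESULTS:
    if r is not best:
        # Cost multiple relative to the cheapest qualifying agent.
        print(f"{r.name}: {r.cost_per_task / best.cost_per_task:.0f}x the cost per task")
```

With these inputs, Opus 4.7 max comes out at roughly 59x and GPT-5.5 xhigh at roughly 69x the per-task cost of Composer 2.5 standard. Fast mode is left out because the article does not quote a separate index score for it.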

Benchmark gains

The gain over Composer 2 was not evenly distributed. Artificial Analysis broke it out like this:

  • SWE-Bench-Pro-Hard-AA: 12% to 47%, +35 points
  • Terminal-Bench v2: 64% to 66%, +2 points
  • SWE-Atlas-QnA: 69% to 72%, +3 points
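To make the uneven distribution concrete, this small Python sketch recomputes the point gains from the three score pairs listed above and reports how much of the combined gain comes from the largest one. Summing points across different benchmarks is only a rough way to show concentration, not a metric Artificial Analysis reports, and `point_gains` is a hypothetical helper.

```python
# Composer 2 -> Composer 2.5 scores (percent), as listed above.
SCORES = {
    "SWE-Bench-Pro-Hard-AA": (12, 47),
    "Terminal-Bench v2": (64, 66),
    "SWE-Atlas-QnA": (69, 72),
}


def point_gains(scores):
    """Map each benchmark to its absolute gain in percentage points."""
    return {name: new - old for name, (old, new) in scores.items()}


gains = point_gains(SCORES)
largest = max(gains, key=gains.get)
share = gains[largest] / sum(gains.values())
print(gains)
print(f"{largest} accounts for {share:.1%} of the summed point gain")
```

Here SWE-Bench-Pro-Hard-AA accounts for 35 of the 40 summed points, or 87.5 percent.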

Cursor's own launch materials in the launch thread used a different comparison table, but landed in the same neighborhood: near-parity with Opus 4.7 on Terminal-Bench 2.0 and SWE-Bench Multilingual, and a 63.2 percent score on CursorBench v3.1 for harder tasks.

Fast and standard

Artificial Analysis said Cursor is serving the same Composer 2.5 model in two variants. Its measurements put Fast at 6.7 minutes per task versus 9.3 minutes for standard, roughly 39 percent faster, with token pricing rising from $0.50 and $2.50 per million input and output tokens to $3.00 and $15.00.
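The "roughly 39 percent faster" figure reads as a throughput ratio (9.3 / 6.7 ≈ 1.39); expressed as time, Fast takes about 28 percent less wall-clock time per task. Because the input and output prices both rise by the same factor of six, Fast's token-price multiple is 6x regardless of the input/output mix. The Python sketch below checks both. The `Mode` type and the 100,000-input / 10,000-output token run are hypothetical, chosen only for illustration, and the cost function ignores caching or any other discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    minutes_per_task: float
    input_usd_per_m: float   # USD per million input tokens
    output_usd_per_m: float  # USD per million output tokens

    def token_cost(self, input_tokens: int, output_tokens: int) -> float:
        """List-price USD cost of one run (no caching or discounts modeled)."""
        return (input_tokens * self.input_usd_per_m
                + output_tokens * self.output_usd_per_m) / 1_000_000


STANDARD = Mode(9.3, 0.50, 2.50)
FAST = Mode(6.7, 3.00, 15.00)

speedup = STANDARD.minutes_per_task / FAST.minutes_per_task          # throughput ratio
time_saved = 1 - FAST.minutes_per_task / STANDARD.minutes_per_task  # fraction of time saved

# Hypothetical run size, for illustration only.
IN_TOKENS, OUT_TOKENS = 100_000, 10_000
price_multiple = (FAST.token_cost(IN_TOKENS, OUT_TOKENS)
                  / STANDARD.token_cost(IN_TOKENS, OUT_TOKENS))

print(f"speedup {speedup:.2f}x, time saved {time_saved:.0%}, "
      f"price multiple {price_multiple:.1f}x")
```

The quoted per-task costs ($0.44 and about $0.07) give roughly 6.3x; since $0.44 / 6 ≈ $0.073 still rounds to $0.07, that is consistent with the 6x list-price multiple.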

That makes Fast the default convenience tier and standard the weirdly cheap one. Artificial Analysis' full thread still places both on the cost-quality frontier, but the bargain headline belongs to standard mode.

Kimi base, Cursor training

Composer 2.5 is still built on Moonshot's Kimi K2.5 base, according to Artificial Analysis' model details and Kimi_Moonshot's repost of Cursor's note. The unusual disclosure is the compute split: the shared chart says 85 percent of Composer 2.5's compute came from Cursor's additional training and RL, with Kimi K2 and K2.5 each contributing 7.5 percent.

Cursor employees also hinted that the model had already been running internally before launch. In a post, Dan Perks said most of the company had their chats redirected to Composer 2.5 for two days and he did not notice the switch. A retweet of Michael Truell's usage note and testingcatalog's screenshot of Cursor's next-model note added two more details: Composer 2.5 quickly became Cursor's most-chosen model, and Cursor says the next model is being trained from scratch with SpaceXAI using 10x more compute.
