TOOL · 24 stories

Cursor

AI-first code editor and coding agent product.

Stories

Cursor reports SWE-bench Pro benchmark hacking; Opus 4.8 drops 87.1%→73.0% under stricter harness

Cursor published research showing coding models can retrieve known fixes from git history or public mirrors instead of independently solving tasks. Under a stricter harness, Opus 4.8 fell from 87.1% to 73.0% and Composer 2.5 from 70.5% to 60.5%.
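
As a quick check on the figures above, the drops can be expressed both in percentage points and in relative terms. This is a minimal sketch: the function name is illustrative, and the only inputs are the scores quoted in the summary.

```python
def score_drop(before: float, after: float) -> tuple[float, float]:
    """Return (drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = absolute / before * 100
    return round(absolute, 1), round(relative, 1)

# Scores quoted in the summary: (original harness, stricter harness).
reported = {
    "Opus 4.8": (87.1, 73.0),
    "Composer 2.5": (70.5, 60.5),
}

drops = {model: score_drop(*scores) for model, scores in reported.items()}
for model, (points, percent) in drops.items():
    print(f"{model}: -{points} points ({percent}% relative)")
```

On these numbers, Opus 4.8 loses 14.1 points (about 16.2% of its original score) and Composer 2.5 loses 10.0 points (about 14.2%).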

RELEASE · 1w ago

Cursor adds cloud handoff from mobile for agents that keep running

Cursor now lets developers move local agents to the cloud so work can continue after the laptop closes, with mobile as the handoff control surface. The change removes one of the main setup frictions in long-running cloud sessions.

NEWS · 1w ago

Cursor announces a $60B all-stock deal with SpaceX

Cursor said it agreed to a $60B all-stock deal with SpaceX, with closing targeted for Q3 and Cursor operating as a wholly owned subsidiary. The deal ties a major coding-agent channel to SpaceX compute and gives Cursor a new strategic owner.

RELEASE · 1w ago

Cursor launches Origin with 22.6 commits/s and agent-native Git hosting

Cursor launched Origin, a code storage and Git hosting product built for agent-heavy workflows, with API and MCP extensibility plus conflict-handling for parallel changes. It matters because multi-agent coding shifts the bottleneck from generation to branch, diff, and merge orchestration.

RELEASE · 3w ago

Cursor adds Design Mode with point, draw, and voice UI editing

Cursor shipped Design Mode, letting users point at elements, draw annotations, or speak changes directly against a UI. The feature pushes more frontend iteration into the editor and narrows the gap between interface feedback and code changes.

NEWS · 3w ago

Cursor raises Teams usage limits and adds Premium seats with 5x usage

Cursor raised usage limits for all Teams users and introduced a Premium seat tier with 5x usage for 3x the price. Teams can now budget coding-agent access around seat quotas instead of raw token meters.
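
The seat math above can be worked through directly. This is a rough sketch that assumes usage scales linearly with the stated multipliers and normalizes a standard seat to a price and usage of 1; actual prices and quotas are not given in the summary.

```python
from fractions import Fraction

# Multipliers quoted in the summary, relative to a standard Teams seat.
USAGE_MULTIPLIER = Fraction(5)  # Premium usage
PRICE_MULTIPLIER = Fraction(3)  # Premium price

# Cost of one unit of usage on Premium, relative to a standard seat.
relative_unit_cost = PRICE_MULTIPLIER / USAGE_MULTIPLIER
discount = 1 - relative_unit_cost

print(f"Premium cost per unit of usage: {float(relative_unit_cost):.2f}x standard")
print(f"Effective per-unit discount: {float(discount):.0%}")
```

Under these assumptions a Premium seat pays 0.60x the standard rate per unit of usage, a 40% per-unit discount, provided the extra quota is actually consumed.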

RELEASE · 4w ago

Cursor adds auto-review mode with classifier subagent and fewer approval prompts

Cursor shipped auto-review mode, letting agents run more tool calls with fewer approval prompts and sending unsafe or unsandboxed actions to a classifier subagent. The change lowers review friction while keeping a separate path for higher-risk calls.

NEWS · 4w ago

Cursor reports input tokens make up 70% of coding-agent costs

Cursor's Developer Habits Report says input tokens account for about 70% of price-equivalent coding-agent costs as agents read more context. The report also says auto-accepted code is up 5x since the start of the year, so teams should watch context usage and review rates.

RELEASE · 1mo ago

Cursor releases Composer 2.5 SDK for Python and TypeScript

Cursor opened a Python and TypeScript SDK for building custom agents on Composer 2.5 and paired the launch with a 90% usage discount for the long weekend. Artificial Analysis data still shows Composer 2.5 leading on cost per task, making the SDK launch an efficiency play for builders.

NEWS · 1mo ago

Cursor Composer 2.5 ranks #3 on Artificial Analysis Coding Agent Index at $0.07/task

Artificial Analysis put Composer 2.5 at 62 on its Coding Agent Index, third overall, with standard mode at about $0.07 per task and Fast at $0.44. The update matters because Cursor is now benchmarking as a low-cost agent option, not just a bundled fallback model.
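
To put the two per-task prices in perspective, the sketch below compares how far a fixed budget stretches in each mode. The $10 budget is an arbitrary illustration, and the per-task prices are the approximate figures quoted above, held in integer cents to avoid floating-point rounding.

```python
# Approximate per-task prices from the summary, in cents.
STANDARD_CENTS = 7
FAST_CENTS = 44

BUDGET_CENTS = 1000  # hypothetical $10 budget, for illustration only

tasks_standard = BUDGET_CENTS // STANDARD_CENTS
tasks_fast = BUDGET_CENTS // FAST_CENTS
price_ratio = round(FAST_CENTS / STANDARD_CENTS, 1)

print(f"Standard: {tasks_standard} tasks, Fast: {tasks_fast} tasks")
print(f"Fast costs about {price_ratio}x as much per task")
```

At these prices the same budget covers 142 standard-mode tasks versus 22 Fast tasks, so Fast costs roughly 6.3x as much per task.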

RELEASE · 1mo ago

Cursor ships Composer 2.5 with 2x included usage and a 10x-compute follow-on model

Cursor released Composer 2.5 in its editor and says it is stronger on long-running tasks, with included usage doubled for a week. Early comparisons place it near Opus 4.7-class coding, and Cursor says a much larger model is still training with 10x more compute.

RELEASE · 1mo ago

Cursor launches cloud development environments with rollback and scoped secrets

Cursor added reusable cloud development environments for agents with multi-repo setup, rollback, and scoped secrets. The update moves cloud agents closer to laptop-style setups while keeping long-running work isolated and auditable.

NEWS · 1mo ago

Cursor adds always-on CI agents that open fix PRs

Cursor added always-on agents that monitor GitHub, investigate failing runs, and open fix PRs automatically. That moves coding agents beyond the editor and into CI recovery after commits land.

RELEASE · 1mo ago

Cursor releases Team Kit with /verify-this, /loop-on-ci, and harness skills

Cursor's Team Kit packages internal skills like /verify-this, CLI and UI automation harnesses, PR cleanup, and /loop-on-ci, installable with /add-plugin cursor-team-kit. It turns several internal review and validation habits into reusable commands for agent-driven coding workflows.

NEWS · 1mo ago

Developers show 11 early Cursor SDK integrations across Gmail, Chrome, CI, and multi-repo agents

Developers posted 11 early Cursor SDK integrations, including QA agents, Gmail-to-Chat handoffs, Chrome extensions, CI autofix, doc sync, and multi-repo orchestration. The demos show Cursor agents moving outside the IDE into existing team workflows with reusable cloud-agent patterns.

RELEASE · 2mo ago

Cursor releases SDK for CI/CD, local or cloud agents, and starter apps

Cursor shipped a TypeScript SDK that exposes its runtime, harness, and models for CI/CD jobs, background automations, and embedded agents. The launch lets teams treat Cursor as programmable agent infrastructure, though it still depends on Cursor API access.

RELEASE · 2mo ago

Cursor 3.2 adds /multitask async subagents, worktrees, and GPT-5.5

Cursor 3.2 added /multitask async subagents, improved worktrees, and multi-root workspaces, and paired the release with a GPT-5.5 rollout, which scores 72.8% on CursorBench. The update makes background agent orchestration a first-class IDE workflow instead of a blocking queue.

RELEASE · 2mo ago

Cursor updates Cursor 3 with split agents and 87% fewer dropped frames

Cursor 3 adds split-agent panes, tighter cloud-agent controls, voice input fixes, and an 87% reduction in dropped frames during large edits. The update makes the IDE easier to use as a mixed local-cloud agent workspace, while keeping editor navigation and diff review intact.

RELEASE · 2mo ago

Cursor 3 launches agent workspace for local, SSH, and cloud sessions

Cursor 3 introduced a separate agent-first workspace that can run agents locally, in worktrees, over SSH, and in the cloud while keeping the editor available. The release gives teams a path to multi-agent orchestration without giving up the traditional IDE surface.

RELEASE · 3mo ago

Cursor adds Instant Grep: 13ms regex search across millions of files

Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
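
The general idea behind n-gram indexed search can be sketched in a few lines. This is a toy illustration of the technique, not Cursor's implementation: it uses trigrams and a plain inverted index, omits Bloom filters entirely, and expects the caller to supply a literal substring that any match must contain. All names (`TrigramIndex`, `candidates`, `search`) are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index mapping each trigram to the files that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.files: dict[str, str] = {}

    def add(self, path: str, content: str) -> None:
        self.files[path] = content
        for gram in trigrams(content):
            self.postings[gram].add(path)

    def candidates(self, literal: str) -> set[str]:
        """Files containing every trigram of literal (a superset of true matches)."""
        grams = trigrams(literal)
        if not grams:
            # Literal shorter than 3 characters: the index cannot narrow the search.
            return set(self.files)
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, pattern: str, literal: str) -> list[str]:
        """Run the regex only on candidate files, then return confirmed matches."""
        regex = re.compile(pattern)
        return sorted(
            path for path in self.candidates(literal)
            if regex.search(self.files[path])
        )


index = TrigramIndex()
index.add("a.py", "def parse_config(path): ...")
index.add("b.py", "def parse_args(argv): ...")
index.add("c.py", "print('hello')")

hits = index.search(r"parse_\w+\(path", literal="parse_")
print(hits)
```

The index only narrows the candidate set; the regex still runs on each surviving file to confirm the match. Here the literal `parse_` rules out `c.py` without reading it, and the regex then rejects `b.py`, leaving `a.py`.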

NEWS · 3mo ago

Cursor Composer 2 ranks #2 on Next.js evals, ahead of Opus and Gemini

Vercel's Next.js evals place Composer 2 second, ahead of Opus and Gemini despite the recent Kimi-base controversy. The result matters because it separates base-model branding from measured task performance on a real framework workflow.

NEWS · 3mo ago

Cursor confirms Composer 2 is based on Kimi K2.5 after API model IDs surfaced

After developers spotted Kimi model IDs in traffic, Cursor and Kimi said Composer 2 starts from Kimi K2.5, with continued pretraining and RL added on top. Teams should benchmark it as a productized open-base stack, not a from-scratch model.

RELEASE · 3mo ago

Cursor releases Composer 2 with $0.50/M input and 61.7 Terminal-Bench 2.0

Cursor shipped Composer 2 with gains on CursorBench, Terminal-Bench 2.0, and SWE-bench Multilingual, plus a fast tier and an early Glass interface alpha. It resets the price-performance baseline for coding agents and shows Cursor is now a model company as much as an IDE.

NEWS · 3mo ago

Cursor publishes CursorBench to compare coding models on intelligence and token efficiency

Cursor published its internal benchmarking approach and reported wider separation between coding models than SWE-bench-style leaderboards show. Use it as a reference for production routing decisions, but validate results against your own online traffic and task mix.