
GPT-5.5

A new class of intelligence for real work


OpenAI's GPT-5.5 is a frontier language model for complex professional work, released on April 23, 2026, with a 1,050,000-token context window and availability in ChatGPT, Codex, and the API.

Pricing

Model profile · Current snapshot

Input (per 1M tokens): $5.00
Output (per 1M tokens): $30.00
Blended (per 1M tokens): $11.25
Output speed: 62.84 tokens/s
Time to first token (TTFT): 0.9 s
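The page does not say how the blended figure is weighted. A 3:1 input-to-output token ratio reproduces it exactly, (3 × $5.00 + 1 × $30.00) / 4 = $11.25, so the sketch below assumes that ratio; the function name and default weights are illustrative, not part of the profile.

```python
# Sketch: blended price per 1M tokens as a weighted average of input and
# output prices. The 3:1 input:output weighting is an assumption that
# reproduces the $11.25 shown in the profile.

INPUT_PER_M = 5.00    # USD per 1M input tokens (from the profile)
OUTPUT_PER_M = 30.00  # USD per 1M output tokens (from the profile)


def blended_price(input_per_m: float, output_per_m: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_per_m + output_parts * output_per_m) / total_parts


print(blended_price(INPUT_PER_M, OUTPUT_PER_M))  # 11.25
```

Output-heavy workloads such as long agent runs pull the effective rate toward $30.00: at a 1:1 mix, `blended_price(5.00, 30.00, 1, 1)` gives 17.5.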

Model Intelligence

Context window: 1,050,000 tokens
Arena ranking: not listed
Benchmarkable: Yes
Model level: release
Intelligence Index: 40.9
Coding Index: 48.6
GPQA: 0.77
HLE (Humanity's Last Exam): 0.13
SciCode: 0.47
IFBench: 0.46
LCR (long-context reasoning): 0.56
TerminalBench Hard: 0.49
TAU2 (τ²-Bench): 0.69

Recent stories

14 linked stories

News · Secondary · 2026-05-04

Copilot users report $221 for 15 GPT-5.5 messages before June 1 billing switch

Ahead of GitHub Copilot's June 1 switch to usage-based billing, users documented GPT-5.5 sessions reaching 60M tokens and $221 across 15 messages on the legacy per-message plan. The examples show why flat per-message pricing stops tracking cost once a single request can run for hours and consume very large token counts.
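The mismatch is easiest to see by pricing two requests that a per-message plan would count identically. The sketch below applies the API list prices from the Pricing section to two hypothetical token counts; Copilot's own usage-based rates are not given here, so treating them as equal to API list prices is an assumption.

```python
# Sketch: usage-based cost of individual requests at the profile's list prices.
# Token counts below are hypothetical; Copilot's actual rates may differ.

INPUT_PER_M = 5.00    # USD per 1M input tokens (from the profile)
OUTPUT_PER_M = 30.00  # USD per 1M output tokens (from the profile)


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request billed per token."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M


# Two requests a flat per-message plan would charge the same amount for.
requests = {
    "short chat turn": (20_000, 2_000),
    "long agent run": (4_000_000, 200_000),
}

for name, (inp, out) in requests.items():
    print(f"{name}: ${token_cost(inp, out):.2f}")
```

Here the long agent run costs over 150 times as much as the short turn, a spread a single per-message price cannot cover.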

News · Secondary · 2026-05-02

Codex users report one-shot fixes and 1.7B-token days vs Claude Code

Developers posted side-by-side reports of faster one-shot fixes, 1.7B-token workdays, and fewer limit warnings with GPT-5.5 fast mode after OpenAI added Claude Code import. The comparisons matter because they turn migration talk into a concrete workflow choice.

News · Secondary · 2026-05-01

ValsAI updates Terminal Bench 2 after `tool_choice` bug, moving GPT-5.5 to #1 with +11%

ValsAI found that undocumented `tool_choice` behavior was skewing Terminal Bench 2 scores when no native tools were used, then reran the evals. The correction lifted GPT-5.5 by 11% to the top slot and showed how much harness settings can move coding-agent results.
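The ValsAI correction is a reminder to make harness settings explicit and store them next to every score. The sketch below is hypothetical: it is not ValsAI's fix, and the helper names are invented. It attaches `tool_choice` only when a run actually supplies tools, rather than leaving the no-tools case to API defaults, and serializes the remaining settings so a rerun can be compared like for like. The `model`, `input`, `tools`, and `tool_choice` keys follow the shape of OpenAI's Responses API requests; the model ID `gpt-5.5` is assumed.

```python
import json
from typing import Any, Dict, List, Optional


def build_request(model: str, prompt: str,
                  tools: Optional[List[Dict[str, Any]]] = None,
                  tool_choice: str = "auto") -> Dict[str, Any]:
    """Build request params for a hypothetical eval harness.

    tool_choice is attached only when tools are supplied, so the no-tools
    case never depends on how the API treats a tool_choice with nothing to pick.
    """
    params: Dict[str, Any] = {"model": model, "input": prompt}
    if tools:
        params["tools"] = tools
        params["tool_choice"] = tool_choice
    return params


def settings_fingerprint(params: Dict[str, Any]) -> str:
    """Serialize every setting except the prompt, for storing next to a score."""
    settings = {k: v for k, v in params.items() if k != "input"}
    return json.dumps(settings, sort_keys=True)


no_tools = build_request("gpt-5.5", "Run the test suite and report failures.")
print(settings_fingerprint(no_tools))  # {"model": "gpt-5.5"}
```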

News · Secondary · 2026-05-01

ARC Prize reports GPT-5.5 at 0.43% and Opus 4.7 at 0.18% on ARC-AGI-3

ARC Prize published frontier-model results on ARC-AGI-3 and said GPT-5.5 and Opus 4.7 both stayed below 1%, with failures in world modeling, abstraction, and reward reinforcement. The results show that models strong at coding and standard benchmarks still fail on novel interactive reasoning tasks. Follow-up comparisons even placed Opus 4.6 slightly ahead of Opus 4.7.

Release · Secondary · 2026-04-30

Codex adds `/goal`, role-based workflows, and 20% faster browser use

OpenAI expanded Codex with role-based workflows, app connections, in-app previews, and the `/goal` command, while also improving browser use by about 20%. The update lets Codex keep working across docs, slides, spreadsheets, and web actions instead of staying in a single coding thread.

News · Primary · 2026-04-30

GPT-5.5 scores 71.4% on UK AISI cyber eval with 2/10 TLO completions

Multiple summaries of the UK AISI report say GPT-5.5 roughly matches Claude Mythos Preview on long-horizon cyber tasks, including 2 of 10 end-to-end TLO completions. That matters because the model is broadly usable today, shifting cyber-workflow choices toward availability and mitigations rather than gated access alone.

Release · Secondary · 2026-04-28

Codex adds macOS computer use, in-app browser, and artifact previews

Codex gained background macOS control, page inspection, image generation, plugins, artifacts, and follow-up automations. That gives it one agent thread for desktop apps, frontend debugging, and recurring work.

News · Primary · 2026-04-26

Users report GPT-5.5 speeds up coding and cuts over-editing in low-reasoning runs

New evals and day-three user tests show GPT-5.5 performing well at low or medium reasoning, with benchmark gains over GPT-5.4 in coding-heavy use. That matters because stronger results no longer require xhigh runs, though some users still flag sycophancy.
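Because the reports say low or medium reasoning is often enough, one way to check that on your own tasks is to run the same prompt at several effort levels and compare answer quality against output tokens. A minimal sketch, assuming the OpenAI Python SDK's Responses API with its `reasoning={"effort": ...}` parameter and an assumed model ID `gpt-5.5`. Whether this model accepts `xhigh` is not confirmed here, so the example sticks to `low`, `medium`, and `high`.

```python
from typing import Any, Dict, List

MODEL = "gpt-5.5"  # assumed API model ID


def effort_sweep(prompt: str, efforts: List[str], model: str = MODEL) -> List[Dict[str, Any]]:
    """Build one Responses API request per reasoning effort level."""
    return [{"model": model, "input": prompt, "reasoning": {"effort": e}} for e in efforts]


def run_sweep(prompt: str, efforts: List[str]) -> Dict[str, int]:
    """Send each request and record output tokens per effort level."""
    from openai import OpenAI  # lazy import: effort_sweep works without the SDK installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    usage: Dict[str, int] = {}
    for params in effort_sweep(prompt, efforts):
        response = client.responses.create(**params)
        usage[params["reasoning"]["effort"]] = response.usage.output_tokens
    return usage
```

Running `run_sweep(task, ["low", "medium", "high"])` on a few representative tasks, with a quality check of each answer, shows whether the higher levels earn their extra tokens for a given workload.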

News · Primary · 2026-04-25

Tool vendors add GPT-5.5 to Cursor, Databricks, Droid, and ml-intern within 24 hours

Independent tools and platforms shipped GPT-5.5 support within a day of the API rollout, spanning IDEs, hosted research agents, enterprise stacks, and coding agents. That shortens evaluation time because teams can test the model inside existing workflows instead of rebuilding around a single OpenAI surface.

News · Primary · 2026-04-25

GPT-5.5 users report 4-10x shorter runs and smoother tool calls one day after launch

Users and third-party evals reported shorter runs, stronger long-context scores, and faster rollout into Cursor and other tools a day after GPT-5.5 hit the API. Higher per-token pricing may be partly offset by shorter loops and fewer tool-call stalls, so check early benchmark data before changing defaults.
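Whether shorter runs offset the higher price is simple arithmetic: per-task cost is tokens per task times price per token, so a model whose price is k times higher breaks even when it uses k times fewer tokens. The sketch below assumes shorter runs translate into proportionally fewer tokens, which the reports do not establish, and uses a hypothetical $6.00 blended rate and hypothetical token counts for the older model; only the $11.25 figure comes from this profile.

```python
def cost_per_task(tokens_per_task: float, blended_per_m: float) -> float:
    """USD cost of one task given total tokens and a blended price per 1M tokens."""
    return tokens_per_task / 1e6 * blended_per_m


def break_even_factor(old_per_m: float, new_per_m: float) -> float:
    """How many times fewer tokens the pricier model must use to cost the same per task."""
    return new_per_m / old_per_m


OLD_PER_M = 6.00   # hypothetical blended rate for the previous default model
NEW_PER_M = 11.25  # GPT-5.5 blended rate from the profile

old_cost = cost_per_task(2_000_000, OLD_PER_M)  # hypothetical 2M-token run
new_cost = cost_per_task(500_000, NEW_PER_M)    # same task with 4x fewer tokens
factor = break_even_factor(OLD_PER_M, NEW_PER_M)
print(f"old ${old_cost:.2f}, new ${new_cost:.2f}, break-even at {factor:.3f}x fewer tokens")
```

Under these assumptions a 4x token reduction more than covers the price gap, while anything under roughly 1.9x would not.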

Release · Primary · 2026-04-24

OpenAI opens GPT-5.5 API with 1M context and Responses support

OpenAI added GPT-5.5 and GPT-5.5 Pro to the API and Playground with 1M context and Responses support. Partners including OpenRouter, Perplexity, GitHub Copilot, Vercel, Warp, and Devin rolled it out the same day, widening access beyond Codex.
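For a first API call, a minimal sketch with the OpenAI Python SDK's Responses interface follows. The model ID `gpt-5.5` is assumed, as is treating the 1,050,000-token context window from the profile as a shared budget for input plus output; the pre-flight check is a conservative guard, not a documented limit.

```python
CONTEXT_WINDOW = 1_050_000  # tokens, from the model profile
MODEL = "gpt-5.5"           # assumed API model ID


def fits_context(input_tokens: int, max_output_tokens: int) -> bool:
    """Conservative pre-flight check: prompt plus reserved output must fit the window."""
    return input_tokens + max_output_tokens <= CONTEXT_WINDOW


def ask(prompt: str, max_output_tokens: int = 2_000) -> str:
    """Send one prompt through the Responses API and return the text output."""
    from openai import OpenAI  # lazy import so fits_context runs without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.responses.create(
        model=MODEL,
        input=prompt,
        max_output_tokens=max_output_tokens,
    )
    return response.output_text
```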

Release · Secondary · 2026-04-24

Cursor 3.2 adds /multitask async subagents, worktrees, and GPT-5.5

Cursor 3.2 added /multitask async subagents, improved worktrees, and multi-root workspaces, and shipped alongside the GPT-5.5 rollout, which scored 72.8% on CursorBench. The update makes background agent orchestration a first-class IDE workflow instead of a blocking queue.

News · Secondary · 2026-04-24

Codex users report one-shot bug fixes, 10-hour runs, and lower token burn a day after GPT-5.5 launch

A day after GPT-5.5 and the new Codex workflows launched, developers reported one-shot bug fixes, longer unattended runs, and lower token use in real coding tasks. The early hands-on comparisons matter because they are already shifting some teams' default agent workflow away from Claude Code.

Release · Primary · 2026-04-23

OpenAI releases GPT-5.5 with 82.7% on Terminal-Bench and Codex browser control

OpenAI rolled out GPT-5.5 and GPT-5.5 Pro in ChatGPT and Codex, with higher scores on terminal, OS, cyber, and math evals than GPT-5.4. Codex also gained browser, document, and computer-use features for longer agent workflows.