AI Primer

OpenAI's released language model for complex, real-world work across coding, research, analysis, document creation, and tool use.

Pricing

Artificial Analysis · Jun 17, 2026, 1:00 PM
Input / 1M tokens: $5.00
Output / 1M tokens: $30.00
Blended / 1M tokens: $11.25
Output TPS: 52.68
TTFT (s): 0.85
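
The pricing and speed figures above can be turned into rough per-request estimates. The sketch below is illustrative only: it assumes the blended price is a 3:1 input-to-output weighted average (an assumption, though it reproduces the listed $11.25), and that response time is roughly TTFT plus output tokens divided by output TPS. The function names are hypothetical and not part of any vendor API.

```python
# Figures copied from the pricing panel above.
INPUT_PER_M = 5.00    # USD per 1M input tokens
OUTPUT_PER_M = 30.00  # USD per 1M output tokens
OUTPUT_TPS = 52.68    # output tokens per second
TTFT_S = 0.85         # time to first token, in seconds


def blended_price(input_per_m, output_per_m, input_weight=3, output_weight=1):
    """Weighted average price per 1M tokens (the 3:1 weighting is an assumption)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total_weight


def request_cost(input_tokens, output_tokens,
                 input_per_m=INPUT_PER_M, output_per_m=OUTPUT_PER_M):
    """List-price cost in USD for one request, ignoring caching discounts."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def estimated_response_seconds(output_tokens, ttft_s=TTFT_S, output_tps=OUTPUT_TPS):
    """Rough wall-clock estimate: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / output_tps


if __name__ == "__main__":
    print(f"Blended / 1M tokens: ${blended_price(INPUT_PER_M, OUTPUT_PER_M):.2f}")
    print(f"Cost of 10k in / 1k out: ${request_cost(10_000, 1_000):.4f}")
    print(f"Time for 1k output tokens: {estimated_response_seconds(1_000):.1f} s")
```

Real costs and latencies will vary with caching, reasoning effort, and provider load, so treat these as back-of-envelope numbers rather than billing predictions.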

Model Intelligence

Arena ranking: 42
Benchmarkable: Yes
Model level: release
Intelligence Index: 32.7
Coding Index: 48.6
GPQA: 0.77
HLE: 0.13
SciCode: 0.47
IFBench: 0.46
LCR: 0.56
TerminalBench Hard: 0.49
TAU2: 0.69

Recent stories

32 linked stories
release · SECONDARY · 2026-06-15
TryCua launches Cua-Bench for KiCad; GPT-5.5 clears 6 of 25 tasks

TryCua and Snorkel opened Cua-Bench, a computer-use benchmark with 25 expert-authored KiCad tasks graded by exact netlist matches. The early results show frontier models still struggle with GUI execution, wiring completion, and self-checking, so treat benchmark wins as incomplete for real computer-use work.

news · SECONDARY · 2026-06-14
Fable users compare GLM-5.2, GPT-5.5, and model panels on one-shot UI work

Two days after Fable 5 went offline, developers started testing GLM-5.2, GPT-5.5, and multi-model panels against the kinds of one-shot frontend and greenfield builds Fable handled well. The early pattern is that replacements cover much of the work, but Fable still leads on UI taste and first-pass product completion.

news · SECONDARY · 2026-06-08
Cognition benchmarks FrontierCode: top model scores 13% with mergeability grading

Cognition introduced FrontierCode, a coding benchmark that grades mergeability and review quality instead of only unit-test passes, and the top model scored 13%. The result matters because it differs from SWE-Bench-style pass rates, and outside researchers are already questioning score variance and reproducibility.

news · SECONDARY · 2026-06-04
Arena launches Agent Mode rankings with GPT-5.5 High leading

Arena shipped Agent Mode, a benchmark that lets models use web search, bash, file writing, image generation, and follow-up questions, then ranks them on five live-session signals. It matters because agent evals move from static task sets to real user workflows, with GPT-5.5 High currently leading the leaderboard.

news · SECONDARY · 2026-05-31
Developers report Codex beats Claude Code on DeepSWE, token burn, and multi-hour /goal sessions

Independent users compared GPT-5.5/Codex with Opus 4.8/Claude Code using DeepSWE cost charts, GBA Eval runs, and long coding sessions. The split matters because engineers choosing a daily coding stack now have external quality-versus-cost evidence instead of only vendor launch claims.

news · SECONDARY · 2026-05-31
Opus 4.8 users report token burn, failed tool calls, and DeepSWE gaps

Three days after Opus 4.8 launched, new tests and field reports added failed tool calls, Bash-specific breakdowns, and higher token burn to the complaint list. Users report materially worse cost and stability in long coding sessions, while DeepSWE and GBA Eval point in different directions.

news · SECONDARY · 2026-05-30
Opus 4.8 users report write failures, sycophancy, and 58% DeepSWE

Two days after launch, users and benchmarks pointed to write failures, sycophancy, lower security recall, and a 58% DeepSWE result. GPT-5.5 still leads on cost, output tokens, and pass@1 in shared coding-agent tests, so compare both before switching.

release · SECONDARY · 2026-05-28
Claude Opus 4.8 ships with 69.2% SWE-Bench Pro and 2.5x Fast mode

Anthropic released Claude Opus 4.8 across Claude, the API, and major clouds with higher coding scores and a cheaper 2.5x-speed Fast mode. Use it for coding workloads that want better benchmark performance without a price increase over 4.7.

release · PRIMARY · 2026-05-28
OpenAI updates GPT-5.5 Instant with writing blocks and less bullet-heavy replies

OpenAI rolled a new GPT-5.5 Instant into ChatGPT and the API with less bullet-heavy output, better pacing, and higher multilingual quality. The update also replaces Canvas in GPT-5.5 Instant and Thinking with in-chat writing and code blocks, so users should migrate workflows while legacy models still keep Canvas temporarily.

news · SECONDARY · 2026-05-27
Codex removes GPT-5.2 and GPT-5.3-Codex on June 2

OpenAI said ChatGPT-linked Codex will drop GPT-5.2 and GPT-5.3-Codex on June 2, with GPT-5.5 becoming the default frontier model for free users. The API versions stay available, but the in-product model surface is being reduced for compute-fleet management.

release · SECONDARY · 2026-05-27
DeepSWE benchmarks GPT-5.5 at 70% on 113 tasks across 91 repos

DeepSWE launched a coding benchmark built from 113 original tasks across 91 repos and five languages, with GPT-5.5 leading at 70%. The setup is meant to better reflect repo search, multi-file edits, and verification in real agent workflows.

workflow · SECONDARY · 2026-05-16
Codex users report 2-hour mech-interp runs and 150-hour tasks with `/goal`

Days after `/goal` workflows first surfaced, users showed the command also works in the Codex app and shared runs for SSH setup, mech-interp scripts, and recurring work that lasted hours or days. The evidence points to Codex being used as a long-running research and ops agent, though the app still lacks explicit `/goal` UI.

news · SECONDARY · 2026-05-15
OpenAI fixes two GPT-5.5 issues in Codex after users report looping runs

OpenAI said Codex’s GPT-5.5 degradation over the prior 48 hours came from two issues and it will reset usage limits after the fix. Users had reported looping runs, higher cache burn, and unstable sessions in active coding workflows.

release · SECONDARY · 2026-05-11
OpenAI launches Daybreak with GPT-5.5-Cyber, Codex workflows, and repo scanning

OpenAI launched Daybreak, combining GPT-5.5, Codex workflows, repo scanning, threat modeling, and patch generation for cyber-defense teams. It packages frontier models into a continuous secure-software workflow, so teams can test whether it fits their response pipeline.

news · PRIMARY · 2026-05-10
GPT-5.5 users report 3.3M cached tokens and 2.5x /fast credits

Engineers shared fresh measurements on GPT-5.5 cache reuse, /fast pricing, and bug-finding budgets after comparison posts for GPT-5.5 and Opus 4.7 led the coding round-up. The reports suggest Codex cost and quality now swing on cache behavior and effort settings as much as on list prices.

news · PRIMARY · 2026-05-09
GPT-5.5 vs Opus 4.7: users compare plan mode, frontend output, and 120K-context use

User posts and HN threads compared GPT-5.5 and Opus 4.7 across plan mode, frontend work, and 120K-context sessions. The split results mean token burn and instruction discipline matter as much as raw benchmark scores.

release · SECONDARY · 2026-05-07
OpenAI rolls out GPT-5.5-Cyber limited preview for critical-infrastructure defenders

OpenAI introduced GPT-5.5-Cyber in limited preview for defensive security teams and paired it with GPT-5.5 plus Trusted Access for Cyber. The release matters because OpenAI is separating cyber-specific access and permissiveness from general-model access rather than treating security work as a normal prompting mode.

release · PRIMARY · 2026-05-05
ChatGPT ships GPT-5.5 Instant by default with Memory Sources

OpenAI is rolling GPT-5.5 Instant into ChatGPT as the default model and exposing it as gpt-5.5-chat-latest, alongside Memory Sources for personalized replies. The model also claims 52.5% fewer high-stakes hallucinations, so watch for behavior changes in production prompts.

news · SECONDARY · 2026-05-04
Copilot users report $221 for 15 GPT-5.5 messages before June 1 billing switch

Ahead of GitHub Copilot's June 1 usage-based billing switch, users documented GPT-5.5 sessions hitting 60M tokens and $221 across 15 messages on the legacy per-message plan. The examples show why flat message buckets break once single requests can run for hours and consume extreme token counts.

news · SECONDARY · 2026-05-02
Codex users report one-shot fixes and 1.7B-token days vs Claude Code

Developers posted side-by-side reports of faster one-shot fixes, 1.7B-token workdays, and fewer limit warnings with GPT-5.5 fast mode after OpenAI added Claude Code import. The comparisons matter because they turn migration talk into a concrete workflow choice.

news · SECONDARY · 2026-05-01
ValsAI updates Terminal Bench 2 after `tool_choice` bug, moving GPT-5.5 to #1 with +11%

ValsAI found that undocumented `tool_choice` behavior was skewing Terminal Bench 2 scores when no native tools were used, then reran the evals. The correction lifted GPT-5.5 by 11% to the top slot and showed how much harness settings can move coding-agent results.

news · SECONDARY · 2026-05-01
ARC Prize reports GPT-5.5 at 0.43% and Opus 4.7 at 0.18% on ARC-AGI-3

ARC Prize published frontier-model results on ARC-AGI-3 and said GPT-5.5 and Opus 4.7 both stayed below 1%, with failures in world modeling, abstraction, and reward reinforcement. That shows strong coding and benchmark models still break on novel interactive reasoning tasks, and follow-up comparisons even had Opus 4.6 slightly ahead of 4.7.

news · PRIMARY · 2026-04-30
GPT-5.5 ranks at 71.4% on UK AISI cyber eval with 2/10 TLO completions

Multiple summaries of the UK AISI report say GPT-5.5 roughly matches Claude Mythos Preview on long-horizon cyber tasks, including 2 of 10 end-to-end TLO completions. That matters because the model is broadly usable today, shifting cyber-workflow choices toward availability and mitigations rather than gated access alone.

release · SECONDARY · 2026-04-30
Codex adds `/goal`, role-based workflows, and 20% faster browser use

OpenAI expanded Codex with role-based workflows, app connections, in-app previews, and the `/goal` command, while also improving browser use by about 20%. The update lets Codex keep working across docs, slides, spreadsheets, and web actions instead of staying in a single coding thread.

release · SECONDARY · 2026-04-28
Codex adds macOS computer use, in-app browser, and artifact previews

Codex gained background macOS control, page inspection, image generation, plugins, artifacts, and follow-up automations. That gives it one agent thread for desktop apps, frontend debugging, and recurring work.

news · PRIMARY · 2026-04-26
Users report GPT-5.5 speeds up coding and cuts over-editing in low-reasoning runs

New evals and day-three user tests show GPT-5.5 performing well at low or medium reasoning, with benchmark gains over GPT-5.4 in coding-heavy use. That matters because stronger results no longer require xhigh runs, though some users still flag sycophancy.

news · PRIMARY · 2026-04-25
GPT-5.5 users report 4-10x shorter runs and smoother tool calls one day after launch

Users and third-party evals reported shorter runs, stronger long-context scores, and faster rollout into Cursor and other tools a day after GPT-5.5 hit the API. Higher per-token pricing may be partly offset by lower loop time and fewer tool-call stalls, so watch early bench data before changing defaults.

news · PRIMARY · 2026-04-25
Tool vendors add GPT-5.5 to Cursor, Databricks, Droid, and ml-intern within 24 hours

Independent tools and platforms shipped GPT-5.5 support within a day of the API rollout, spanning IDEs, hosted research agents, enterprise stacks, and coding agents. That shortens evaluation time because teams can test the model inside existing workflows instead of rebuilding around a single OpenAI surface.

news · SECONDARY · 2026-04-24
Codex users report one-shot bug fixes, 10-hour runs, and lower token burn a day after GPT-5.5 launch

A day after GPT-5.5 and the new Codex workflows launched, developers reported one-shot bug fixes, longer unattended runs, and lower token use in real coding tasks. The early hands-on comparisons matter because they are already shifting some teams' default agent workflow away from Claude Code.

release · SECONDARY · 2026-04-24
Cursor 3.2 adds /multitask async subagents, worktrees, and GPT-5.5

Cursor 3.2 added /multitask async subagents, improved worktrees, and multi-root workspaces, and shipped alongside a GPT-5.5 rollout that scored 72.8% on CursorBench. The update makes background agent orchestration a first-class IDE workflow instead of a blocking queue.

release · PRIMARY · 2026-04-24
OpenAI opens GPT-5.5 API with 1M context and Responses support

OpenAI added GPT-5.5 and GPT-5.5 Pro to the API and Playground with 1M context and Responses support. Partners including OpenRouter, Perplexity, GitHub Copilot, Vercel, Warp, and Devin rolled it out the same day, widening access beyond Codex.

release · PRIMARY · 2026-04-23
OpenAI releases GPT-5.5 with 82.7% Terminal-Bench and Codex browser control

OpenAI rolled out GPT-5.5 and GPT-5.5 Pro in ChatGPT and Codex, with higher scores on terminal, OS, cyber, and math evals than GPT-5.4. Codex also gained browser, document, and computer-use features for longer agent workflows.


Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.