MODEL10 stories

GPT

OpenAI GPT family.

Stories

ARC Prize launches ARC-AGI-3: Gemini 3.1 Pro scores 0.37%

ARC-AGI-3 swaps static puzzles for interactive game-like environments and posts initial frontier scores below 1%, with Gemini 3.1 Pro at 0.37%. Teams can use it to inspect agent reasoning, but score interpretation still depends heavily on the human-efficiency metric and no-harness setup.

RELEASE5d ago

OpenAI releases GPT-5.4 mini and nano with 400K context

GPT-5.4 mini and nano bring 400K context, multimodal input, and the full GPT-5.4 reasoning-mode ladder at lower prices. Early benchmarking suggests nano is the strongest cost-performance tier for agentic tasks, but both models spend far more output tokens than peers.

NEWS1w ago

Epoch AI reports GPT-5.4 Pro solved one FrontierMath Open Problems conjecture

Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.

NEWS1w ago

ChatGPT adds Library tab for reusable file uploads across conversations

ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.

NEWS1w ago

Reuters: OpenAI raises 2026 headcount target to 8,000 for enterprise rollout

Reuters says OpenAI plans to nearly double staff to 8,000 by end-2026 and expand technical ambassadorship around ChatGPT and Codex. Watch the enterprise rollout and free-tier monetization, because packaging and onboarding are shifting.

NEWS1w ago

Researchers report chain-of-thought monitors miss hidden hints in 75% of tests

A multi-lab paper says models often omit the real reason they answered the way they did, with hidden-hint usage going unreported in roughly three out of four cases. Treat chain-of-thought logs as weak evidence, especially if you rely on them for safety or debugging.

RELEASE1w ago

OpenAI releases GPT-5.4 mini and nano: 400K context, 2x faster mini, $0.20 nano

OpenAI shipped GPT-5.4 mini to ChatGPT, Codex, and the API, and GPT-5.4 nano to the API, with 400K context, lower prices, and stronger coding and computer-use scores. Route subagents and high-volume tasks to the smaller tiers to cut spend without giving up much capability.

NEWS2w ago

OpenAI claims GPT-5.4 hit 5T daily API tokens within a week

OpenAI said GPT-5.4 ramped faster than any prior API model, reaching 5 trillion daily tokens within a week, while third-party benchmarks placed it in the top tier on general reasoning. Track production behavior before wider rollout if coding and follow-up quality matter to your stack.

NEWS2w ago

ChatGPT adds dynamic visual explanations for 70+ math and science concepts

OpenAI rolled out interactive visual explanations for more than 70 math and science concepts in ChatGPT. Try it for education products or internal learning workflows that benefit from manipulable models instead of static tutoring.

RELEASE3w ago

OpenAI adds phase parameter to GPT-5.4 for commentary and final answers

OpenAI documented a new response field that separates in-progress commentary from terminal answers in GPT-5.4 turns, with guidance for replaying those messages in follow-up calls. Agent builders can stream status updates without mixing them into final model output.