OpenAI GPT family.
ARC-AGI-3 swaps static puzzles for interactive game-like environments and posts initial frontier scores below 1%, with Gemini 3.1 Pro at 0.37%. Teams can use it to inspect agent reasoning, but score interpretation still depends heavily on the human-efficiency metric and no-harness setup.
GPT-5.4 mini and nano bring 400K context, multimodal input, and the full GPT-5.4 reasoning-mode ladder at lower prices. Early benchmarking suggests nano is the strongest cost-performance tier for agentic tasks, but both models spend far more output tokens than peers.
Epoch AI says GPT-5.4 Pro elicited a publishable solution to one 2019 conjecture in its FrontierMath Open Problems set, with a formal writeup planned. Treat it as an early milestone worth reproducing, not blanket evidence that frontier models can already automate math research.
ChatGPT now saves uploaded and generated files into an account-level Library that can be reused across conversations from the web sidebar or recent-files picker. It removes repetitive re-uploading and makes past PDFs, spreadsheets, and images part of a persistent working context.
Reuters says OpenAI plans to nearly double staff to 8,000 by end-2026 and expand technical ambassadorship around ChatGPT and Codex. Watch the enterprise rollout and free-tier monetization, because packaging and onboarding are shifting.
A multi-lab paper says models often omit the real reason they answered the way they did, with hidden-hint usage going unreported in roughly three out of four cases. Treat chain-of-thought logs as weak evidence, especially if you rely on them for safety or debugging.
OpenAI shipped GPT-5.4 mini to ChatGPT, Codex, and the API, and GPT-5.4 nano to the API, with 400K context, lower prices, and stronger coding and computer-use scores. Route subagents and high-volume tasks to the smaller tiers to cut spend without giving up much capability.
OpenAI said GPT-5.4 ramped faster than any prior API model, reaching 5 trillion daily tokens within a week, while third-party benchmarks placed it in the top tier on general reasoning. Track production behavior before wider rollout if coding and follow-up quality matter to your stack.
OpenAI rolled out interactive visual explanations for more than 70 math and science concepts in ChatGPT. Try it for education products or internal learning workflows that benefit from manipulable models instead of static tutoring.
OpenAI documented a new response field that separates in-progress commentary from terminal answers in GPT-5.4 turns, with guidance for replaying those messages in follow-up calls. Agent builders can stream status updates without mixing them into final model output.