AI Primer for Engineers — Daily AI Changelog

Fresh stories

New

Fable 5 could reopen as soon as next week pending Pentagon and NSA sign-off, Axios reports

Axios reported that Fable 5 could return as soon as next week after progress on safety controls and trusted-user access, though Pentagon and NSA approval is still pending. The update matters because it is the clearest public timeline yet for restoring access to Anthropic’s gated flagship model.

Regulation · 27th June

Release

DeepSeek V4-Pro benchmarks at ~90 tok/s after DSpark rollout

Independent measurements after DSpark put DeepSeek V4-Pro around 90 tok/s and cut one run from 214s to 116s. The gain matters because it lowers serving cost, though tuning details and memory overhead are still unclear.
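To put the reported timings in perspective, the short calculation below derives the speedup and time saved from the 214s and 116s figures. It is plain arithmetic on the single reported run, not a benchmark harness, and the function name is illustrative.

```python
def speedup_summary(before_s: float, after_s: float) -> dict:
    """Summarize a latency change between two wall-clock timings (seconds)."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("timings must be positive")
    return {
        # How many times faster the new run is.
        "speedup_x": before_s / after_s,
        # Share of the original wall-clock time that was removed.
        "time_saved_pct": (before_s - after_s) / before_s * 100,
    }


summary = speedup_summary(214, 116)
print(f"{summary['speedup_x']:.2f}x faster, "
      f"{summary['time_saved_pct']:.1f}% less wall-clock time")
# prints: 1.84x faster, 45.8% less wall-clock time
```

One run is a thin basis for cost planning, so the ratio is best read as an upper-level indication rather than a guaranteed serving gain.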

New

Inference Optimization · 27th June · 5 min read

New

Codex supports thread automations with /goal, /btw, and heartbeat wake-ups

Codex users documented thread automations as recurring wake-up calls that preserve thread context, alongside /goal and /btw patterns for steering long-running loops. The workflow matters because teams can schedule check-ins, queue instructions mid-run, and add adversarial review passes without building a separate orchestrator.

Workflow · Codex · 27th June

New

OpenRouter says four open-weight models handle agents; JPMorgan puts Chinese models at 45% of traffic

OpenRouter said four open-weight models now handle real agentic workloads, and a JPMorgan report put Chinese models at about 45% of platform traffic. The shift matters because teams are optimizing for price, hosting, and task fit instead of defaulting to frontier APIs.

Model Routing · 27th June

Release

Sakana Fugu Ultra opens on Vercel AI Gateway

Sakana made Fugu Ultra available through Vercel AI Gateway, while new technical writeups described the trained routing head and multi-step orchestration behind it. The integration matters because teams can invoke Fugu’s model-selection workflow through existing gateway plumbing instead of standing up custom routing.

New

Orchestration · 27th June · 3 min read

New

Junior adds memory and cuts one analytics task from 3m to 1m

Junior’s first memory system cut one analytics task from about 3 minutes to 1 minute in early tests, with tokens down two-thirds and tool calls down 60%. The feature moves persistent task learning into the agent loop, though the results are still internal.

Release · Context Engineering · 27th June

New

GLM-5.2 scores 30/99 on PrinzBench as testers report legal hallucinations

PrinzBench added GLM-5.2 and scored it 30/99 for legal research, while a separate LisanBench run placed GLM-5.2-high at #29 and noted high token use. The result matters because it cuts against code-centric GLM hype and points to weak search, statute fidelity, and reasoning on professional legal tasks.

GLM · 27th June

Breaking

OpenCode v2 introduces one backend for TUI, desktop, and web sessions

OpenCode v2 moves its TUI, desktop, and web clients onto a shared backend so sessions stay synced and resource use drops across windows. The beta matters for multi-window agent workflows, though the next build still lacks features.

New

OpenCode · 27th June · 3 min read

New

Datalab scores 95.9% on a 225-document extraction benchmark at under half Reducto's price

Datalab’s balanced extraction mode scored 95.9% on a 225-document benchmark and beat Reducto Deep Extract’s 95.1%, according to Vik Paruchuri. The update also adds citations and reasoning, but the benchmark and price comparison are vendor-reported.

Release · Benchmarks · 27th June

New

Codex adds hover navigation rail and longer thread history in desktop update

OpenAI shipped another Codex desktop update with smoother long-thread scrolling, deeper local history, better settings search, and a hover navigation rail. The release matters because long-running sessions now keep your place, and copied output carries richer Markdown into Slack.

Release · Codex · 27th June



Skills Spotlight: top by stars


🎨 Design

p5js

p5.js sketches: gen art, shaders, interactive, 3D.

by NousResearch · 1 month ago · 204.7k

🎨 Design

pretext

Use when building creative browser demos with @chenglou/pretext — DOM-free text layout for ASCII art, typographic flow around obstacles, text-as-geometry games, kinetic typography, and text-powered generative art. Produces single-file HTML demos by default.

by NousResearch · 1 month ago · 204.7k

✍️ Writing

New

creative-ideation

Generate ideas via named methods from creative practice.

by NousResearch · 3 days ago · 204.1k

Top stories this week


Breaking

Epoch releases MirrorCode with 25 long-horizon SWE tasks and a 56% score

Epoch introduced MirrorCode, a benchmark where models reimplement real programs from specs with no internet and hidden held-out tests; the best current score is 56%. The setup matters because it scales inference into multi-day runs and targets software jobs estimated to take humans weeks.

New

Benchmarks · 26th June · 5 min read

New

DeepSeek releases DeepSpec and DSpark for speculative decoding on V4 checkpoints

DeepSeek open-sourced DeepSpec, a codebase for training and evaluating draft models for speculative decoding, alongside the DSpark decoding module for V4 checkpoints. It matters because inference teams get a new open stack for improving draft-model quality and decode throughput beyond earlier MTP-style baselines.
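For readers new to the technique, the sketch below shows the general draft-and-verify loop behind speculative decoding in its simplest greedy form. It is not DeepSpec or DSpark code: the toy "models" are plain functions, and every name (`speculative_decode`, `toy_target`, `toy_draft`) is hypothetical. Real systems verify all drafted positions in one batched forward pass and use a probabilistic acceptance rule when sampling.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]  # context -> greedy next token


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int,
                       k: int = 3) -> Tuple[List[int], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verify passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks each position (one batched pass in practice).
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the pass yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], passes


def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10  # counts upward, wrapping at 10


def toy_draft(ctx: List[int]) -> int:
    # Agrees with the target except after a 4, where it guesses wrong.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


tokens, passes = speculative_decode(toy_target, toy_draft, [0], max_new=8)
print(tokens, passes)  # [0, 1, 2, 3, 4, 5, 6, 7, 8] 3
```

The output matches what the target alone would produce, but in 3 verify passes instead of 8 sequential steps. A better draft model raises the acceptance rate, which is the quantity a training and evaluation codebase for draft models would aim to improve.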

Release · LLM Serving · 26th June

New

Codex fixes quota drain tied to fraud overflagging with an account-wide usage reset

OpenAI said Codex accounts were draining usage faster than intended because abuse and fraud checks were overflagging some sessions, then issued a usage reset for all users. It matters because paid Codex workflows were losing quota unexpectedly mid-run, directly affecting reliability and cost.

Codex · 26th June

New

Hermes Agent introduces Mixture of Agents 2.0 as virtual models across providers

Hermes Agent launched Mixture of Agents 2.0, letting users combine models from different providers into presets that behave like a normal model inside the agent loop. It matters because multi-model orchestration becomes a reusable runtime primitive instead of a custom routing workflow.
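The idea of a preset that "behaves like a normal model" can be sketched as a callable that fans a prompt out to several proposer models and hands their answers to an aggregator. This is a minimal illustration of the general mixture-of-agents pattern, not Hermes Agent's implementation or API; `MixturePreset`, the stub models, and the prompt layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

Model = Callable[[str], str]  # prompt -> completion


@dataclass
class MixturePreset:
    """Hypothetical 'virtual model': proposers answer, an aggregator merges."""
    proposers: List[Model]
    aggregator: Model

    def __call__(self, prompt: str) -> str:
        # Collect one candidate answer from each proposer model.
        drafts = [model(prompt) for model in self.proposers]
        numbered = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(drafts))
        # The aggregator sees the question plus every candidate.
        return self.aggregator(
            f"Question: {prompt}\nCandidate answers:\n{numbered}"
        )


# Stub models standing in for calls to different providers.
def model_a(prompt: str) -> str:
    return "Paris"


def model_b(prompt: str) -> str:
    return "Paris, France"


def stub_aggregator(merged_prompt: str) -> str:
    # Stand-in for a real model call: simply returns candidate [1].
    for line in merged_prompt.splitlines():
        if line.startswith("[1] "):
            return line[4:]
    return ""


virtual_model = MixturePreset([model_a, model_b], stub_aggregator)
print(virtual_model("What is the capital of France?"))  # Paris
```

Because the preset has the same `prompt -> completion` shape as a single model, the calling agent loop does not need to know that several models sit behind it.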

Release · Hermes Agent · 26th June

New

Perceptron adds video_frames to Mk1 and cuts 1080p time-to-first-token from ~42s to ~4s

Perceptron launched a video_frames input for Mk1 that accepts pre-decoded frames with timestamps instead of forcing clip re-encoding. The change matters for edge and sparse-footage pipelines because 10 minutes of 1080p video can start returning tokens roughly ten times faster.

Release · Multimodal · 26th June

New

Google AI Studio adds Design Variations for one-click UI layout proposals

Google AI Studio shipped Design Variations, which generates multiple UI directions from an existing build and lets users apply one directly. It matters because builders can branch app presentation without rewriting aesthetic prompts or manually rebuilding layouts.

Release · DX Tooling · 26th June

New

Next.js 16.3 Preview adds AGENTS.md, agent-browser, and next-dev-loop Skills

Next.js previewed an agent-focused toolchain with auto-managed AGENTS.md, browser-backed verification, and Skills for cache-component migration and optimization. The release matters because framework guidance, browser introspection, and fix prompts are now packaged directly for coding agents.

Release · DX Tooling · 26th June

New

OpenAI reports Codex drives 99.8% of internal AI output tokens

OpenAI published usage data showing Codex now generates 99.8% of its internal AI output tokens, with sharp growth in legal, support, recruiting, and finance. The report measures agent adoption as delegated parallel work, not just chat inside engineering.

Codex · 25th June

New

Report: GPT-5.6 Preview opens customer-by-customer during federal review

The Information reported that OpenAI is holding GPT-5.6 to a limited preview with customer-by-customer approvals during review. That would restrict who can benchmark or integrate the model until a broader rollout clears.

Regulation · 25th June

New

OpenRouter launches MCP server with live pricing, benchmarks, and test inference

OpenRouter released an MCP server that lets agents query live model pricing, benchmark scores, provider data, docs, and run test inference from the CLI. That replaces stale model knowledge with current routing data inside long-running agent workflows.

Release · MCP · 25th June

