Fresh stories
DeepSeek V4-Pro benchmarks at ~90 tok/s after DSpark rollout
Independent measurements after DSpark put DeepSeek V4-Pro around 90 tok/s and cut one run from 214s to 116s. The gain matters because it lowers serving cost, though tuning details and memory overhead are still unclear.

Sakana Fugu Ultra opens on Vercel AI Gateway
Sakana made Fugu Ultra available through Vercel AI Gateway, while new technical writeups described the trained routing head and multi-step orchestration behind it. The integration matters because teams can invoke Fugu’s model-selection workflow through existing gateway plumbing instead of standing up custom routing.

OpenCode v2 introduces one backend for TUI, desktop, and web sessions
OpenCode v2 moves its TUI, desktop, and web clients onto a shared backend so sessions stay synced and resource use drops across windows. The beta matters for multi-window agent workflows, though the next build still lacks features.


Codex supports thread automations with /goal, /btw, and heartbeat wake-ups
Codex users documented thread automations as recurring wake-up calls that preserve thread context, alongside /goal and /btw patterns for steering long-running loops. The workflow matters because teams can schedule check-ins, queue instructions mid-run, and add adversarial review passes without building a separate orchestrator.

OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic
OpenRouter said four open-weight models now handle real agentic workloads, and a JPMorgan report put Chinese models at about 45% of platform traffic. The shift matters because teams are optimizing for price, hosting, and task fit instead of defaulting to frontier APIs.

Junior adds memory and cuts one analytics task from 3m to 1m
Fable 5 opens next week pending Pentagon and NSA sign-off, Axios reports
GLM-5.2 ranks 30/99 on PrinzBench as testers report legal hallucinations
Top stories this week
Epoch releases MirrorCode with 25 long-horizon SWE tasks and a 56% score
Epoch introduced MirrorCode, a benchmark where models reimplement real programs from specs with no internet and hidden held-out tests; the best current score is 56%. The setup matters because it scales inference into multi-day runs and targets software jobs estimated to take humans weeks.


DeepSeek releases DeepSpec and DSpark for speculative decoding on V4 checkpoints
DeepSeek open-sourced DeepSpec, a codebase for training and evaluating draft models for speculative decoding, alongside the DSpark decoding module for V4 checkpoints. It matters because inference teams get a new open stack for improving draft-model quality and decode throughput beyond earlier MTP-style baselines.

Codex fixes quota drain tied to fraud overflagging with an account-wide usage reset
OpenAI said Codex accounts were draining usage faster than intended because abuse and fraud checks were overflagging some sessions, and it issued a usage reset for all users. It matters because paid Codex workflows were losing quota unexpectedly mid-run, directly affecting reliability and cost.

Hermes Agent introduces Mixture of Agents 2.0 as virtual models across providers
Hermes Agent launched Mixture of Agents 2.0, letting users combine models from different providers into presets that behave like a normal model inside the agent loop. It matters because multi-model orchestration becomes a reusable runtime primitive instead of a custom routing workflow.

Perceptron adds video_frames to Mk1 and cuts 1080p time-to-first-token from ~42s to ~4s
Perceptron launched a video_frames input for Mk1 that accepts pre-decoded frames with timestamps instead of forcing clip re-encoding. The change matters for edge and sparse-footage pipelines because 10 minutes of 1080p video can start returning tokens roughly ten times faster.









