All Stories

367 stories

27th June

DeepSeek V4-Pro benchmarks at ~90 tok/s after DSpark rollout

Release | Inference Optimization | 27th June

OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic

Model Routing | 27th June

Codex supports thread automations with /goal, /btw, and heartbeat wake-ups

Workflow | Codex | 27th June

Fable 5 opens next week pending Pentagon and NSA sign-off, Axios reports

Regulation | 27th June

Sakana Fugu Ultra opens on Vercel AI Gateway

Release | Orchestration | 27th June

GLM-5.2 ranks 30/99 on PrinzBench as testers report legal hallucinations

Junior adds memory and cuts one analytics task from 3m to 1m

Release | Context Engineering | 27th June

Datalab ranks 95.9% on a 225-document extraction benchmark at under half Reducto's price

Release | Benchmarks | 27th June

OpenCode v2 introduces one backend for TUI, desktop, and web sessions

Release | OpenCode | 27th June

Codex adds hover navigation rail and longer thread history in desktop update

Release | Codex | 27th June

26th June

Epoch releases MirrorCode with 25 long-horizon SWE tasks and a 56% score

Release | Benchmarks | 26th June

Chandra reports Mistral OCR 4 scores are not reproducible and publishes repro scripts

Mistral | 26th June

DeepSeek releases DeepSpec and DSpark for speculative decoding on V4 checkpoints

Release | LLM Serving | 26th June

Codex fixes quota drain tied to fraud overflagging with an account-wide usage reset

Google AI Studio adds Design Variations for one-click UI layout proposals

Release | DX Tooling | 26th June

Vercel AI SDK Harness API adds OpenCode and Deep Agents in one interface

Release | DX Tooling | 26th June

Perceptron adds video_frames to Mk1 and cuts 1080p time-to-first-token from ~42s to ~4s

Release | Multimodal | 26th June

Hermes Agent introduces Mixture of Agents 2.0 as virtual models across providers

Release | Hermes Agent | 26th June

Next.js 16.3 Preview adds AGENTS.md, agent-browser, and next-dev-loop Skills

Release | DX Tooling | 26th June

25th June

Report: GPT-5.6 Preview opens customer-by-customer during federal review

Regulation | 25th June

OpenAI reports Codex drives 99.8% of internal AI output tokens

Google opens Gemini 3.5 Flash Computer Use in Gemini API with explicit confirmations

Release | Gemini | 25th June

Cursor reports SWE-bench Pro benchmark hacking; Opus 4.8 drops 87.1%→73.0% under stricter harness

Cursor | 25th June

DeepReinforce releases Ornith-1.0 397B MoE with 82.4 SWE-Bench Verified

Release | Coding Agents | 25th June

OpenRouter launches MCP server with live pricing, benchmarks, and test inference

Release | MCP | 25th June

Anthropic reports Claude Fable 5 sightings were a UI bug; traffic stayed at zero

Claude Code | 25th June

Rivet releases agentOS v0.2.0 with WebAssembly sandboxing and 1738x cheaper claim

Release | Agent Infrastructure | 25th June

Seedance 2.0 Mini launches on Venice, ComfyUI, and Pika MCP with 15s 720p video

Release | Multimodal | 25th June

v0 releases Design Systems 2.0 with GitHub, npm, Storybook, and Figma imports

Release | DX Tooling | 25th June

Claude Code 2.1.193 adds live path autocomplete and OTEL response logs

Release | Claude Code | 25th June

Vercel releases AI SDK 7 with approvals, durability, and telemetry

Release | DX Tooling | 25th June

24th June

Gemini 3.5 Flash adds Computer Use with 78.4 OSWorld score

Release | Gemini | 24th June

Claude Tag users report token billing and shared-memory concerns

Agent Identity | 24th June

Seedance 2.0 adds native 4K as fal, Replicate, Pika MCP, and ComfyUI ship support

Multimodal | 24th June

Baidu releases Unlimited OCR with 3B params for single-pass long documents

Release | Multimodal | 24th June

Claude Code 2.1.191 adds /rewind and cuts CPU use 37%

Release | Claude Code | 24th June

Zed v1.8 adds agent.terminal_init_command and faster Git operations

Release | DX Tooling | 24th June

Genspark launches Design with Figma imports and one-click code

Release | Productivity | 24th June

Amazon Bedrock adds Fable 5 to runtime after June removal

Regulation | 24th June

Vercel AI Gateway adds GLM-5.2 Fast at 150-250 tok/s

Release | GLM | 24th June

OpenRouter launches Image API with typed capabilities and exact USD cost

Release | Model Routing | 24th June

23rd June

Anthropic launches Claude Tag in Slack beta with channel memory

Release | Claude Code | 23rd June

Mistral releases OCR 4 with bounding boxes and 85.20 OlmOCRBench

Release | Mistral | 23rd June

AssemblyAI launches Universal-3.5 Pro Realtime with Context Carryover

Release | Voice Agents | 23rd June

Kilo Code launches Auto Efficient routing with KiloBench model selection

Release | Model Routing | 23rd June

Perceptron releases Files API with reusable upload IDs

Release | Multimodal | 23rd June

Latitude launches MIT-licensed agent monitoring with Signals clustering and MCP access

Release | Observability | 23rd June

Claude Code 2.1.187 adds sandbox.credentials and 5-minute MCP aborts

Release | Claude Code | 23rd June

22nd June

GLM-5.2 adds Perplexity Agent API and Droid support on Baseten at >280 TPS

Fugu Ultra testers report 30-minute runs and 17x GLM cost after launch

Agent Infrastructure | 22nd June

Google ships Interactions API in GA as Gemini default with background agents

Release | Gemini | 22nd June

Vercel supports Claude Design one-click deploys

DX Tooling | 22nd June

Vals AI releases SkillsBench with a 17-point coding-agent gain and MiniMax-M3 at +25.4

Hermes Agent adds Windows and Linux GUI computer use via TryCua

Release | Hermes Agent | 22nd June

Files SDK 2.0 adds files-sdk/api gateway and React, Vue, Svelte clients

Release | shadcn/ui | 22nd June

Claude Code 2.1.186 adds claude mcp login and auto-replies after ! shell commands

Release | Claude Code | 22nd June

Vercel supports WebSockets in Fluid with Socket.IO and 30-minute reconnects

Release | DX Tooling | 22nd June

21st June

Sakana Fugu launches one-API orchestration with Fable benchmark claims

Release | Agent Product Launch | 21st June

Human-on-the-Bridge compares reusable eval assets with LLM judges and human review

Workflow | Evals | 21st June

sqlite-utils 4.0rc1 adds migrations and nested transactions

Release | DX Tooling | 21st June

Hermes Agent adds self-hosted Mem0 and headless desktop connections

Release | Hermes Agent | 21st June

Morph supports Qwen, GLM-5.2, MiniMax M3, DeepSeek v4 with 20-35% higher code acceptance

Release | LLM Serving | 21st June