All Stories

353 stories

2nd July

1st July

Fable 5 users report Opus 4.8 fallbacks, refusals, and $321 sessions

DX Cost · 1st July

Claude Sonnet 5 ranks #3 on Vals and hits 183 turns on AA-Briefcase

Benchmarks · 1st July

Devin launches Security Swarm with Agentic MapReduce and 36/50 GHSA hits

Release · Security · 1st July

GLM 5.2 supports Amp, dcode, and Next.js workflows after Composio tops 41 tool tasks

Z.ai launches ZCode with GLM-5.2, BYOK, and 1.5x Coding Plan quota

Release · GLM · 1st July

Firecrawl launches /monitor for whole-web tracking across API, CLI, and MCP

Release · Search · 1st July

xAI launches Voice Agent Builder with $0.05/min pricing and SIP routing

Release · Voice Agents · 1st July

Letta Agent launches persistent digital coworkers with Slack, Discord, and BYOK state

Release · Agent Product Launch · 1st July

Claude Code 2.1.198 adds background agents, Chrome sessions, and eval CLI

Release · Claude Code · 1st July

Ramp introduces PorTAL with half-cost LoRA porting across Qwen and Gemma models

Benchmarks · 1st July

30th June

US Commerce removes Fable 5 export controls; Anthropic restores access July 1

Claude Code · 30th June

Anthropic launches Claude Sonnet 5 with 1M context and adaptive thinking

Release · Claude Code · 30th June

Vercel adds Dockerfile Functions and Services with VCR registry

Release · DX Tooling · 30th June

The Information reports OpenAI cuts inference costs by more than 50% on some models

Inference Optimization · 30th June

Google releases Nano Banana 2 Lite and Gemini Omni Flash

Release · Gemini · 30th June

Hermes Agent updates web extraction with 60x faster reads and 49x lower cost

Release · Hermes Agent · 30th June

ElevenAgents introduces Procedures with SOP imports from docs, PDFs, and TXT

Release · Voice Agents · 30th June

OpenAI introduces GeneBench-Pro with GPT-5.6 Sol Pro at 31.5%

Benchmarks · 30th June

Apify integrates x402 with 20,000 Actors for USDC-paid runs

Agent Infrastructure · 30th June

Anthropic launches Claude Science beta with 60+ databases and Modal compute

Release · Deep Research · 30th June

Anthropic removes Claude Code ANTHROPIC_BASE_URL prompt marking after proxy reports

Claude Code · 30th June

Claude Desktop opens Linux beta for Ubuntu and Debian with Code and Cowork

Release · Claude Code · 30th June

29th June

Cognition launches Devin Fusion with mid-session routing and 35% lower Fable-class cost

Release · Agent Product Launch · 29th June

Meituan releases LongCat 2.0: 1.6T MoE on domestic chips

Release · GPU Infrastructure · 29th June

Vercel adds useRealtime, generateSpeech, and transcribe to AI Gateway

Release · Voice Agents · 29th June

Next.js 16.3 Preview cuts Turbopack memory up to 90% and warms builds 5.5x

Release · DX Tooling · 29th June

Claude Code 2.1.196 adds org default model and pending approval for repo-local MCP

Release · Claude Code · 29th June

Vercel raises Functions package limit to 5 GB on Fluid compute

DX Tooling · 29th June

Codex fixes usage overcounting with one extra banked reset and auto-review rollback

Snowflake releases Arctic RL with ZoRRo: Text2SQL-R2 training drops to ~36 hours

Release · Reinforcement Learning · 29th June

28th June

Codex users report /goal, /rewind, and /compact workflows after launch

Workflow · Codex · 28th June

DeepSeek releases DSpark checkpoints for Qwen3 and Gemma-4

Release · Qwen · 28th June

Codex resets all usage limits as OpenAI investigates weekend drain reports

Google limits Meta's Gemini use after capacity shortages

Gemini · 28th June

xAI tests Grok 4.5 private beta on a 1.5T V9 model with Cursor data

Benchmarks · 28th June

Plannotator v0.21.3 adds file-scoped review comments and Codex app-server support

Release · Codex · 28th June

Microsoft opens SkillOpt with batch eval loops for agent SOP files

Release · DX Tooling · 28th June

27th June

DeepSeek V4-Pro benchmarks at ~90 tok/s after DSpark rollout

Release · Inference Optimization · 27th June

OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic

Model Routing · 27th June

Codex supports thread automations with /goal, /btw, and heartbeat wake-ups

Workflow · Codex · 27th June

Fable 5 opens next week pending Pentagon and NSA sign-off, Axios reports

Regulation · 27th June

Sakana Fugu Ultra opens on Vercel AI Gateway

Release · Orchestration · 27th June

GLM-5.2 ranks 30/99 on PrinzBench as testers report legal hallucinations

Junior adds memory and cuts one analytics task from 3m to 1m

Release · Context Engineering · 27th June

Datalab ranks 95.9% on a 225-document extraction benchmark at under half Reducto's price

Release · Benchmarks · 27th June

OpenCode v2 introduces one backend for TUI, desktop, and web sessions

Release · OpenCode · 27th June

Codex adds hover navigation rail and longer thread history in desktop update

Release · Codex · 27th June

26th June

Epoch releases MirrorCode with 25 long-horizon SWE tasks and a 56% score

Release · Benchmarks · 26th June

Chandra reports Mistral OCR 4 scores are not reproducible and publishes repro scripts

Mistral · 26th June

DeepSeek releases DeepSpec and DSpark for speculative decoding on V4 checkpoints

Release · LLM Serving · 26th June

Codex fixes quota drain tied to fraud overflagging with an account-wide usage reset

Google AI Studio adds Design Variations for one-click UI layout proposals

Release · DX Tooling · 26th June

Vercel AI SDK Harness API adds OpenCode and Deep Agents in one interface

Release · DX Tooling · 26th June

Perceptron adds video_frames to Mk1 and cuts 1080p time-to-first-token from ~42s to ~4s

Release · Multimodal · 26th June

Hermes Agent introduces Mixture of Agents 2.0 as virtual models across providers

Release · Hermes Agent · 26th June

Next.js 16.3 Preview adds AGENTS.md, agent-browser, and next-dev-loop Skills

Release · DX Tooling · 26th June