AI Primer
TOPIC 26 stories

DX Reliability

Stories about uptime, regressions, and debugging behaviour of AI tools as experienced by engineers (model degradation, IDE crashes, tooling outages). Overlaps with reliability; apply both when relevant.

WORKFLOW 17th April
Codex supports hidden-app control on macOS as users report 38-hour computer-use sessions

Fresh hands-on reports show Codex controlling minimized apps via macOS APIs, using a DOM-aware browser comment mode, and running for day-long sessions in the desktop app. That gives OpenAI stronger evidence that computer use is usable for daily development, though the rollout remains macOS-first and brittle around working-state changes.

NEWS 15th April
Claude Code users report 5-minute cache TTL and quota-meter regressions after March updates

GitHub issues and Hacker News threads added fresh evidence that Claude Code sessions still burn quota unexpectedly after the cache TTL change, with some users seeing usage before a prompt is sent and others recovering capacity by rolling back to 2.1.34. Watch cache reuse and metering behavior closely if you rely on long-running sessions.
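One rough way to watch cache reuse is to check how often the idle gap between turns exceeds the TTL. The sketch below assumes a deliberately simplified model: the first turn always misses, every turn refreshes the cache, and a turn misses when the gap since the previous turn is longer than the TTL. The function name, the sample timestamps, and the 1-hour comparison value are hypothetical and do not describe Claude Code's actual metering.

```python
from typing import List, Optional

def count_cache_misses(turn_times_s: List[float], ttl_s: float = 300.0) -> int:
    """Count turns that arrive after the cached prefix would have expired.

    Simplified model (an assumption, not the real metering logic): the first
    turn is a miss, each turn refreshes the cache, and a turn misses when the
    gap since the previous turn is longer than ttl_s.
    """
    misses = 0
    previous: Optional[float] = None
    for t in turn_times_s:
        if previous is None or (t - previous) > ttl_s:
            misses += 1
        previous = t
    return misses

# Hypothetical session: turns at 0 s, 2 min, 10 min, 12 min, and 1 h.
session = [0, 120, 600, 720, 3600]
misses_5min = count_cache_misses(session, ttl_s=300)    # 5-minute TTL
misses_1h = count_cache_misses(session, ttl_s=3600)     # hypothetical longer TTL
print(misses_5min, misses_1h)
```

Under this model, sessions with long pauses between turns pay for the full prefix again on every resumed turn, which is the pattern worth checking in your own logs.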

RELEASE 13th April
Cursor updates Cursor 3 with split agents and 87% fewer dropped frames

Cursor 3 adds split-agent panes, tighter cloud-agent controls, voice input fixes, and an 87% reduction in dropped frames during large edits. The update makes the IDE easier to use as a mixed local-cloud agent workspace, while keeping editor navigation and diff review intact.

RELEASE 12th April
Hermes Agent adds /debug log sharing and automatic OpenClaw import

Hermes Agent shipped automatic OpenClaw migration, pastebin log sharing, and a reported 20% improvement in loading the right skill. Use the new import path and debug sharing to simplify setup across the official and community add-ons now covering support, web UI, workspace boards, and chat front ends.

NEWS 1w ago
OpenAI rotates macOS app certificates after Axios signing workflow risk

OpenAI said a compromised third-party developer tool affected its macOS app-signing workflow and is rotating certificates for ChatGPT Desktop, the Codex app, Codex CLI, and Atlas. The company said it found no evidence of user-data access or software tampering, and older macOS app versions will stop working after the update window.

NEWS 1w ago
GLM-5.1 lands on Modal, Together AI, Letta Code, and Tembo

Providers and agent platforms added GLM-5.1 endpoints across Modal, Together AI, Letta Code, Tembo, and Tabbit, with free trials, no-key access, and 99.9% SLA options. Use the new hosting options to test the model for coding and long-horizon agent workloads without waiting on self-hosting.

NEWS 1w ago
GitHub disables Copilot PR tips after reports of 11,400 edited pull requests

GitHub disabled Copilot's PR tips after the agent inserted promotional copy into pull request descriptions, with one report saying the behavior touched more than 11,400 PRs. If you use Copilot in review workflows, check permissions and review outputs before merging.

NEWS 1w ago
GitHub issue reports Claude Code regressions after Feb update, citing 6,852 sessions

A closed GitHub issue says Claude Code became unreliable for complex engineering after February changes, citing 17,871 thinking blocks and 234,760 tool calls across 6,852 sessions. Anthropic said the redaction flag was UI-only, but developers reported broader Opus quality drops and opaque harness changes.

RELEASE 2w ago
Claude Code 2.1.90 adds NO_FLICKER fullscreen renderer

Claude Code 2.1.90 adds an experimental NO_FLICKER fullscreen renderer with mouse support and virtualized scrolling. The release also fixes rate-limit loops and resume regressions, so update if you want the new UI while watching for selection and table-rendering bugs.

NEWS 2w ago
OpenAI resets Codex usage limits across all plans after a rate-limit spike

OpenAI reset Codex usage limits across all plans after dashboards showed more users hitting caps and the team said it still did not fully understand the trigger. Use the reset to recheck capacity assumptions, since OpenAI also said it banned abuse accounts and March’s repeated resets point to a broader capacity issue.

RELEASE 2w ago
Claude Code fixes prompt-cache bugs in 2.1.88 after quota-burn reports

Claude Code 2.1.88 added fixes for prompt-cache misses, repeated CLAUDE.md reinjection, and a multi-schema StructuredOutput bug after widespread reports of unexpectedly fast quota consumption. Update if you rely on long sessions, because uncached runs can burn through paid limits much faster than intended.

NEWS 2w ago
Claude Code limits concurrent work as users report weeklong waits and missing desktop threads

Users report stricter Claude Code request caps, weeklong cooldowns, and desktop threads disappearing after restarts. Watch quotas closely and shift to lighter models or token-cutting workflows around /context and /clear if the limits hit your workflow.

RELEASE 3w ago
Hermes Agent ships v0.5.0 with 400+ Portal models and Exa support

Hermes Agent v0.5.0 adds 400+ models via Nous Portal, Hugging Face access, Exa support, GPT-5.4 behavior tweaks, and a published changelog. The release broadens provider coverage and hardens the runtime without changing the terminal-first workflow.

RELEASE 3w ago
Composio launches Universal CLI for terminal-native tool access

Composio shipped Universal CLI as a shell-first interface to its integrations, moving install, search, and agent workflows out of MCP setup. The release targets users who want simpler agent tool access after complaints that MCP stacks are harder to install, slower, and less stable.

RELEASE 3w ago
Claude Code 2.1.85 ships with conditional hooks and /compact overflow fix

Claude Code 2.1.85 adds conditional "if" filters for hooks, new MCP header env vars, and transcript timestamps, plus fixes for /compact overflow, remote leaks, auth flow, and terminal bugs. Upgrade if your workflow depends on hooks or long sessions, and use the new cloud auto-fix flow for unattended PR cleanup.

NEWS 3w ago
PlayerZero launches AI production engineer and claims 92.6% accuracy on test cases

PlayerZero launched an AI production engineer and claims its world model can simulate failures before release, trace incidents to exact PRs, and beat existing tools on real production test cases. If the claimed 92.6% accuracy holds, the interesting shift is from code generation to debugging, testing, and observability after code ships.

WORKFLOW 3w ago
Claude tests 25 Capacitor screens daily through Android CDP and iOS accessibility

A solo developer wired Claude into emulators and simulators to inspect 25 Capacitor screens daily and file bugs across web, Android, and iOS. The writeup is a solid template for unattended QA, but it also shows where iOS tooling and agent reliability still crack.

NEWS 4w ago
Cursor Composer 2 ranks #2 on Next.js evals, ahead of Opus and Gemini

Vercel's Next.js evals place Composer 2 second, ahead of Opus and Gemini despite the recent Kimi-base controversy. The result matters because it separates base-model branding from measured task performance on a real framework workflow.

RELEASE 4w ago
Claude Code adds scheduled cloud tasks on remote machines with MCP access

Claude Code can now run scheduled cloud tasks against remote repos and MCP-connected tools, while Anthropic is also pushing reusable agent SDK and skill controls. Test remote automation paths carefully, because messaging and multi-repo edge cases still surface in practice.

NEWS 4w ago
Cursor reports Composer 2 is based on Kimi K2.5 after API model IDs surfaced

Cursor and Kimi said Composer 2 starts from Kimi K2.5, with continued pretraining and RL added on top after developers spotted Kimi model IDs in traffic. Teams should benchmark it as a productized open-base stack, not a from-scratch model.

RELEASE 4w ago
Next.js 16.2 ships AGENTS.md defaults and next-browser for agent debugging

Next.js 16.2 adds version-matched AGENTS.md docs, a terminal browser for inspecting running apps, browser-error forwarding, and a dev-server lock file. It gives coding agents better frontend context and cuts duplicate-server and client-side debugging waste.
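To illustrate how a lock file can prevent duplicate dev servers, here is a minimal sketch of the general technique: create the file atomically and refuse to start if it already exists. This is not Next.js's implementation; the file name, the helper functions, and the choice to store the process ID are all assumptions made for illustration.

```python
import os
import tempfile

def acquire_lock(lock_path: str) -> bool:
    """Try to create the lock file atomically; return False if it exists."""
    try:
        # O_EXCL makes creation fail if the file is already present.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the owner to help debugging
    return True

def release_lock(lock_path: str) -> None:
    """Remove the lock file so another server can start."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass

# Hypothetical lock location inside a throwaway directory.
lock = os.path.join(tempfile.mkdtemp(), "dev-server.lock")
first = acquire_lock(lock)    # the first server takes the lock
second = acquire_lock(lock)   # a duplicate start is refused
release_lock(lock)
third = acquire_lock(lock)    # after release, a new server can start
release_lock(lock)
print(first, second, third)
```

A real tool would also need to handle stale locks left behind by a crashed process, which this sketch does not attempt.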

RELEASE 4w ago
Claude Code 2.1.79 adds /remote-control, Console auth, and stricter memory saving

Anthropic shipped Claude Code 2.1.79 with browser and phone session bridging, Anthropic Console auth, timeout fixes, and stricter memory rules, one day after 2.1.78 added line-by-line streaming and StopFailure hooks. Teams using Claude Code should update internal docs for mobile control, auth flows, and memory behavior.

RELEASE 4w ago
Hermes Agent releases v0.3.0 with plugins, live Chrome CDP, and ACP IDE support

Hermes Agent v0.3.0 added a first-class plugin system, live browser attach via CDP, real-time streaming, and VS Code, Zed, and JetBrains integration through ACP. Update if you want shareable skills, browser control, and a more stable long-running agent setup.

RELEASE 4w ago
Claude Code 2.1.77 adds 64K Opus output defaults and allowRead sandboxes

Anthropic shipped Claude Code 2.1.77 with higher default Opus 4.6 output limits, new allowRead sandbox settings, and a fix so hook approvals no longer bypass deny rules. Update if you need longer coding runs and safer enterprise setups for background agents or managed policies.
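The deny-rule fix describes a common policy pattern: deny rules are evaluated first, so no later approval can override them. The sketch below shows that ordering in general terms. It is not Claude Code's configuration format or evaluation logic; the function, the glob patterns, and the hook_approved flag are hypothetical.

```python
from fnmatch import fnmatch
from typing import List

def is_allowed(path: str, allow: List[str], deny: List[str],
               hook_approved: bool = False) -> bool:
    """Evaluate access with deny rules taking precedence over everything."""
    # Deny is checked first, so a hook approval can never bypass it.
    if any(fnmatch(path, pattern) for pattern in deny):
        return False
    # A hook approval may grant access to paths no allow rule covers.
    if hook_approved:
        return True
    return any(fnmatch(path, pattern) for pattern in allow)

# Hypothetical rules for illustration.
allow_rules = ["src/*"]
deny_rules = ["secrets/*"]

print(is_allowed("secrets/key.pem", allow_rules, deny_rules, hook_approved=True))
print(is_allowed("src/app.py", allow_rules, deny_rules))
print(is_allowed("docs/readme.md", allow_rules, deny_rules, hook_approved=True))
```

The bug class the release note describes corresponds to checking the approval before the deny list, which would let an approved request through even for a denied path.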

NEWS 1mo ago
Every launches Proof editor and restores service after launch-day load issues

Every launched Proof, an agent-native collaborative editor with provenance tracking and an open-source SDK, then restored service after heavy-load launch-day outages. Inspect the public repo and local run path if you are evaluating AI-first docs tooling.

NEWS 1mo ago
OpenAI reports Codex session hang and promises rate-limit reset after fix

OpenAI acknowledged a Codex session hang that left some requests unresponsive, later said the issue had been stable for hours, and promised a rate-limit reset. Teams relying on Codex should re-check long runs and confirm quota restoration after the incident.

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.