TOPIC · 50 stories

DX Reliability

Stories about the uptime, regressions, and debugging behaviour of AI tools as experienced by engineers (model degradation, IDE crashes, tooling outages). Overlaps with reliability; apply both when relevant.

Stories

NEWS · 1st June

MiniMax M3 users report slow runs and broken code after launch

A day after MiniMax M3 launched, independent testers posted mixed results: cheap demos and design tasks worked, but several coding runs stalled, broke features, or used more tokens than expected. New external numbers added nuance, with Context Arena scores falling sharply beyond 64k context and one DeepSWE run passing 15 of 113 tasks.

RELEASE · 1st June

Lovable introduces TanStack Start output with SSR, server functions, and type safety

Lovable moved newly generated apps onto TanStack Start, adding route-level SSR, SSG, CSR, server functions, and stricter type-safe boundaries to its generated stack. The migration matters because framework primitives become guardrails for both generated-code quality and deploy-anywhere app behavior.

NEWS · 1st June

Claude Code resets 5-hour and weekly limits after Opus 4.8 parallel-tool bug

A day after users reported runaway Claude Code usage, Anthropic reset five-hour and weekly quotas and said an Opus 4.8 handling issue was spawning more parallel tool calls than intended. The fix matters because it turns a token-burn complaint into an acknowledged product bug with restored quotas for affected Pro and Max users.

NEWS · 31st May

Opus 4.8 users report token burn, failed tool calls, and DeepSWE gaps

Three days after Opus 4.8 launched, new tests and field reports added failed tool calls, Bash-specific breakdowns, and higher token burn to the complaint list. Users report materially worse cost and stability in long coding sessions, while DeepSWE and GBA Eval point in different directions.

RELEASE · 31st May

OpenClaw adds Auto exec approvals with guardian-agent review

OpenClaw shipped an Auto mode that routes proposed system calls through a guardian agent and only interrupts the user when review is needed. Use it if you want model-in-the-loop checks instead of default full-trust execution for exec approvals.

NEWS · 31st May

Claude Code users report accidental workflow triggers, 199-agent research runs, and 50M-token burn

Three days after Dynamic Workflows launched, Claude Code users reported accidental mode triggers, a 199-agent deep-research run that burned about 50 million tokens, and steep quota hits from design workflows. The complaints matter because orchestration can now dominate cost and behavior even when the underlying model is working as expected.

NEWS · 30th May

Opus 4.8 users report write failures, sycophancy, and 58% DeepSWE

Two days after launch, users and benchmarks pointed to write failures, sycophancy, lower security recall, and a 58% DeepSWE result. GPT-5.5 still leads on cost, output tokens, and pass@1 in shared coding-agent tests, so compare both before switching.

RELEASE · 30th May

OpenClaw releases 2026.5.28 with Opus 4.8 support and faster turns

OpenClaw 2026.5.28 added Claude Opus 4.8 and Krea support while cutting fresh-install size 52.8% and speeding both cold and warm turns. It also expanded /subagents inspection, which should make delegated runs easier to debug.

NEWS · 29th May

Opus 4.8 users report false greens, token burn, and mixed benchmark gains

A day after launch, users and third-party evals reported false verified claims, million-token loops, and mixed task results despite strong headline wins. Watch task-by-task results and token cost closely because reliability varied sharply by effort setting and harness.

RELEASE · 29th May

Claude Code 2.1.158 adds auto mode for Bedrock, Vertex, and Foundry

Anthropic followed Claude Code 2.1.157 with 2.1.158, enabling auto mode on Bedrock, Vertex, and Foundry for Opus 4.7 and 4.8. The paired releases also add local plugin scaffolding and auto-load plus fixes for image handling and sandbox permission prompts.

RELEASE · 28th May

Claude Opus 4.8 ships with 69.2% SWE-Bench Pro and 2.5x Fast mode

Anthropic released Claude Opus 4.8 across Claude, the API, and major clouds with higher coding scores and a cheaper 2.5x-speed Fast mode. Use it for coding workloads that want better benchmark performance without a price increase over 4.7.

RELEASE · 28th May

OpenClaw 2026.5.27 fixes runtime boundaries and cuts cold turns 2.9x

OpenClaw 2026.5.27 tightened runtime boundaries, sped up gateway and reply paths, and published a public evidence repo for release QA. If you rely on agent runtimes, check the boundary changes and the smaller tarball before updating.

NEWS · 27th May

OpenAI reports API and ChatGPT incident with elevated latencies across GPT-5.5 workflows

OpenAI said the API and ChatGPT were seeing elevated latencies before marking the incident resolved later in the day. User reports showed stalled GPT-5.5 sessions and retry loops, turning the issue into a production and coding-agent disruption.

RELEASE · 27th May

Claude Code 2.1.153 fixes stateful MCP regressions and adds skipLfs

Claude Code 2.1.153 adds skipLfs for Git and GitHub clones and fixes a stateful MCP regression introduced in v2.1.147. The release also stops custom gateways from receiving a user's Anthropic OAuth credential and pairs with broader responsiveness work.

NEWS · 1w ago

Claude Code users report 200K context rollbacks and deleted session files

Fresh posts added 600K-to-200K context rollbacks, auto mode breaking human checkpoints, and default session-file deletion to the recent Claude Code complaint stack. Watch long sessions and review loops closely, since recovery got harder when session files disappeared.

NEWS · 1w ago

Claude Code users report hidden Agent access, empty-string MCP failures, and slower Opus 4.7 runs

Practitioners shared a transcript showing Claude Code invoking Agent despite project allow-lists, a reproducible MCP bug that drops all params when one value is an empty string, and reports of much slower Opus 4.7 runs than in Cursor. That matters because teams are spending real quota debugging harness behavior, retries, and cache invalidation instead of model output.

NEWS · 1w ago

OpenAI fixes Codex cache-hit bug and resets usage limits

OpenAI said a recent Codex optimization had lowered cache-hit rates in long-running sessions and drained limits faster, so it rolled the change back and reset usage limits for all accounts. That matters because compaction and cache behavior directly determine quota burn and session reliability.
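
To make the link between cache-hit rate and quota burn concrete, here is a minimal arithmetic sketch. Every number in it is a made-up placeholder (token volume, price per million tokens, and the cached-token discount); none of them are OpenAI's actual rates or Codex's actual metering rules.

```python
def effective_input_cost(total_input_tokens, cache_hit_rate, price_per_mtok, cached_discount):
    """Cost of input tokens when a fraction is served from the prompt cache.

    All values are hypothetical. cached_discount is the fraction of the
    full price charged for cached tokens (0.1 means 10% of list price).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = total_input_tokens * cache_hit_rate
    uncached = total_input_tokens - cached
    # Cached tokens are billed at a discount; uncached tokens at full price.
    billed_tokens = uncached + cached * cached_discount
    return billed_tokens / 1_000_000 * price_per_mtok


# Hypothetical long-running session: 10M input tokens at 1.0 per million.
healthy = effective_input_cost(10_000_000, 0.90, 1.0, 0.1)
degraded = effective_input_cost(10_000_000, 0.50, 1.0, 0.1)
burn_ratio = degraded / healthy
```

Under these assumed numbers, dropping from a 90% to a 50% hit rate raises billed input cost from about 1.9 to about 5.5, close to a threefold increase for the same work.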

RELEASE · 1w ago

OpenClaw releases 2026.5.22 with ~5ms /models startup

OpenClaw 2026.5.22 shipped leaner gateway and model startup paths, bringing /models to about 5 ms, while also adding locked dependency shrinkwraps and safer Windows rollbacks. That matters because it targets both startup latency and release-install trust for local agent operators.

NEWS · 1w ago

Qwen 3.7 Max users report 5-minute cache creation, $43 vibe-coding bills, and uneven task quality

A day after Qwen 3.7 Max launched, users posted both standout benchmark wins and rough real-work reports, including 5-minute cache creation and $43 in 15 minutes of vibe coding. That matters because teams evaluating coding agents are seeing a gap between leaderboard strength and per-task reliability.

RELEASE · 1w ago

Antigravity updates Gemini 3.5 Flash with permanent 3x quotas and 2x context

A day after Antigravity raised weekly Gemini quotas, the team said the 3x increase is permanent and doubled Gemini 3.5 Flash max context in AGY. The same update batch also clarified the IDE split and shipped Windows fixes, changing day-to-day limits and workflow behavior for developers.

RELEASE · 1w ago

Claude Code releases 2.1.149 with `/usage` breakdown and PowerShell cwd fix

Claude Code 2.1.149 added `/usage` cost breakdowns and fixed a PowerShell working-directory bypass, sandbox issues in git worktrees, and macOS file-table exhaustion from `find`. Anthropic also expanded auto mode to Pro plans and Sonnet 4.6 in the same update window, so users should check their available modes.

RELEASE · 1w ago

Perplexity launches Bumblebee scanner for macOS and Linux developer machines

Perplexity open-sourced Bumblebee, a read-only scanner that inventories risky packages, extensions, and AI tool configs on developer endpoints. It covers 8+ package ecosystems plus MCP server configs, so teams can audit exposure before code reaches production.

RELEASE · 1w ago

OpenClaw releases 2026.5.20 with Discord voice follow and secret warnings

OpenClaw 2026.5.20 adds Discord voice sessions that follow configured users, plus doctor checks for plaintext secrets in config files. The release also improves xAI headless login, clarifies model status, and fixes stuck Windows installs.

NEWS · 1w ago

Gemini 3.5 Flash users report 3x price hikes and broken tool chains one day after launch

Users reported failed harness runs, benchmark misses, broken Calendar and video-editing flows, and later a tripled Antigravity rate limit after Gemini 3.5 Flash launched. Watch real agent workflows closely, because the speed gains are arriving with higher spend and unstable behavior.

WORKFLOW · 1w ago

Lovable adds is_stuck pipeline with Overflow retrieval to cut stuck rate 5%

Lovable described a production loop where an is_stuck classifier detects repeated failures, Overflow injects past solution pairs, and send_feedback escalates real tool failures. The system lowered stuck rate 5% and raised publish rate 2%, so teams can use the same signal to debug outages and agent frustration.
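
The loop described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Lovable's implementation: `StuckDetector`, `handle_failure`, the repeat-count threshold, and the dictionary lookup standing in for Overflow retrieval are all hypothetical, and escalating when no past solution exists is a guess rather than something the story states.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StuckDetector:
    """Hypothetical stand-in for an is_stuck classifier: flags repeated identical errors."""
    threshold: int = 3
    seen: Counter = field(default_factory=Counter)

    def is_stuck(self, error_signature: str) -> bool:
        self.seen[error_signature] += 1
        return self.seen[error_signature] >= self.threshold


def handle_failure(detector, error_signature, past_solutions, is_tool_failure):
    """Pick the next step after a failed agent attempt.

    Returns an (action, payload) tuple. past_solutions maps an error
    signature to a fix that worked before (a stand-in for retrieval).
    """
    if is_tool_failure:
        # Real tool failures are escalated rather than retried.
        return ("send_feedback", error_signature)
    if detector.is_stuck(error_signature):
        hint = past_solutions.get(error_signature)
        if hint is not None:
            return ("inject_hint", hint)
        # Assumption: with no known fix, escalate instead of looping.
        return ("send_feedback", error_signature)
    return ("retry", None)
```

Separating the detection signal from the response keeps the same stuck flag usable for dashboards or outage triage, which is the reuse the story points at.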

NEWS · 1w ago

GitHub reports 3,800 internal repos breached via poisoned VS Code extension

Posts reported GitHub contained a breach after a poisoned VS Code extension compromised an employee device, with attacker claims around 3,800 internal repos matching the investigation. Related Shai-Hulud payload reports are pushing teams to audit `pull_request_target`, extension trust, and secret rotation.
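
As a starting point for the `pull_request_target` audit mentioned above, a small script can list where the trigger appears in a checked-out repository. This is a minimal sketch: the function name is hypothetical, it only does a text match inside `.github/workflows`, and it does not judge whether a given use is actually unsafe.

```python
from pathlib import Path


def find_pull_request_target(repo_root):
    """Return (path, line_number) pairs for workflow lines mentioning pull_request_target."""
    workflows = Path(repo_root) / ".github" / "workflows"
    hits = []
    if not workflows.is_dir():
        return hits
    for path in sorted(workflows.iterdir()):
        if path.suffix not in {".yml", ".yaml"}:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # ignore lines that are entirely a YAML comment
            if "pull_request_target" in stripped:
                hits.append((str(path), number))
    return hits
```

Each hit still needs manual review, for example checking whether the workflow checks out untrusted pull-request code while secrets are available.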

NEWS · 1w ago

Google Cloud blocks Railway account after Unisuper precedent, prompting multicloud warnings

Multiple posts reported Google Cloud suspended Railway's production account, reviving comparisons to Google's earlier Unisuper deletion incident. The episode is pushing engineers to treat multicloud backups and off-provider recovery as hard requirements, not optional insurance.

RELEASE · 2w ago

Claude Code 2.1.145 adds claude agents --json and Bash tool execution

Anthropic released Claude Code 2.1.145 with JSON session listing for scripting, Bash execution inside Tool, and richer OTEL span metadata. Update if you rely on automation, and review the fix for the environment-variable approval bypass plus the UI bug fixes.

RELEASE · 2w ago

Claude Code 2.1.144 adds `/resume` for background sessions and fixes 75s startup hangs

Claude Code 2.1.144 shipped background-session `/resume`, elapsed completion notifications, exact string replacements, and grep-based system search. It also fixes startup hangs, resize corruption, and long-session terminal glitches that affected reproducibility.

NEWS · 2w ago

Codex fixes usage-limit sync bug after 2-hour subscriber lockout

OpenAI said a metering bug put many Codex subscribers at the wrong usage level for about two hours, then restored balances and waived usage from that window. This matters because the incident interrupted active sessions and showed how subscription sync failures can halt agent runs mid-task.

RELEASE · 2w ago

Pi raises minimum Node.js to 22.19.0 after Undici login breakage

Pi raised its minimum Node version from 20 to 22.19.0, then shipped a follow-up after Undici-related changes in Node 26 caused Copilot and Codex login failures. This matters because agent CLIs built on Node and Undici can hard-fail on auth or install paths after runtime upgrades.
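
Teams that wrap such CLIs in their own tooling can guard against this class of breakage with a version floor check before launch. The sketch below compares a Node-style version string against the 22.19.0 minimum from the story; the helper name is hypothetical, and it assumes plain `major.minor.patch` versions with an optional leading `v`, without pre-release suffixes.

```python
def meets_minimum(version, minimum="22.19.0"):
    """Return True if a 'major.minor.patch' version is at least the minimum.

    Accepts an optional leading 'v' (as printed by `node --version`).
    Pre-release suffixes such as '-rc.1' are not handled in this sketch.
    """
    def parse(value):
        return tuple(int(part) for part in value.strip().lstrip("v").split("."))

    return parse(version) >= parse(minimum)
```

Comparing integer tuples avoids the string-comparison trap where "22.9.0" would sort above "22.19.0".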

RELEASE · 2w ago

Codex updates app with customizable shortcuts and 10-50x faster Git ops

OpenAI shipped shortcut customization, restored Git controls, cleaned up panels, and sped up large-repo operations in Codex. Paid-plan usage caps were also reset, though some accounts saw delayed propagation.

NEWS · 2w ago

Claude Code users report metered -p mode and slower headless sessions after credit split

A day after developers flagged Anthropic’s SDK credit split, Claude Code users said -p work had become metered, slower, and harder to run headlessly. Anthropic reset 5-hour and weekly limits, and Claude Code 2.1.143 added projected context-cost estimates.

NEWS · 2w ago

OpenAI fixes two GPT-5.5 issues in Codex after users report looping runs

OpenAI said Codex’s GPT-5.5 degradation over the prior 48 hours came from two issues and it will reset usage limits after the fix. Users had reported looping runs, higher cache burn, and unstable sessions in active coding workflows.

NEWS · 2w ago

Claude users report billing shock after SDK credit update and flat-rate loss

Users reported cancellations, pricing math, and harness-specific workarounds after Anthropic said Claude Agent SDK usage would move to monthly credits on June 15. The change shifts third-party Claude agent economics and is already pushing some users toward other runtimes and tools.

RELEASE · 2w ago

Claude Code 2.1.142 adds `claude agents` flags and fixes macOS sleep reconnects

Claude Code 2.1.142 added new background-session flags for directories, permissions, model, effort, and MCP or plugin config while switching Grep to ripgrep by default. The release also fixes remote MCP timeouts and daemon reconnect failures after macOS sleep.

NEWS · 2w ago

Codex introduces Windows sandbox with firewall rules and write-restricted tokens

OpenAI detailed the Windows sandbox behind Codex, using local user accounts, ACLs, firewall rules, and DPAPI-protected secrets instead of a generic VM wrapper. The design gives Windows developers safer file and network controls without making coding-agent workflows unusable.

NEWS · 3w ago

Researchers report Mini Shai-Hulud hits OpenSearch, Guardrails, and RubyGems after TanStack

Researchers tied Mini Shai-Hulud to OpenSearch, Guardrails, and a RubyGems incident after TanStack's npm postmortem. Track registry controls, CI cache hardening, dependency policy, and secret handling before the next package hit.

RELEASE · 3w ago

Claude Code 2.1.140 fixes /goal hangs and adds case-insensitive subagent matching

Anthropic shipped Claude Code 2.1.140 with a /goal fix for hook-restricted sessions, case-insensitive subagent matching, and prompt/token reductions. The update should reduce failures in managed settings and background runs.

NEWS · 3w ago

TanStack reports npm supply-chain attack across 42 packages with credential-stealing payload

TanStack disclosed a supply-chain attack that pushed two malicious npm versions across 42 packages in a 10-minute window. The payload targeted cloud keys, GitHub tokens, npm credentials, and SSH material, so teams should audit installs and rotate secrets.

NEWS · 3w ago

GPT-5.5 users report 3.3M cached tokens and 2.5x /fast credits

Engineers shared fresh measurements on GPT-5.5 cache reuse, /fast pricing, and bug-finding budgets after comparison posts for GPT-5.5 and Opus 4.7 led the coding round-up. The reports suggest Codex cost and quality now swing on cache behavior and effort settings as much as on list prices.

NEWS · 3w ago

Amp Neo limits beta access after sqs says the team paused expansion for stability

Amp’s sqs said the team paused adding more users to the Amp Neo beta to improve stability while early testers kept posting real-project demos. The update matters because it turns yesterday’s scaling complaints into an explicit access constraint for the remote coding-agent beta.

NEWS · 3w ago

Amp Neo reports scaling issues as remote Mac-mini beta reaches airplane Wi-Fi users

Amp paused the wider Neo rollout after hitting scaling issues, but beta users still showed remote sessions running from a home Mac mini through the web UI, including over airplane Wi-Fi. That makes Neo notable as a locally hosted coding-agent model, even if the control plane is not yet stable enough for broader access.

WORKFLOW · 3w ago

Claude Code guide fixes hallucinated SHAs with adaptive thinking off and effort=high

A Claude Code guide tied hallucinated package names, API versions, and SHAs to zero-thinking turns and recommended config changes to force fixed reasoning budgets and higher effort. HN discussion and user reports suggest the workaround is being used against a broader reliability regression, not just one bad prompt.

RELEASE · 3w ago

React Doctor v2 launches npx react-doctor for Next.js, Vite, and React Native

React Doctor v2 shipped as an open-source CLI that inspects React apps and supports Next.js, Vite, and React Native. The release matters because it targets AI-written React code and gives teams a repeatable terminal check instead of manual review alone.

RELEASE · 3w ago

Claude Code 2.1.136 adds autoMode.hard_deny and fixes MCP OAuth refresh races

Claude Code 2.1.136 introduced unconditional auto-mode deny rules and fixed several MCP/session failures, including disappearing servers after /clear and lost refresh tokens during concurrent refreshes. The release matters because unattended agent runs can now be constrained more explicitly and remote MCP sessions should require fewer reauths or mid-run recoveries.

RELEASE · 3w ago

Claude Code 2.1.133 removes per-action confirmations and adds worktree.baseRef

Claude Code 2.1.133 adds worktree.baseRef, hook effort variables, and Linux sandbox path overrides while resetting EnterWorktree base behavior. It also removes per-action confirmations for previously approved risky actions and fixes refresh-token 401 races.

RELEASE · 3w ago

Claude Code 2.1.132 adds CLAUDE_CODE_DISABLE_ALTERNATE_SCREEN and session_id hooks

Claude Code 2.1.132 added env vars to keep native terminal scrollback and to pass session IDs into Bash subprocesses, plus graceful shutdown fixes. It also moved risky-action confirmation earlier in the system prompt and changed tracing behavior for hooks.

NEWS · 3w ago

OpenCode adds minimal mode with native scrollback and plugin tracing

OpenCode previewed a non-fullscreen minimal mode that keeps native terminal scrollback intact while refactoring core logic into internal plugins with tracing. The update matters because terminal-first users get steadier sessions and plugin hook performance becomes easier to inspect.

RELEASE · 4w ago

ChatGPT ships GPT-5.5 Instant by default with Memory Sources

OpenAI is rolling GPT-5.5 Instant into ChatGPT as the default model and exposing it as gpt-5.5-chat-latest, alongside Memory Sources for personalized replies. The model also claims 52.5% fewer high-stakes hallucinations, so watch for behavior changes in production prompts.