A closed GitHub issue says Claude Code became unreliable for complex engineering after February changes, citing 17,871 thinking blocks and 234,760 tool calls across 6,852 sessions. Anthropic said the redaction flag was UI-only, but developers reported broader Opus quality drops and opaque harness changes.

Anthropic's position is that redact-thinking-2026-02-12 is a UI-only header, and that showThinkingSummaries: true restores summaries in the interface. You can read the original issue, the HN thread, Anthropic's settings docs, and the Claude Code release notes. The odd part is how many different knobs surfaced at once: hidden thinking summaries, /effort, adaptive thinking flags from the discussion, and a changelog entry saying thinking summaries were no longer generated by default.
Claude Code is unusable for complex engineering tasks with the Feb updates · Issue #42796 · anthropics/claude-code
1.3k upvotes · 716 comments
The complaint started in issue #42796, opened April 2 and closed April 6, with unusually heavy engagement for a product bug report: hundreds of reactions and a large Hacker News thread.
The core claim was narrow and testable. According to the issue text, the analysis covered January through March session logs and tied the regression window to the rollout of thinking-content redaction. The author's conclusion was blunt: Claude Code had regressed to the point that it could not be trusted for complex engineering work.
That claim landed because it was not just vibes. The issue attached concrete counts: 17,871 thinking blocks, 234,760 tool calls, and 6,852 session files, which gave the argument more weight than the usual "it feels worse this week" post.
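Counts like these are straightforward to reproduce against locally stored transcripts. Here is a minimal sketch, assuming a hypothetical JSONL session layout in which each line holds a message whose content blocks carry a "type" field such as "thinking" or "tool_use"; the real Claude Code transcript schema may differ:

```python
import json
from collections import Counter
from pathlib import Path

def count_events(jsonl_text: str) -> Counter:
    """Count content-block types in one session transcript.

    Assumes a hypothetical JSONL layout: one JSON object per line,
    with message.content holding a list of typed blocks. Lines that
    are blank or fail to parse are skipped.
    """
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        content = event.get("message", {}).get("content", [])
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict):
                    counts[block.get("type", "unknown")] += 1
    return counts

def scan_sessions(root: Path) -> Counter:
    """Aggregate block counts across all *.jsonl session files under root."""
    total = Counter()
    for path in root.rglob("*.jsonl"):
        total += count_events(path.read_text(encoding="utf-8"))
    return total
```

Pointed at a directory of session files, scan_sessions yields the kind of aggregate figures the issue cites. Note the caveat from later in the thread: if the transcripts themselves stop persisting raw thinking, the thinking-block count drops regardless of what the model did internally.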
In the HN thread, Anthropic engineer bcherny said redact-thinking-2026-02-12 only hides thinking from the UI. The comment adds that the header does not change thinking budgets or extended reasoning behavior under the hood, and that users can opt out with showThinkingSummaries: true.
That matters to the issue's method more than to its symptoms. According to that same explanation, if users analyze locally stored transcripts after the header is set, they will not see raw thinking in those files even though the model still used it internally. In other words, the visible transcript got thinner whether or not the underlying reasoning did.
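Per that comment, the opt-out is a user setting. A sketch of what it might look like in a Claude Code settings file (the exact file location and key placement are assumptions, not confirmed by the thread):

```json
{
  "showThinkingSummaries": true
}
```

With that set, summaries reappear in the interface; by bcherny's account it changes display only, not thinking budgets or extended reasoning behavior.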
The thread did not end there. Other top comments said thinking depth had already dropped before the redaction change, while Gergely Orosz argued the bigger problem was zero transparency around harness changes that only become obvious after a workflow breaks.
The most useful part of the discussion is the symptom list, because it points past a single header dispute. One commenter, for instance, had kept the same CLAUDE.md guardrails over time and still saw noticeable degradation in Opus outputs and thinking. That mix makes this look less like one broken toggle and more like a classic coding-agent headache: model behavior, harness policy, and transcript visibility all changed close enough together that users could not cleanly separate them.
The official Claude Code release notes add a few details that explain why users were reaching for settings in the thread. One changelog entry says the default effort level changed from medium to high for several user tiers, another says thinking summaries were no longer generated by default, and a third mentions an autocompact thrash-loop fix.
The settings docs also expose alwaysThinkingEnabled as a configurable option. That does not prove the regression report, but it does show how many moving parts were live around the same workflow: effort level, hidden summaries, transcript persistence, and compaction behavior.
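As a hypothetical fragment, the docs-exposed flag would sit alongside the summary toggle in the same settings file (key names come from the docs and thread; values and placement are assumptions):

```json
{
  "alwaysThinkingEnabled": true,
  "showThinkingSummaries": true
}
```

That adjacency is part of the confusion: two independent knobs, one governing whether thinking happens by default and one governing whether you can see it, changed visibility in the same window.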