Claude Code users report hidden Agent access, empty-string MCP failures, and slower Opus 4.7 runs
Practitioners shared a transcript showing Claude Code invoking Agent despite project allow-lists, a reproducible MCP bug that drops all params when one value is an empty string, and reports of much slower Opus 4.7 runs than in Cursor. That matters because teams are spending real quota debugging harness behavior, retries, and cache invalidation instead of model output.

TL;DR
- A Reddit user posted a Claude Code transcript where the model claimed the
Agenttool was present in its runtime context and callable even though the project'ssettings.jsondid not allow it, which points to a permissions enforcement gap rather than a simple bad run oqdoawtt's transcript. - Another user published a reproducible MCP failure where any empty-string parameter causes Claude Code to send no params at all, plus a filed bug and demo repo, so teams debugging MCP retries may be chasing harness behavior rather than their tool schema tanin47's MCP bug report and GitHub issue #61947.
- Prompt caching economics are harsher than most Claude Code users realize: Anthropic's own docs price 5-minute cache writes at 1.25x base input and cache reads at 0.1x, which is the basis for a 12.5x miss-versus-hit delta surfaced by lawnguyen123's cost breakdown and documented in Anthropic's prompt caching docs.
- Complaints about Opus 4.7 in Claude Code are increasingly about the harness, not just the model, with the Cursor subreddit thread and bridgemindai's post both describing slower, less reliable long sessions than comparable runs in Cursor or Codex.
- Anthropic has been shipping fixes at high frequency, including a one-line 2.1.148 patch for broken Bash exit codes ClaudeCodeLog on 2.1.148 and a 2.1.149 release that added per-category token accounting plus several permission and sandbox fixes ClaudeCodeLog's 2.1.149 changelog.
You can read Anthropic's April 23 postmortem, check the official prompt caching docs, inspect the MCP repro repo, and compare the rapid-fire CLI fixes in the 2.1.147 changelog, 2.1.148 changelog, and 2.1.149 changelog. Boris Cherny, Claude Code lead at Anthropic, also previewed /usage before it landed, with per-skill, per-agent, per-MCP, and per-plugin token breakdowns in bcherny's preview.
Agent permissions
The most concrete report in this batch is oqdoawtt's transcript, where Claude Code explicitly says the Agent tool appeared in the top-of-context tool block even though the project's allow list did not include it.
Claude injecting hard coded permissions into it's System Prompt
9 comments
The claim matters because the transcript is specific about the failure mode. The user says 13 agents burned roughly 500,000 tokens each during a short absence, then Claude attributed its own access to a runtime-defined tool list rather than the local settings.json policy oqdoawtt's transcript.
That lines up with the broader anxiety in nptacek's retweet of deepfates about invisible memory insertion and tool behavior. It also fits the older quality discussion in the main Hacker News thread, where commenters argued Claude Code regressions can come from wrapper changes, reasoning defaults, and compaction bugs rather than a new base model.
MCP empty strings
A separate report from tanin47's MCP bug report describes a more reproducible failure: if any MCP parameter is an empty string, Claude Code drops all params and calls the tool with none.
I found a Claude Code bug when an MCP param has an empty string value.
1 comments
The useful detail here is that the model's reasoning looked correct in the screenshot, but invocation still went out malformed. The user says they spent about $40 testing variants before isolating the bug, published a repro repo, and filed issue #61947.
That is exactly the sort of bug that feels like a model miss until you can reproduce it outside the prompt. Once you know the trigger, the workaround is blunt: use null instead of an empty string, per tanin47's MCP bug report.
Cache invalidation
Cache miss in Claude Code costs 12.5× more than a hit. Here are 5 things you do mid session that quietly trigger it
0 comments
The best cost explainer in the evidence set is lawnguyen123's cost breakdown, which lifts two numbers directly from Anthropic's prompt caching docs: 5-minute cache writes cost 1.25x base input, cache reads cost 0.1x.
The post turns Anthropic's invalidation table into a five-item list of common Claude Code cache busts:
- Installing or removing an MCP server mid-session, which changes tool definitions.
- Switching models with
/model, because caches are per-model. - Editing
CLAUDE.mdduring an open session, because it lives in the system prompt area. - Toggling fast mode, which Anthropic documents as a system-cache invalidator.
- Adding an image mid-conversation, which invalidates message blocks.
The official companion docs are How Claude remembers your project and Best practices for Claude Code, both linked by the Reddit post. Combined with Anthropic's April 23 postmortem, the pattern is pretty clear: a lot of Claude Code pain is hiding in prompt assembly, caching, and session management.
Harness drift
The performance complaints in this story are less about benchmark charts than about side-by-side usage. In the Cursor subreddit thread, one user says the same Opus 4.7 task took about 20x longer in Claude Code desktop than in Cursor configured to Opus 4.7 High.
Opus 4.7 behaves differently in Claude Code desktop app vs Cursor?
4 comments
The replies in that Reddit thread frame Cursor as a better harness, while bridgemindai's post says Opus 4.7 on xhigh effort with 1M context loses track of context mid-session and needs multiple attempts on bugs that GPT 5.5 one-shots. AlemTuzlak's post makes a similar comparison, claiming work that took three to five days with Opus took one day after moving the setup to Codex.
None of those posts prove a single root cause. They do match the April complaints summarized in the HN discussion digest, where commenters pointed to medium reasoning defaults, stripped thinking in older sessions, and a less verbose system prompt as non-model sources of quality loss.
Changelog triage
Anthropic has not been standing still. The last three public CLI releases in the evidence set read like active incident cleanup.
The notable changes cluster into a few buckets:
- Observability:
/usagenow shows token usage by skills, subagents, plugins, and individual MCP servers, first previewed by bcherny's preview and shipped in 2.1.149 per ClaudeCodeLog's 2.1.149 changelog. - Permissions and sandboxing: 2.1.149 fixed a PowerShell permission bypass, stale variable tracking around
PWDandOLDPWD, and a too-broad git worktree write allowlist, according to ClaudeCodeLog's 2.1.149 changelog. - Multi-agent controls: 2.1.147 added an off-by-default
Workflowtool for deterministic multi-agent orchestration viaCLAUDE_CODE_WORKFLOWS=1, per ClaudeCodeLog on 2.1.147. - Regression cleanup: 2.1.148 existed solely to fix a Bash tool regression that returned exit code 127 for every command for some users, as ClaudeCodeLog on 2.1.148 notes.
- Long-session debugging: 2.1.149 improved
/feedbackso reports include conversation history from before compaction, which the 2.1.149 changelog says should make earlier-session issues easier to triage.
That last point is new information in this story's arc, because it suggests Anthropic now treats compaction-era evidence loss as a debugging problem of its own. When your bug reports need pre-compaction context to make sense, the harness has become part of the incident surface.