releaseMay 18, 2026

Claude Code updates /fast to Opus 4.7 at ~2.5× response speed

Anthropic made /fast default to Opus 4.7, added prompt cache diagnostics in Claude Console, published scale guidance for large codebases, and renamed extra usage to usage credits. The bundle gives teams clearer controls over latency, cache misses, and paid usage while running long Claude Code sessions.

4 min read

Claude Code updates /fast to Opus 4.7 at ~2.5× response speed

TL;DR

ClaudeDevs switched Claude Code's /fast mode to Opus 4.7, with the team describing it as the same Opus-quality output at about 2.5 times the response speed, billed at a higher per-token rate.
ClaudeDevs added prompt cache diagnostics in Claude Console, and the linked cache diagnostics docs show cache misses broken out by what changed in the request.
In ClaudeDevs' large-codebase post, Anthropic said the practices came from teams running Claude Code across multi-million-line monorepos, legacy systems, and distributed microservices, with the full writeup in the linked blog post.
ClaudeDevs also renamed "extra usage" to "usage credits," and the thread says the new label better fits features like fast mode that consume paid credits directly.

You can jump straight to Anthropic's fast mode docs, browse the new cache dashboard, and read the full large codebase guide. The interesting part is how tightly these pieces fit together: one update buys speed, one shows why cached prompts missed, one explains how big teams keep agents on track, and one renames the billing bucket that now powers faster runs.

Fast mode

Anthropic made /fast default to Opus 4.7, and the launch post frames it as a latency trade: same model quality target, roughly 2.5 times the response speed, and a higher per-token price.

That makes /fast feel less like a hidden power-user toggle and more like a first-class operating mode for live debugging and rapid iteration, which is exactly how ClaudeDevs' thread pitched it. The linked fast mode docs are now the canonical reference for that tradeoff.

Cache diagnostics

The new Console view answers a question teams usually had to infer from bills and traces: what changed between one request and the next, and how expensive was that miss.

From

, the breakdown is explicit:

cache read ratio and cache read tokens are shown as top-line metrics
miss totals are split by model, here shown for Claude Opus 4.7
missed tokens are grouped by reason: messages changed, system changed, tools changed, model changed

That turns prompt caching into something you can debug visually instead of treating it as a black box. ClaudeDevs said the feature shows exactly which part of the prompt changed and how many tokens it cost.

Large codebases

Anthropic's new guide is aimed at teams already running Claude Code inside very large repos, not toy demos. In the announcement, the company said the lessons came from multi-million-line monorepos, decades-old legacy systems, and distributed microservices.

The practical pattern that surfaced in community replies is scope control. aakashgupta's CLAUDE.md router pattern argues for a tiny top-level CLAUDE.md that only does six jobs: project description, file map, identity context, knowledge routing, workflow pointers, and a self-improving prompt. Everything else gets pushed into separate voice, patterns, platform, metrics, knowledge, or skills files.

That is a useful complement to Anthropic's large codebase guide: the official post says big teams need structure, while the community example shows one concrete way to keep the agent from scanning blindly.

Usage credits

The rename from "extra usage" to "usage credits" is cosmetic on the surface, but ClaudeDevs' explanation makes the product shift pretty clear. "Extra usage" described overage spending after a plan limit. "Usage credits" now covers features that consume paid capacity directly, including fast mode.

Anthropic also said in the same thread that spending limits, auto-reload settings, and already-purchased credits carry over unchanged. The label changed because the billing unit now does more than catch overflow.

Running notes

One of the better workflow ideas around this bundle came from trq212, who shared a prompt that asks Claude Code to keep a running implementation-notes.html or Markdown file while it works. The note file is meant to capture decisions the spec did not resolve, changes made during execution, and tradeoffs worth reviewing later.

That pairs neatly with Anthropic's push toward longer, larger-codebase sessions. trq212's post about long-running agents points at the same operator problem from another angle: once agents stay busy for a while, teams need lightweight ways to stay in the loop without interrupting the run.