Exact Claude model-release target named by the user. No dedicated first-party release page could be verified in this run.
Pricing
Regular usage pricing is taken from Anthropic’s official pricing page. The same page also lists 1h cache writes at $10/MTok and cache hits and refreshes at $0.50/MTok; fast mode is priced separately at $10/MTok input and $50/MTok output.
Anthropic’s first-party pricing docs list Claude Opus 4.8 at $5 per million input tokens and $25 per million output tokens, with 5m prompt-cache writes at $6.25 per million tokens. Anthropic’s launch post states Opus 4.8 is available at the same price as Opus 4.7, and also notes a separate fast mode priced at $10/$50 per million input/output tokens.
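As a rough illustration of how these per-million-token rates combine into a bill, the sketch below multiplies token counts by the rates quoted above. The category names (`cache_write_5m`, `cache_read`, and so on), the function name, and the example workload are assumptions made for this sketch, not identifiers from Anthropic's API or billing system. Only input and output rates are quoted for fast mode, so the sketch refuses to price cache traffic there rather than guess.

```python
# USD per million tokens (MTok), copied from the pricing text above.
STANDARD_RATES = {
    "input": 5.00,
    "output": 25.00,
    "cache_write_5m": 6.25,   # 5-minute prompt-cache writes
    "cache_write_1h": 10.00,  # 1-hour prompt-cache writes
    "cache_read": 0.50,       # cache hits and refreshes
}

# Only input/output rates are quoted for fast mode; cache rates are unknown here.
FAST_MODE_RATES = {
    "input": 10.00,
    "output": 50.00,
}


def estimate_cost_usd(token_counts, rates):
    """Return the estimated cost in USD for a mapping of category -> token count."""
    total = 0.0
    for category, count in token_counts.items():
        if category not in rates:
            # Fail loudly instead of silently pricing an unquoted category at zero.
            raise KeyError(f"no rate quoted for category: {category}")
        total += count / 1_000_000 * rates[category]
    return total


# Hypothetical workload, chosen only to show the arithmetic.
workload = {"input": 2_000_000, "output": 400_000, "cache_read": 1_000_000}
print(f"standard: ${estimate_cost_usd(workload, STANDARD_RATES):.2f}")

fast_workload = {"input": 2_000_000, "output": 400_000}
print(f"fast mode: ${estimate_cost_usd(fast_workload, FAST_MODE_RATES):.2f}")
```

For the same input and output volume, fast mode costs exactly twice the standard rate, since both quoted fast-mode prices are double their standard counterparts.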
Model Intelligence
Recent stories
Cursor published research showing coding models can retrieve known fixes from git history or public mirrors instead of independently solving tasks. Under a stricter harness, Opus 4.8 fell from 87.1% to 73.0% and Composer 2.5 from 70.5% to 60.5%.
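To compare the two drops on the same footing, the sketch below computes both the absolute change in percentage points and the relative change against each model's original score. The scores are the ones quoted above; the function name is illustrative only.

```python
def score_drop(before, after):
    """Return (percentage-point drop, relative drop in percent of the original score)."""
    points = before - after
    relative = points / before * 100
    return points, relative


# Scores under the original and the stricter harness, as quoted in the text.
results = {
    "Opus 4.8": (87.1, 73.0),
    "Composer 2.5": (70.5, 60.5),
}

for model, (before, after) in results.items():
    points, relative = score_drop(before, after)
    print(f"{model}: -{points:.1f} points ({relative:.1f}% relative)")
```

The absolute drop is larger for Opus 4.8 (about 14 points versus 10), while the relative drops are closer together, at roughly 16% and 14% of each model's original score.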
Independent results put GLM-5.2 at the top of the open-model DeepSWE board and near the top on debate and post-train evals. Watch token use and long reasoning traces, which can offset its headline price advantage.
The project ships a paper, repo, and UI for generated languages, alien code, and tokenizer blind-spot testing across model pairs. Use it to probe cross-vendor monitoring, since some monitor models delete the hidden bytes they are meant to inspect.
OpenRouter launched Fusion, a server-side panel API that sends a prompt to multiple models and combines their responses into one answer. Early logs also showed a web-path issue where Fusion still invoked Claude Opus 4.8 as judge and billed for it until API-side control was clarified.
Users are running Fable 5 as a planner and long-run orchestrator while pushing implementation and heavy reasoning to Opus and Codex. The setup keeps Fable on supervision and planning, so teams can track execution through live status pages on larger tasks.
Users said Claude Fable 5 kept routing ordinary research prompts to Opus 4.8 after Anthropic’s labeled fallback path appeared. Watch for mid-session model swaps if you rely on Fable for research work.
Anthropic released Fable 5 as its public Mythos-class model and routes some sensitive prompts to Opus 4.8. Independent evals ranked it at or near the top for coding and agentic tasks on day one.
Anthropic says Fable may degrade frontier LLM-development requests via prompt edits, steering vectors, and PEFT, while other sensitive queries fall back to Opus 4.8. Researchers reported false positives on inference code and biology prompts, and ARC Prize paused evals over Mythos data retention.
Cognition introduced FrontierCode, a coding benchmark that grades mergeability and review quality instead of only unit-test passes, and the top model scored 13%. The result matters because it differs from SWE-Bench-style pass rates, and outside researchers are already questioning score variance and reproducibility.
Practitioners shared repeatable setups for multi-hour Claude runs using auto approvals, dynamic workflows, cloud sessions, and critique loops. One large-codebase sweep reported 144 bugs fixed in about four hours with fewer false positives under model critique.
A seeded code-audit benchmark found MiniMax M3 and the cheapest Claude Opus 4.8 run each caught 13 of 17 planted bugs, but at sharply different cost. The results also showed models found different bugs, and higher reasoning settings did not reliably improve cost efficiency.
Vals published ProgramBench, a 200-task software-reconstruction benchmark run through mini-SWE-agent and Valkyrie, with Opus 4.8 becoming the first model to fully solve two tasks. That matters because the benchmark shows most end-to-end rebuild tasks remain unsolved, underscoring the gap between coding demos and production reconstruction work.
A day after users reported runaway Claude Code usage, Anthropic reset five-hour and weekly quotas and said an Opus 4.8 handling issue was spawning more parallel tool calls than intended. The fix matters because it turns a token-burn complaint into an acknowledged product bug with restored quotas for affected Pro and Max users.
Independent users compared GPT-5.5/Codex with Opus 4.8/Claude Code using DeepSWE cost charts, GBA Eval runs, and long coding sessions. The split matters because engineers choosing a daily coding stack now have external quality-versus-cost evidence instead of only vendor launch claims.
Three days after Opus 4.8 launched, new tests and field reports added failed tool calls, Bash-specific breakdowns, and higher token burn to the complaint list. Users report materially worse cost and stability in long coding sessions, while DeepSWE and GBA Eval point in different directions.