Anthropic updates Fable 5 safeguards to show frontier-LLM limits
Anthropic said its safeguards for frontier-LLM-development requests will now be visible, and it apologized for using silent effectiveness limits after Fable 5 launched. The change matters because the original system-card language allowed unseen prompt or parameter interventions on a small slice of AI-research traffic.

TL;DR
- Anthropic said it will make Fable 5's safeguards for frontier LLM development visible, and it told WIRED that it "made the wrong tradeoff" after backlash over silent effectiveness limits that Simon Willison's update post surfaced from the launch docs.
- The original system-card language, quoted in Simon Willison's earlier post, said Fable 5 could quietly blunt requests about pretraining pipelines, distributed training infrastructure, or ML accelerator design through prompt modification, steering vectors, or PEFT.
- Anthropic's official launch post had already described visible fallback behavior for cyber, biology, chemistry, and distillation prompts, while the system-card excerpt showed frontier-LLM-development requests were handled differently.
- Fable 5 remains the public model, while Mythos 5, described in the main HN launch thread and the official announcement, is the same underlying model with some safeguards lifted and is still restricted to Project Glasswing and other trusted-access programs.
- According to the fresh HN discussion, the launch is also landing with a second story attached: engineers like the model's higher-level reasoning and debugging behavior, but they keep running into price and spend-limit complaints.
You can read Anthropic's launch post, Simon Willison's first writeup of the hidden clause, and his follow-up on the reversal. Maxwell Zeff's WIRED report is where Anthropic's apology landed, and the two Hacker News threads, the launch discussion and the backlash thread, show how fast the silent-intervention language became the real story.
Silent frontier-LLM limits
If Claude Fable stops helping you, you'll never know
Jonathon Ready highlights one of the more eyebrow-raising details from the 319 page system card for Fable 5 and Mythos 5. Here's a longer excerpt, highlights mine: "In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations." I believe this is the first time Anthropic have announced these kinds of silent interventions. The justification still feels pretty science-fiction to me - the linked article talks about "recursive self-improvement". I'm not at all keen on a model tha
The original clause was unusually specific. As Simon Willison's earlier post quoted it, requests targeting frontier LLM development, including pretraining pipelines, distributed training infrastructure, and ML accelerator design, would not trigger a visible refusal or a fallback to a different model.
Instead, Anthropic said Fable 5 would limit effectiveness through prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). The same system-card excerpt put the estimated scope at about 0.03% of traffic, concentrated in fewer than 0.1% of organizations.
Anthropic's reversal
Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude
Big scoop for Maxwell Zeff at Wired: “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.” There's been a huge outcry about Anthropic's policy, tucked away in their system card, that Claude Fable/Mythos would identify "requests targeting frontier LLM development" and "limit effectiveness" without notifying the user. It's very good news that they're dropping this.
By June 11, Anthropic had reversed course. In the statement quoted by Simon Willison's update post, the company said it was "changing Fable 5's safeguards for frontier LLM development to make them visible" and added, "We made the wrong tradeoff and we apologize for not getting the balance right."
That apology matters because Anthropic's public launch framing had already trained readers to expect visible guardrails. The official announcement said some high-risk queries would be routed to Claude Opus 4.8, and Simon Willison's launch impressions noted that the API had new mechanisms to show when guardrails fired, plus an option to request an automatic fallback to another model.
What stays restricted
Anthropic Launches Claude Fable 5 and Restricted Claude Mythos 5 Models
Anthropic has released Claude Fable 5, a Mythos-class model optimized for general use, and Claude Mythos 5, a version with specialized safeguards removed for specific use cases. Claude Fable 5 is available to the public via the Claude API. Claude Mythos 5, which features enhanced cybersecurity capabilities, is currently restricted to Project Glasswing partners and will soon be available to selected biology researchers under a restricted trusted access program. Both models are priced at $10 per million input tokens and $50 per million output tokens.
The reversal changes the visibility of one safeguard, not the product lineup. The official launch post and the main HN thread still describe the release like this:
- Fable 5 is the public model, available through the Claude API, per the main HN launch thread.
- Mythos 5 is the same underlying model with safeguards lifted in some areas, and Anthropic is limiting it to Project Glasswing partners before broader trusted access, per the official announcement.
- Both models are priced at $10 per million input tokens and $50 per million output tokens, according to the HN launch summary and Simon Willison's launch impressions.
- Simon's launch testing also pulled out the core specs: a 1 million token context window, 128,000 max output tokens, and a January 2026 knowledge cutoff, per his launch impressions.
Early user reaction and cost
Fresh discussion on Claude Fable 5
Today's fresh signal is mostly hands-on reports from people using Fable 5 in Claude Code and Claude.ai. Several commenters say it materially improves higher-level reasoning, plan review, and complex bug work; one describes it as strong enough to set up its own testing lab for a Windows process lifecycle issue, and another says it can find directional simplifications after Opus and Codex have exhausted obvious fixes. The other new theme is friction: people report hitting spend or usage limits very quickly, and some argue the model is too expensive to be a default choice. A separate comment questions Anthropic's benchmark changes and moving scores into the PDF, while another raises legal concerns about the model's data-retention and access policies.
The HN launch thread split quickly into two camps. According to the fresh HN discussion, engineers described better high-level reasoning, stronger plan review, and unusually thorough debugging behavior, including one report that Fable 5 built a whole repro and testing setup around a Windows process-lifecycle bug.
The other theme was sticker shock. The fresh HN discussion highlighted repeated complaints about spend caps and fast limit exhaustion, while Simon Willison's launch impressions said he spent $110.42 in tokens in about five and a half hours of testing on day one.
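To put the sticker shock in numbers, here is a rough back-of-the-envelope sketch in Python using only figures quoted in this article: $10 per million input tokens, $50 per million output tokens, the 1 million token context window, the 128,000 max output tokens, and the $110.42 spent over about five and a half hours. The function names `estimate_cost` and `hourly_burn` are illustrative, not part of any Anthropic SDK, and the sketch assumes flat list pricing with no caching discounts, batch rates, or long-context surcharges, none of which the sources here describe.

```python
from decimal import Decimal

# List prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = Decimal("10")
OUTPUT_PRICE_PER_MTOK = Decimal("50")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request at the quoted list prices.

    Assumption: flat per-token pricing, with no discounts or surcharges.
    """
    cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MTOK
    ) / MILLION
    return cost.quantize(Decimal("0.01"))


def hourly_burn(total_spend: Decimal, hours: Decimal) -> Decimal:
    """Average USD spent per hour over a testing session."""
    return (total_spend / hours).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical mid-sized request: 200,000 tokens in, 20,000 tokens out.
    print(estimate_cost(200_000, 20_000))  # 3.00

    # Upper bound for one call using the reported specs:
    # a full 1M-token context plus the 128,000-token output cap.
    print(estimate_cost(1_000_000, 128_000))  # 16.40

    # Average burn rate implied by the reported day-one spend.
    print(hourly_burn(Decimal("110.42"), Decimal("5.5")))  # 20.08
```

Under these assumptions, a single maxed-out call costs about $16.40, and the reported day-one session averaged roughly $20 per hour, which helps explain why commenters hit spend caps quickly.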