AI Primer
update

Anthropic updates Fable 5 safeguards to show frontier-LLM limits

Anthropic said its safeguards for frontier-LLM-development requests will now be visible, and it apologized for using silent effectiveness limits after Fable 5 launched. The change matters because the original system-card language allowed unseen prompt or parameter interventions on a small slice of AI-research traffic.


TL;DR

  • Anthropic said it will make Fable 5's safeguards for frontier LLM development visible, and it told WIRED that it "made the wrong tradeoff" after backlash over silent effectiveness limits that Simon Willison's update post surfaced from the launch docs.
  • The original system-card language, quoted in Simon Willison's earlier post, said Fable 5 could quietly blunt requests about pretraining pipelines, distributed training infrastructure, or ML accelerator design through prompt modification, steering vectors, or PEFT.
  • Anthropic's official launch post had already described visible fallback behavior for cyber, biology, chemistry, and distillation prompts, while the system-card excerpt showed frontier-LLM-development requests were handled differently.
  • Fable 5 remains the public model, while Mythos 5, described in the main HN launch thread and the official announcement, is the same underlying model with some safeguards lifted and is still restricted to Project Glasswing and other trusted-access programs.
  • According to fresh HN discussion, the launch is also landing with a second story attached: engineers like the model's higher-level reasoning and debugging behavior, but they keep running into price and spend-limit complaints.

You can read Anthropic's launch post, Simon Willison's first writeup of the hidden clause, and his follow-up on the reversal. Maxwell Zeff's WIRED report is where Anthropic's apology landed, and the two Hacker News threads, the launch discussion and the backlash thread, show how fast the silent-intervention language became the real story.

Silent frontier-LLM limits

If Claude Fable stops helping you, you'll never know

Jonathon Ready highlights one of the more eyebrow-raising details from the 319 page system card for Fable 5 and Mythos 5. Here's a longer excerpt, highlights mine:

In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations.

I believe this is the first time Anthropic have announced these kinds of silent interventions. The justification still feels pretty science-fiction to me - the linked article talks about "recursive self-improvement". I'm not at all keen on a model tha

The original clause was unusually specific. As Simon Willison's earlier post quoted it, requests targeting frontier LLM development, including pretraining pipelines, distributed training infrastructure, and ML accelerator design, would not trigger a visible refusal or a fallback to a different model.

Instead, Anthropic said Fable 5 would limit effectiveness through prompt modification, steering vectors, or parameter-efficient fine-tuning. The same system-card excerpt put the estimated scope at about 0.03% of traffic, concentrated in fewer than 0.1% of organizations.

Anthropic's reversal

Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Big scoop for Maxwell Zeff at Wired: “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.” There's been a huge outcry about Anthropic's policy, tucked away in their system card, that Claude Fable/Mythos would identify "requests targeting frontier LLM development" and "limit effectiveness" without notifying the user. It's very good news that they're dropping this.

By June 11, Anthropic had reversed course. In the statement quoted by Simon Willison's update post, the company said it was "changing Fable 5's safeguards for frontier LLM development to make them visible" and added, "We made the wrong tradeoff and we apologize for not getting the balance right."

That apology matters because Anthropic's public launch framing had already trained readers to expect visible guardrails. The official announcement said some high-risk queries would be routed to Claude Opus 4.8, and Simon Willison's launch impressions noted that the API had new mechanisms to show when guardrails fired, plus an option to request an automatic fallback to another model.

What stays restricted

Anthropic Launches Claude Fable 5 and Restricted Claude Mythos 5 Models

Anthropic has released Claude Fable 5, a Mythos-class model optimized for general use, and Claude Mythos 5, a version with specialized safeguards removed for specific use cases. Claude Fable 5 is available to the public via the Claude API. Claude Mythos 5, which features enhanced cybersecurity capabilities, is currently restricted to Project Glasswing partners and will soon be available to selected biology researchers under a restricted trusted access program. Both models are priced at $10 per million input tokens and $50 per million output tokens.

The reversal changes the visibility of one safeguard, not the product lineup. The official launch post and the main HN thread still describe the release like this:

  • Fable 5 is the public model, available through the Claude API, per the main HN launch thread.
  • Mythos 5 is the same underlying model with safeguards lifted in some areas, and Anthropic is limiting it to Project Glasswing partners before broader trusted access, per the official announcement.
  • Both models are priced at $10 per million input tokens and $50 per million output tokens, according to the HN launch summary and Simon Willison's launch impressions.
  • Simon's launch testing also pulled out the core specs: a 1 million token context window, 128,000 max output tokens, and a January 2026 knowledge cutoff, per his launch impressions.
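
To make the quoted prices concrete, here is a minimal Python sketch of the per-request arithmetic. It assumes flat per-token billing at the reported list prices, which matches the launch impressions' note that longer context does not cost more. The function and constant names are hypothetical and are not part of any Anthropic SDK.

```python
# Reported list prices, in USD per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the reported list prices.

    Assumes flat per-token billing with no discounts or surcharges.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical worst case: a request that fills the 1 million token
# context window and emits the 128,000-token maximum output.
full_context_cost = estimate_cost_usd(1_000_000, 128_000)
print(f"${full_context_cost:.2f}")
```

Under those assumptions, a single maxed-out request comes to $10.00 of input plus $6.40 of output, or $16.40 in total.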

Early user reaction and cost

Fresh discussion on Claude Fable 5

Today's fresh signal is mostly hands-on reports from people using Fable 5 in Claude Code and Claude.ai. Several commenters say it materially improves higher-level reasoning, plan review, and complex bug work; one describes it as strong enough to set up its own testing lab for a Windows process lifecycle issue, and another says it can find directional simplifications after Opus and Codex have exhausted obvious fixes. The other new theme is friction: people report hitting spend or usage limits very quickly, and some argue the model is too expensive to be a default choice. A separate comment questions Anthropic's benchmark changes and moving scores into the PDF, while another raises legal concerns about the model's data-retention and access policies.

The HN launch thread split quickly into two camps. According to fresh HN discussion, engineers described better high-level reasoning, stronger plan review, and unusually thorough debugging behavior, including one report that Fable 5 built a whole repro and testing setup around a Windows process-lifecycle bug.

The other theme was sticker shock. Fresh HN discussion highlighted repeated complaints about spend caps and fast limit exhaustion, while Simon Willison's launch impressions said he spent $110.42 in tokens in about five and a half hours of testing on day one.
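
As a rough illustration of why spend caps come up so quickly, the Python sketch below turns the reported figures into an hourly burn rate. The $110.42 and five-and-a-half-hour numbers come from the launch impressions. The $100 cap is a made-up value for illustration, not a real Anthropic limit, and the helper names are hypothetical.

```python
# Figures reported in Simon Willison's launch impressions.
REPORTED_SPEND_USD = 110.42
REPORTED_HOURS = 5.5  # "about five and a half hours", so treat as approximate


def hourly_burn_usd(spend_usd: float, hours: float) -> float:
    """Average spend per hour over a testing session."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return spend_usd / hours


def hours_until_cap(cap_usd: float, burn_usd_per_hour: float) -> float:
    """Hours of similar usage before a spend cap would be reached."""
    if burn_usd_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return cap_usd / burn_usd_per_hour


burn = hourly_burn_usd(REPORTED_SPEND_USD, REPORTED_HOURS)

# Hypothetical $100 cap, chosen only to show the calculation.
hours_left = hours_until_cap(100.0, burn)
print(f"{burn:.2f} USD/hour, {hours_left:.1f} hours to a $100 cap")
```

At that pace the session averaged roughly $20 per hour, so a hypothetical $100 cap would be exhausted in about five hours of comparable use.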

Further reading

Discussion across the web

Where this story is being discussed, in original context.


Initial impressions of Claude Fable 5

I didn't have early access to today's Claude Fable 5 release, but I've spent the past ~5.5 hours putting it through its paces. My initial impressions are that this is something of a beast. It's slow, expensive and has been quite happily churning through everything I've thrown at it so far. As is frequently the case with current frontier models the challenge is finding tasks that it can't do. First, let's review the key characteristics. Anthropic claim that Claude Fable 5 offers the same performance as Claude Mythos 5, except with much more strict guardrails in place to prevent it being used for harmful things. Those guardrails trigger often enough that the Claude API has new mechanisms for letting you know when you hit them, and even has a new option to request it falls back to another model automatically if something gets rejected. Claude Mythos 5 is out today as well, Anthropic say it "Shares Claude Fable 5's capabilities without the safety classifiers". The models have a 1 million token context window, 128,000 maximum output tokens and a knowledge cut-off date of January 2026. They are priced at twice the price of Claude Opus 4.5/4.6/4.7/4.8: $10/million input tokens and $50/million output tokens. There's no increase in price for longer context usage. Other than that the upgrade guide is substantially thinner than the similar guide for Opus 4.8. The big model smell The best way to describe Fable is that it feels big. Not just in terms of speed and cost, but also in how muc
