Update: June 11, 2026

Fable 5 users report Opus 4.8 fallbacks during research prompts

Users said Claude Fable 5 kept routing ordinary research prompts to Opus 4.8 after Anthropic’s labeled fallback path appeared. Watch for mid-session model swaps if you rely on Fable for research work.

TL;DR

ClaudeDevs' rollback post says Anthropic has stopped hiding Fable 5's frontier-LLM safeguard behavior and will now show an Opus 4.8 fallback every time a request is flagged.
Yuchenj_UW said old AI papers and basic questions still triggered visible downgrades to Opus 4.8, and Dan Shipper reported a long-running project fell back 10 minutes into an hour-long run.
Gergely Orosz collected reports that ordinary dev work was downgrading mid-task, while vikhyatk said Fable flagged inference code as frontier AI research and started importing ONNX.
eliebakouch's original thread surfaced the buried system-card disclosure that, for frontier LLM development requests, Fable could quietly limit answers through prompt modification, steering vectors, or PEFT. Anthropic later reversed course.
Anthropic's launch thread says Fable 5 runs on the same underlying model as the restricted-access Mythos 5, while Mythos itself stays gated to Glasswing partners and a future trusted access program.

You can read Anthropic's launch post, the system card PDF, Simon Willison's hands-on writeup, and a Claude Code issue where a user said Fable tripped model_refusal_fallback on a bare "hello!" message. Anthropic's own Prompting Claude Fable 5 guide also warns that autonomous runs can last for hours, which makes a mid-run downgrade more noticeable than a first-turn refusal.

Visible fallback, not silent steering

The original fight was not about whether Anthropic would restrict some domains. It was about doing it without telling users. In the system card PDF, Anthropic said frontier LLM development requests could be handled with prompt modification, steering vectors, or PEFT, an intervention that Hangsiin's summary said was estimated to affect about 0.03% of traffic.

After the backlash, ClaudeDevs said the company would route those flagged requests through the same visible Opus 4.8 fallback already used for cyber and bio requests. Anthropic's launch post says safeguard triggers occur in fewer than 5% of sessions on average, a far broader bucket than the roughly 0.03% frontier-LLM case that kicked this off, though the two figures use different units (sessions versus traffic).

False positives in normal coding and research

The complaints did not stop once the fallback became labeled. They shifted from hidden downgrades to obvious false positives.

The reports fall into a few recurring patterns:

Research-adjacent prompts: Yuchenj_UW said old AI papers, blogs, and basic questions were enough to trigger the Opus 4.8 fallback.
Routine coding work: vikhyatk said inference code was classified as frontier AI research.
Long autonomous runs: Dan Shipper said a project ran for 10 minutes before Fable was swapped out for Opus 4.8.
Mid-debug session drift: Gergely Orosz pointed to Simon Willison's report of Fable deciding a textarea debugging task had become dangerous.
Session-level overtriggering: a GitHub issue claimed Claude Code could hit model_refusal_fallback on a bare "hello!" message, with the reporter arguing the classifier was scoring the static request preamble rather than user content.

That last case matters because, if even a bare greeting can trip the classifier, every request in a long session is exposed, and Anthropic's own Prompting Claude Fable 5 guide says hard tasks can run for minutes or hours at higher effort settings. A model that takes longer to plan and verify gives a badly tuned fallback more chances to wreck a real workflow midstream.
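The point about longer runs can be made concrete with a back-of-the-envelope model. The sketch below assumes, hypothetically, that each request in a run is scored independently and has the same small chance p of a false safeguard trip. Neither that independence assumption nor any value of p comes from Anthropic or the reports above; the function name chance_of_fallback is illustrative. Under that model, the chance of at least one spurious fallback across n requests is 1 - (1 - p)^n.

```python
def chance_of_fallback(p: float, n: int) -> float:
    """Probability of at least one false safeguard trip across n requests.

    HYPOTHETICAL model: each request trips independently with probability p.
    This is an illustration, not a measured property of Fable's classifier.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "no request in the run ever trips": 1 - (1 - p) ** n
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative per-request rate only; not a reported figure.
    for n in (1, 50, 500):
        print(f"p=0.001, n={n}: {chance_of_fallback(0.001, n):.3f}")
```

With an assumed p of 0.001, a single request almost never trips, but a 500-request run has roughly a 39% chance of at least one spurious fallback, which is why a rate that looks negligible per turn can still derail long autonomous work.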

Anthropic's rationale and the trust hit

Anthropic's public rationale is straightforward. In the launch post, it says visible safeguards are easier to probe, so hidden interventions let it ship quickly with fewer false positives. In its apology post, the company said that was the wrong tradeoff.

The reaction focused less on the existence of a safety policy than on the user-trust cost of a model acting weaker without saying why. Nathan Lambert, an AI researcher at the Allen Institute for AI, argued in his thread that the misleading implementation was easier to fix than the deeper question of Anthropic restricting scientific engagement with its best model. Clement Delangue called the rollback much better on the grounds that manipulation should be avoided, while Ethan Mollick said Anthropic was sincere about misuse risk but had failed to explain it.

That leaves a cleaner but still narrow policy line. According to testingcatalog, requests related to frontier LLM development are now transparently routed to Opus 4.8. Nathan Lambert's follow-up calls that an acceptable correction on transparency, one that leaves his broader criticism standing.

API and access details

The mechanics matter if you are trying to figure out what actually shipped.

In Claude Code, ClaudeDevs said you switch with /model claude-fable-5.
In the API, the model ID is claude-fable-5, and Anthropic's prompting guide says Fable supports adaptive thinking only, returns summarized reasoning output, and uses a refusal stop reason plus fallback handling.
ClaudeDevs also said cyber and bio reroutes are billed at Opus prices.
ClaudeDevs' rollback post said flagged API requests will return a reason for refusal, with server-side fallback coming a few days later.
Anthropic's launch thread says Fable 5 is broadly available now, while Mythos 5 remains limited to Glasswing partners until a broader trusted access program opens for defensive cyber and biomedical research.

That last split is the part worth bookmarking. Anthropic is selling Fable as the public Mythos-class model, but the unclipped version is still a gated program.
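For API callers, the mechanics listed above imply a period where a flagged request returns a refusal but no server-side fallback yet. The sketch below shows one way a client could handle that itself while keeping the downgrade visible. It assumes a refusal shows up as a stop_reason of "refusal", per the prompting guide's mention of a refusal stop reason. The Opus 4.8 model ID, the Reply and Outcome types, the refusal_reason field, and the send callable are all hypothetical stand-ins, not Anthropic's SDK.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Model ID given in Anthropic's prompting guide.
PRIMARY_MODEL = "claude-fable-5"
# HYPOTHETICAL: the source does not give an Opus 4.8 model ID.
FALLBACK_MODEL = "claude-opus-4-8"


@dataclass
class Reply:
    """HYPOTHETICAL minimal response shape, not the real SDK type."""
    model: str
    stop_reason: str
    text: str
    refusal_reason: Optional[str] = None


@dataclass
class Outcome:
    reply: Reply
    fell_back: bool
    refusal_reason: Optional[str]


# HYPOTHETICAL transport: (model_id, prompt) -> Reply. Swap in a real client.
SendFn = Callable[[str, str], Reply]


def ask_with_visible_fallback(send: SendFn, prompt: str) -> Outcome:
    """Try Fable first; on a refusal stop reason, retry on the fallback
    model and report the swap to the caller instead of hiding it."""
    first = send(PRIMARY_MODEL, prompt)
    if first.stop_reason != "refusal":
        return Outcome(reply=first, fell_back=False, refusal_reason=None)
    # Keep the refusal reason so the caller can surface or log it.
    second = send(FALLBACK_MODEL, prompt)
    return Outcome(reply=second, fell_back=True, refusal_reason=first.refusal_reason)


if __name__ == "__main__":
    # Fake transport that refuses prompts mentioning "steering vectors".
    def fake_send(model: str, prompt: str) -> Reply:
        if model == PRIMARY_MODEL and "steering vectors" in prompt:
            return Reply(model, "refusal", "", refusal_reason="frontier LLM development")
        return Reply(model, "end_turn", f"[{model}] answer")

    for p in ("hello!", "explain steering vectors"):
        out = ask_with_visible_fallback(fake_send, p)
        note = f" (fell back: {out.refusal_reason})" if out.fell_back else ""
        print(f"{p!r} -> {out.reply.model}{note}")
```

Passing the transport in as a plain callable keeps the sketch testable without network access; in real code you would map your SDK's response object onto Reply and decide whether a fallback should be logged, shown to the user, or both.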

Discussion across the web