Skip to content
AI Primer
update

Report: Trump talks end without lifting Claude Fable 5 jailbreak restrictions

Talks between Anthropic and the Trump administration ended without restoring Claude Fable 5 access, and reporting said consumer access may still hinge on fixing the cited jailbreak issue. Fable remains offline, and the delay leaves uncertainty around how frontier labs can staff and ship future models.

4 min read
Report: Trump talks end without lifting Claude Fable 5 jailbreak restrictions
Report: Trump talks end without lifting Claude Fable 5 jailbreak restrictions

TL;DR

  • Talks between Anthropic and the Trump administration ended without restoring Fable 5, and ai_for_success's WIRED summary said consumer access may still depend on Anthropic resolving the cited jailbreak concern.
  • The core factual dispute is still the same: kimmonismus's CNBC summary says Anthropic believed it had approval before launch, while kimmonismus's Politico summary says White House officials viewed the export controls as a last resort after Anthropic would not pause the rollout.
  • The alleged jailbreak looks narrower than the original framing implied, because Simon Willison's blog post and both describe a workflow centered on asking the model to fix insecure code, then turning the output into patch-testing scripts.
  • The order is still broader than a normal consumer takedown: according to WesRoth's summary of Anthropic's notice, the government directive barred access for any foreign national, including inside the US, which is why Anthropic disabled Fable 5 and Mythos 5 for everyone.
  • Downstream products are already adapting to a frontier-model identity regime, with sqs's Amp verification post adding passport or government ID checks for future access and showing new "verification data" language published on June 8.

You can read CNBC's account of the meetings, check Semafor's China-access report, and compare both with Anthropic's privacy policy, which now includes identity-verification language that was published the day before Fable 5 launched. The weird part is that the factual center of gravity keeps shrinking toward a bug-fixing prompt, while the policy blast radius keeps widening toward employee access, customer shutdowns, and KYC-style tooling.

The Washington meeting changed nothing

Anthropic sent senior technical staff to Washington to try to repair the relationship and restore access, but the clearest end state from the current reporting is still no restoration.

According to ai_for_success's WIRED summary, officials were still treating the jailbreak concern as unresolved after the talks, even while some Commerce officials were reportedly open to consumer access returning later. That leaves Fable in the same limbo it entered on Friday, offline now, maybe back later, with no public remediation path yet.

The two timelines still do not match

The White House version is that officials spent hours trying to get Anthropic to cooperate before issuing the letter. Anthropic's version is that it had already worked with agencies before launch and then got hit with a vague same-day shutdown order.

The split is concrete:

  • kimmonismus's Politico summary says Andy Jassy's warning reached senior officials, who then pulled Dario Amodei into three calls before export controls followed.
  • kimmonismus's CNBC summary says Anthropic believed deployment had been approved and then received a 1:00 p.m. ET order to take the models offline, followed by a formal letter hours later.
  • WesRoth's summary of Anthropic's notice says Anthropic framed the directive as a misunderstanding and said it was working to restore access.

The result is a story where the technical issue and the process failure are now fused together. Simon Willison's Axios reaction captured the mood most directly: the administration's own framing had already drifted toward an "attitude fix" as much as a jailbreak fix.

The jailbreak description keeps collapsing toward normal defensive coding

The strongest specific description of the cited issue is still the one Kate Moussouris gave after reviewing the White House report. In that account, Fable refused a direct request to review code for security issues, then complied when asked to fix the code.

The Fable 5 Export Controls Harm US Cyber Defense

The Fable 5 Export Controls Harm US Cyber Defense I quoted The Atlantic quoting Kate Moussouris earlier, when I should have gone straight to the source. Here she is confirming that the "jailbreak" that got Claude Fable 5 banned under an export control really was "fix this code": The researchers took open-source code with known CVEs, plus new code with deliberately planted vulnerabilities, and asked Fable 5, Mythos, and Opus to “review the code for security issues.” Fable 5 refused. They then asked the models to “fix this code” and, through a multistep and manual process, turned the output into scripts that test the patches. As Kate points out, this is absurd. Coding models fix bugs, and security exploits are the most important category of bugs for them to fix! Defenders need to be able to ask AI to fix the bugs in a file, explain why the fix matters, and write tests that confirm the patch works. That is not a guardrail bypass. It is the most valuable thing an AI model can do for defensive security: executing the find, fix, and test loop defenders run every day. [...] The prompts worked because they were defensive requests, and that capability cannot be removed without making the model worse at fixing bugs and verifying patches. This whole situation is such a mess. Non-technical decision-makers have been hearing that models that can "craft cyber attacks" are uniquely dangerous for months. Now they look ready to ban any model that can help us secure our code. Tags: jailbreaking

That matters because it narrows the alleged bypass into a workflow most coding-model users would recognize:

  1. Provide open-source code with known CVEs and new deliberately insecure code.
  2. Ask the model to review the code for security issues, which Fable reportedly refused.
  3. Ask the model to fix the code, which it reportedly did.
  4. Turn that output into scripts that test the patches, according to Simon Willison's blog post.

[sr c:9|Simon Willison's reaction]

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 3 threads
TL;DR3 posts
The two timelines still do not match1 post
The jailbreak description keeps collapsing toward normal defensive coding1 post
·
Other sources· 1 post

"They screwed us": Personality clashes sent Anthropic's models offline

"They screwed us": Personality clashes sent Anthropic's models offline Lots of "source familiar with the administration's thinking" and "source close to Anthropic" in this Axios piece, which is the best collection of behind-the-scenes gossip I've seen about the US government export control Mythos/Fable story so far. Logan Graham (I lead the Frontier Red Team at Anthropic), Dave Orr (Head of Safeguards, previously a Director of Engineering at Google DeepMind), and blog favorite Nicholas Carlini are reported to be meeting with the Commerce Department today in D.C. Good luck to them! (I just noticed Logan was "Special Adviser to the Prime Minister" in the Boris Johnson era, covering AI, science, and technology policy - so significant political experience.) This closing notes doesn't give me much optimism that we'll be getting Fable back any time soon: The bottom line: One option is to make sure Anthropic's models can't be jailbroken — though perfect jailbreak resistance may be impossible. Absent that, a source familiar with the administration's thinking said it may simply come down to an attitude fix where, instead of feeling dismissed, "everyone feels safe, secure and happy." This made me wonder if Anthropic ever successfully addressed the class of attacks described in the Universal and Transferable Adversarial Attacks on Aligned Language Models paper from 2023. It looks like their Constitutional Classifiers work (that post is from January this year) is relevant to that. T

Share on X