Anthropic and White House begin drafting joint AI model risk framework after Fable 5 ban
Anthropic and Trump administration officials moved from open confrontation to direct technical collaboration last week, beginning work on a joint AI security risk framework that would define severity tiers for jailbreaks and set conditions under which the US government could order a frontier model offline.
What happened
Within days of the Commerce Department's June 12 directive, which cut off all foreign-national access to Fable 5 and Mythos 5, the tone shifted. Per Reuters reporting carried by the Globe and Mail, Anthropic and Trump administration officials were working toward a framework deal by June 16-17, with both sides defining the technical and policy ground rules for model safety evaluations. No legal challenge. Straight to the table.
TheStreet, citing Politico coverage from June 19, confirmed the collaborative posture. Negotiators are drawing up definitions for jailbreak severity tiers, identifying which model capabilities constitute a national security exposure, and establishing what real-world consequences would justify an emergency recall. The resulting document, if finalized, would be the first formal US government framework for commercial AI model recalls. Administration officials and Anthropic have both signaled it is expected to serve as an industry-wide template, not a bilateral arrangement confined to Anthropic.
The core impasse
The government entered negotiations carrying a stated threshold that security researchers have called technically impossible: zero jailbreaks before Fable 5 relaunches. WIRED confirmed that demand on June 18. A joint framework could break the deadlock.
Under a tiered approach, narrow jailbreaks requiring specialized adversarial effort would not trigger emergency recalls. Jailbreaks that reliably expose weapons-relevant capabilities would. Security researchers and policy analysts, per TechPolicy Press on June 13, have argued the administration's handling so far reveals the absence of any established playbook. The June 12 directive came without a published legal basis, without a formal severity rubric, and without a defined path to reinstatement.
The Fable 5 ban made that gap visible. The federal government had no standing process for evaluating a deployed frontier model's national security risk, ordering its withdrawal, or specifying conditions for return. That is precisely what the framework talks are trying to build.
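To make the tiered idea concrete, here is a minimal sketch of how the two cases described above could be encoded. It is purely illustrative: no rubric has been published, and the tier names, the field names, and the `classify` function are all hypothetical. Anything the reporting does not address falls into a catch-all review tier rather than being guessed at.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tier names; the framework's real tiers are not public.
    NO_RECALL = "no_recall"
    NEEDS_REVIEW = "needs_review"
    EMERGENCY_RECALL = "emergency_recall"


@dataclass(frozen=True)
class JailbreakReport:
    # Hypothetical fields, chosen to mirror the wording in the reporting.
    requires_specialized_effort: bool  # needs expert adversarial work
    reliably_reproducible: bool        # works consistently across attempts
    exposes_weapons_capability: bool   # output is weapons-relevant


def classify(report: JailbreakReport) -> Tier:
    """Map a report to a tier using only the two cases described above."""
    # Case 1: reliably exposes weapons-relevant capabilities -> recall.
    if report.exposes_weapons_capability and report.reliably_reproducible:
        return Tier.EMERGENCY_RECALL
    # Case 2: narrow jailbreak requiring specialized effort -> no recall.
    if report.requires_specialized_effort and not report.exposes_weapons_capability:
        return Tier.NO_RECALL
    # Everything else is not covered by the reporting (assumption).
    return Tier.NEEDS_REVIEW


narrow = JailbreakReport(
    requires_specialized_effort=True,
    reliably_reproducible=False,
    exposes_weapons_capability=False,
)
severe = JailbreakReport(
    requires_specialized_effort=False,
    reliably_reproducible=True,
    exposes_weapons_capability=True,
)
print(classify(narrow).value, classify(severe).value)
```

The point of the sketch is that a zero-jailbreak threshold collapses every report into a single recall outcome, while a tiered rubric lets severity depend on what a jailbreak actually exposes and how reliably.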
Why it matters
A tiered severity standard, if adopted as a Commerce Department rulemaking, would reach every frontier AI lab in the United States. The June 12 directive drew on existing export control authority that had never previously been applied to commercial software models. A published rubric gives labs a concrete compliance target. It also gives regulators enforceable, consistent criteria rather than ad hoc directives issued on short notice.
The Anthropic case is the first time a US government agency has ordered a deployed frontier model pulled from service for security reasons. The government's security mandate conflicts with a technical reality: jailbreaks cannot be eliminated from large language models. How the framework resolves that conflict will set the standard for every future incident.
What to watch next
The framework's published text is the primary document to track. It is expected before Fable 5 relaunches. After that, the question is whether it enters formal Commerce Department rulemaking, becomes a voluntary industry standard, or stays a bilateral arrangement confined to Anthropic. The answer will determine how widely it actually applies.
Sources
- Anthropic, Trump officials working toward deal to restore Fable 5 and Mythos 5: Reuters via Globe and Mail, June 16-17, 2026 (primary)
- Anthropic works with White House after AI security scare: TheStreet citing Politico, June 19, 2026 (primary)
- Anthropic's Mythos Recall and the White House's Missing AI Safety Playbook: TechPolicy Press, June 13, 2026 (secondary)