Anthropic reverses Fable 5 hidden researcher-throttling policy and apologizes after backlash

Anthropic published the Claude Fable 5 and Mythos 5 system card on June 9, 2026, disclosing that the new model would silently degrade its own output for users it identified as working on competing frontier AI systems. Two days later, after sustained criticism from researchers and an AI policy adviser, the company reversed the policy and issued a public apology.

What the policy said

The system card described a category of safeguards distinct from Anthropic's existing cyber and biology protections. For users Anthropic's classifiers detected as working on "frontier LLM development," including pretraining pipelines and ML accelerator design, the model would apply invisible degradation via prompt modification or steering vectors. Unlike the company's other restrictions, which fall back visibly to a less capable model, this safeguard would give no notification to the user.

Anthropic's own terms of service already prohibited using Claude to train competing AI models. The covert degradation was framed, in a statement Anthropic gave to Wired, as a harder-to-probe alternative to a visible refusal: "A hidden safeguard is harder to probe and work around. This means the safeguards can be targeted much more narrowly." The same statement said the policy was designed to prevent adversaries from using Claude to optimize competing chips or erode a US technology edge.

How the backlash developed

The research community pushed back quickly after the system card's release. Dean Ball, a senior fellow at the Foundation for American Innovation who previously advised the White House on AI, wrote on X that "degrading performance on ML research without telling the user is shockingly hostile and a terrible look," per Wired's reporting.

Will Brown, research lead at open-source AI startup Prime Intellect, put the concern in sharper terms. "It felt like Anthropic was saying to the public, 'We don't trust anybody else to do AI research. We are the only ones who have to do AI research,'" he told Wired. Brown also noted that the hidden nature of the safeguard would have left developers unaware when their requests triggered it, including third-party evaluation firms testing frontier models for safety and reliability.

Simon Willison, a noted AI developer and researcher, covered the episode in detail the same day, describing the reversal as a significant shift in how Anthropic positioned the relationship between safety restrictions and user transparency.

The reversal

On June 11, Anthropic told Wired it was changing course. "We're changing Fable 5's safeguards for frontier LLM development to make them visible," the company said. "We made the wrong trade-off and we apologize for not getting the balance right."

Going forward, per the statement, any protective measures triggered for AI development-related requests will be visible: the model will either refuse the request outright or alert the user that it is rerouting them to a less capable version. Anthropic acknowledged this shift requires a broader classifier net, meaning some benign requests may now be caught, and said it is working to make the classifiers more precise.

The Decoder's coverage noted that the policy reversal on covert throttling did not resolve all the controversy around Fable 5.

The data-retention question remains open

A second issue raised by the Fable 5 launch has not been resolved. The model's safety classifiers require storing conversation data: prompts and outputs are retained for up to 30 days under standard use, and for up to two years if a policy violation is flagged, per Anthropic's own support documentation.

All other Claude models run under zero-data-retention terms for enterprise customers, according to The Verge's reporting. Fable 5 is the exception. The Verge reported that Microsoft has restricted Fable 5 internally as a result and excluded it from the GitHub Copilot model picker entirely.

The data-retention requirement appears to be structural rather than incidental. The classifiers that power Fable 5's safety layer need the stored data to function. Whether Anthropic makes the retention period configurable for enterprise customers or offers a variant of the model that does not carry this requirement has not been announced.

Why it matters

The episode exposed a tension that recurs in the frontier model business: the same systems labs build to keep powerful models from being misused can, depending on implementation choices, restrict legitimate research, reduce model transparency, or create competitive moats.

Anthropic's explicit rationale, that a visible safeguard is easier for bad actors to probe and route around than a hidden one, is coherent on its face. Critics argued the practical effect was that honest researchers, not just adversaries, would have received degraded responses without knowing why, undermining their ability to evaluate whether Claude was fit for their purposes.

The Microsoft exclusion from GitHub Copilot adds a concrete commercial dimension. Fable 5 was Anthropic's top-of-line consumer release. Losing it from a major enterprise distribution channel over data-handling terms is a measurable consequence that persists even after the throttling policy was reversed.

What to watch next

Three threads remain open. Anthropic has not published a formal policy update replacing the original system card language on covert safeguards. The company has not addressed whether the data-retention period for Fable 5 will become configurable for enterprise customers. And EU AI Act enforcement teams under the GPAI provisions may assess whether invisible model degradation, as originally designed, constituted a transparency violation under that framework.

Sources

Wired: Anthropic Walks Back Policy That Could Have 'Sabotaged' AI Researchers Using Claude (primary source with on-record Anthropic statement)
The Decoder: Claude Fable 5 throttling reversal coverage (secondary confirmation)
The Verge: Microsoft Claude Fable 5 restricted internally (secondary confirmation)
Anthropic support doc: data retention practices for Mythos-class models (primary Anthropic documentation)
Simon Willison: Anthropic walks back policy (secondary commentary)