The Trump administration has told Anthropic that it must make its AI model completely unjailbreakable before it can release a new version called Fable. Security experts argue that this demand is technically impossible to meet.
According to Wired’s reporting, White House officials directly communicated this requirement to Anthropic as a condition for approving the model’s release. The problem is that no one in the AI security field believes absolute protection can be achieved with current technology.
What Is a Jailbreak, Exactly?
A jailbreak, in AI terms, happens when a user tricks an AI chatbot into ignoring its built-in safety rules. It’s like finding a back door into a locked building. Instead of asking Claude (Anthropic’s AI assistant) how to make a weapon, which it would refuse, a jailbreaker might frame the request as a fictional story or a series of innocent questions that ultimately lead to harmful output.
AI companies invest heavily to close these loopholes, but researchers keep demonstrating new techniques. It’s an ongoing arms race rather than a problem that can be solved once and forgotten.
The Fable Dispute
At the heart of this conflict is Fable, an Anthropic AI model the company has been eager to release. Reports suggest the White House has either blocked its release or made it conditional on Anthropic proving the model can’t be circumvented through jailbreaking.
The Verge describes this situation as a clash between regulators and the realities of how the most advanced AI models actually work. The administration seems to view jailbreak prevention as black and white: either your model is safe or it isn’t. Researchers, in contrast, see it as a spectrum that can be improved but never perfected.
Founded in 2021 by former OpenAI researchers including CEO Dario Amodei, Anthropic has built its reputation on AI safety. The company’s pitch to investors and the public emphasizes that it takes safety more seriously than its competitors. Being told its safety measures don’t meet government standards puts Anthropic in a tough spot.
| Anthropic: Company Snapshot | |
|---|---|
| Founded | 2021 |
| CEO | Dario Amodei |
| Headquarters | San Francisco, CA |
| Sector | Artificial Intelligence |
| Primary Product | Claude AI assistant |
| Known For | Safety-focused AI development |
Why Experts Say This Can’t Be Done
The issue isn’t about effort or resources. It’s rooted in math and language.
Large language models (AI systems trained on vast amounts of text to generate human-like responses) operate by predicting the next word (more precisely, the next token, which may be a word or a word fragment) in a sequence. They don’t have a simple list of forbidden topics that can be locked away. Their behavior emerges from billions of learned numerical weights, which can be influenced in unexpected ways through cleverly crafted prompts.
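To make the idea of next-word prediction concrete, here is a minimal sketch of a toy bigram model, which guesses the next word purely from counts of which word followed which in its training text. It is far simpler than a real large language model and says nothing about how Claude or Fable is built; the corpus and the function names `train_bigram` and `next_word_probs` are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def next_word_probs(model: dict[str, Counter], word: str) -> dict[str, float]:
    """Turn the raw follow-counts for `word` into probabilities."""
    counts = model.get(word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # word never seen with a follower
    return {w: c / total for w, c in counts.items()}


# Hypothetical training text, made up for this example.
corpus = (
    "the model answers the question . "
    "the model refuses the request . "
    "the user asks the model"
)
model = train_bigram(corpus)
probs = next_word_probs(model, "model")
print(probs)
```

Nothing in this toy contains a rule that says when to refuse. Whether "answers" or "refuses" comes next depends only on statistics learned from the text and on the words supplied as input, which is the property that makes prompt-based manipulation possible in much larger systems.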
Security researchers studying AI vulnerabilities compare this to trying to patch every possible bug in a piece of software before anyone has attempted to attack it. You can significantly reduce the attack surface, but you can’t guarantee zero vulnerabilities against an adversary who keeps trying new tactics.
Asking Anthropic to certify that Fable cannot be jailbroken amounts to asking the company to prove a negative: that no prompt, out of an effectively unlimited space of possible inputs, will ever succeed. In cybersecurity, that kind of guarantee is effectively impossible to give.
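The arithmetic behind this argument can be sketched in a few lines. The example below assumes, as a simplification, that every jailbreak attempt is independent and succeeds with the same small probability; real attackers adapt, so this is an illustration rather than a model of any actual system. The function names and the numbers plugged in are hypothetical.

```python
def p_at_least_one_success(p_single: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_single) ** attempts


def upper_bound_after_clean_tests(clean_tests: int, confidence: float = 0.95) -> float:
    """Largest per-attempt success rate still consistent with seeing
    zero successes in `clean_tests` independent trials.

    Solves (1 - p) ** clean_tests = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_tests)


# Hypothetical numbers: a 1-in-10,000 chance per attempt, 100,000 attempts.
print(p_at_least_one_success(1e-4, 100_000))

# Hypothetical red-team campaign: 10,000 attempts, none succeeded.
print(upper_bound_after_clean_tests(10_000))
```

Under these assumptions, even a one-in-ten-thousand success rate becomes a near-certain breach once enough attempts are made. In the other direction, a clean run of 10,000 tests only shows, at 95% confidence, that the per-attempt rate is below roughly 3 in 10,000. Testing can lower that bound, but it can never bring it to zero.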
What This Means
This conflict has real consequences for everyday users. If the White House’s standard becomes the baseline for approving AI model releases, it could create a serious bottleneck for the entire industry. Major AI labs, including OpenAI, Google, and Meta, have all released models that researchers have successfully jailbroken in various ways. Holding Anthropic to a standard no competitor has met would either delay AI releases indefinitely or push companies into compliance theater: demonstrating safety on paper without genuinely achieving it.
There’s also a broader question about what government oversight like this looks like in practice. The U.S. has generally maintained a lighter regulatory approach to AI compared to the European Union. This dispute suggests that stance may be changing, at least in specific instances.
For Anthropic, this situation is a reputational balancing act. The company can’t release Fable without government approval. But it also can’t credibly promise something the technical community knows is impossible, because doing so would undermine the scientific credibility that makes its safety claims meaningful.
What To Watch
- Fable’s release status: Whether Anthropic meets the administration’s requirements, negotiates a modified standard, or faces a prolonged standoff will show how much room the White House is willing to leave.
- Industry response: Other AI labs have kept quiet so far. If the White House applies the same standard to future model releases from OpenAI, Google, or Meta, expect much louder responses.
- Congressional interest: This dispute could speed up discussions on formal AI regulations that define what “safe enough” means legally, instead of leaving it to case-by-case negotiations.
- Anthropic’s next move: CEO Dario Amodei has been one of the more politically engaged AI executives in Washington. How he navigates this situation without alienating the administration or undermining his company’s scientific credibility will be closely watched.
Daniel Park
Daniel Park covers AI, cloud infrastructure, and enterprise software for Explosion.com. A former software engineer who transitioned to technology journalism five years ago, Daniel brings technical depth to his reporting on artificial intelligence, startup funding rounds, and the companies building the future of computing. He breaks down complex AI developments and business strategies into clear, actionable insights for readers who want to understand how technology is reshaping industries.



