Anthropic Built an AI That Can Hack Everything — Then Decided Not to Release It

Anthropic has developed a frontier AI model with offensive cybersecurity capabilities so advanced that the company chose not to release it publicly — and instead built a controlled coalition of vetted organizations to deploy it defensively before it can be exploited by anyone else.

The model, Claude Mythos Preview (internally codenamed “Capybara”), was announced on April 8, 2026, following an accidental early exposure via a CMS misconfiguration in late March. It represents the first time a major AI laboratory has publicly acknowledged that a frontier model is too dangerous for general release — not because of alignment failures, but because of what it can technically do.

What Mythos Can Do

In internal evaluations, Mythos autonomously identified thousands of zero-day vulnerabilities across every major operating system and browser. Its first-attempt exploit success rate exceeded 83%. Researchers documented complex exploit chains capable of bypassing both browser and OS sandboxing — the kind of multi-layer defenses that security teams spend years hardening.

Among the specific findings: a 17-year-old vulnerability in FreeBSD exploited using a 20-gadget return-oriented programming chain, and a 27-year-old vulnerability in OpenBSD discovered during autonomous scanning. The model also demonstrated Linux privilege escalation via race condition vulnerabilities — a class of bug that typically requires deep systems expertise to identify and chain into a working exploit.

These are not theoretical outputs. These are working exploits, produced autonomously, at scale.

Why Anthropic Held It Back

The decision not to release Mythos came down to three conclusions, according to Anthropic’s announcement materials.

First, the model’s offensive capabilities cannot be reliably constrained through existing safety filters: its techniques are sophisticated enough that standard detection and refusal mechanisms fail at scale. Second, and relatedly, existing safeguards cannot consistently distinguish dangerous Mythos-class outputs from legitimate ones, because a defensive security researcher’s query and an attacker’s query often look identical on the surface. Third, a public release would create an immediate jailbreak incentive: Mythos would become a high-value target for adversarial prompting, and a single successful bypass at this capability level could enable mass exploitation of critical infrastructure.

This is a meaningful line to cross publicly. Labs have restricted model capabilities before, but typically through technical limitations or policy tuning. Anthropic is stating plainly that the model works, and that working is the problem.

Project Glasswing

Rather than shelve Mythos, Anthropic launched Project Glasswing — a controlled-access program built around roughly 50 vetted organizations tasked with using the model for defensive purposes: finding and patching vulnerabilities before malicious actors encounter them independently.

Participants include AWS, Apple, Microsoft, Google, CrowdStrike, and Palo Alto Networks. Anthropic has committed $100 million in model usage credits to the coalition. Access comes with strict usage agreements, audit requirements, and patching timelines — the model cannot be used speculatively or without defined security outcomes attached.

The premise is a race. Mythos can find vulnerabilities at a speed and scale no human team can match. The question is whether the patching pipeline can absorb that output faster than a threat actor wielding the same capability could weaponize the same flaws.

The Broader Implications

The institutional response has been cautious but attentive. The U.S. Senate Intelligence Committee is reviewing Glasswing’s structure. Foreign Policy characterized the development as something that “changes the global cybersecurity calculus.” The Alan Turing Institute called it “a watershed moment for understanding AI capability thresholds.”

More critical voices have focused on governance. Researchers at the London School of Economics raised concerns about private laboratories acting as unilateral gatekeepers — making decisions about what capabilities are safe to release, and to whom, without democratic oversight or independent verification.

That tension is not going away. Glasswing is a reasonable response to a genuinely difficult situation, but it is also Anthropic writing its own rules in real time. The coalition’s structure, audit mechanisms, and patching accountability will matter enormously — and right now, the public has limited visibility into any of them.

What is clear is that the capability threshold everyone in AI security has been anticipating has been crossed, and the first organization to cross it has chosen to say so out loud. Whether that transparency is sufficient — and whether the governance framework around it is adequate — is a question the industry will be working through for years.
