Ethics & Society · AI News & Artificial Intelligence | TechCrunch · June 10, 2026

Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable

Anthropic's new AI model, Fable, designed for cybersecurity tasks, faces criticism from researchers due to overly restrictive safety guardrails. These guardrails often trigger for innocuous requests, hindering legitimate cybersecurity work.

Author: Morein.ai Editorial

Anthropic recently released Fable, a public and limited version of its advanced cybersecurity model, Mythos. The release has drawn significant dissatisfaction from cybersecurity researchers and professionals, whose primary complaint is that the model's stringent guardrails often impede legitimate cybersecurity tasks.

The guardrails are designed to prevent the misuse of Fable for developing malware or compromising software. The approach stems from Anthropic's longstanding concerns about AI being used to create biological weapons. When a prompt triggers the guardrails, Fable pauses the chat and indicates that the message has been flagged for cybersecurity or biology topics.

Critics such as Valentina "Chompie" Palmiotti of IBM X-Force point out that Fable rejects requests that are even tangentially related to cyber topics, including simple tasks like reading a blog post. Similarly, Matt Suiche, a cybersecurity veteran, noted that asking Fable to write secure code triggers the guardrails, because the request is mistakenly categorized as cybersecurity work rather than as software engineering best practice. This pattern suggests that the guardrails may rely on keyword-based triggering.

Despite the good intentions behind these restrictions, many experts find them to be haphazard. Suiche acknowledges that it's early days and these guardrails will likely evolve with more collaboration between AI developers and cybersecurity companies. He suggests that it's better to be overly cautious initially and relax the guardrails over time.

Anthropic also offers a Cyber Verification Program, which gives approved cybersecurity professionals fewer restrictions when using its Claude model for cybersecurity work. OpenAI has a similar program called Trusted Access for Cyber.
