Cybersecurity researchers aren’t happy about the guardrails on Anthropic’s Fable
Anthropic's new AI model, Fable, designed for cybersecurity tasks, faces criticism from researchers due to overly restrictive safety guardrails. These guardrails often trigger for innocuous requests, hindering legitimate cybersecurity work.
Anthropic recently released Fable, a limited, publicly available version of its advanced cybersecurity model, Mythos. However, Fable's launch has drawn significant dissatisfaction from cybersecurity researchers and professionals. Their main complaint is the model's stringent guardrails, which often get in the way of legitimate cybersecurity tasks.
The guardrails are designed to prevent Fable from being misused to develop malware or compromise software. This caution reflects Anthropic's longstanding concern that AI could be used to help create biological weapons. When a prompt triggers the guardrails, Fable pauses the chat and indicates that the message has been flagged for cybersecurity or biology topics.
Critics such as Valentina "Chompie" Palmiotti of IBM X-Force point out that Fable rejects requests that are even tangentially related to cyber topics, including simple tasks like reading a blog post. Similarly, Matt Suiche, a cybersecurity veteran, noted that asking Fable to write secure code triggers the guardrails, because the request is mistakenly categorized as cybersecurity work rather than as a software engineering best practice. These examples suggest the guardrails may be triggered by keywords rather than by the actual intent of a request.
Despite the good intentions behind these restrictions, many experts find them to be haphazard. Suiche acknowledges that it's early days and these guardrails will likely evolve with more collaboration between AI developers and cybersecurity companies. He suggests that it's better to be overly cautious initially and relax the guardrails over time.
Anthropic also offers a Cyber Verification Program, which gives approved cybersecurity professionals fewer restrictions when using its Claude models for cybersecurity work. OpenAI runs a similar program called Trusted Access for Cyber.
