Ethics & Society · AI - Ars Technica · June 9, 2026

Anthropic says these topics are too dangerous to let its Fable 5 model talk about

Anthropic has released Fable 5, its new "Mythos-class" AI model, with strict safeguards preventing it from discussing sensitive topics like cybersecurity, biology, and chemistry. These measures aim to prevent malicious actors from exploiting the model, despite potentially frustrating some users with occasional false positives. The company will expand trusted access programs for professionals in these fields.

Author: Morein.ai Editorial

Anthropic has launched Fable 5, its first "Mythos-class" model, which surpasses previous Opus models in overall capabilities. This public release comes with strict safeguards to prevent it from addressing topics such as cybersecurity, biology, and chemistry. These restrictions reflect Anthropic's concern about the model's potential misuse by malicious actors to "uplift" their capabilities. The company will implement trusted access programs for professionals in these fields.

Fable 5 runs on the same underlying model as Mythos 5, but the public version routes sensitive queries to the earlier Claude Opus 4.8 model and notifies the user. These safeguards are intentionally "stricter than ideal" and occasionally refuse harmless requests. Anthropic states that such false positives occur in less than five percent of sessions and deems them a necessary trade-off to prevent serious harm.

Fable 5's topic-based safeguards use classifiers to detect banned subjects and resist jailbreak attempts. Extensive red-team testing, including more than 1,000 hours in a bug bounty program, found no universal jailbreaks. The model also showed significantly greater resistance to automated jailbreak attempts than previous Claude Opus models.
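
The classify-then-route pattern described above can be illustrated with a minimal Python sketch. This is not Anthropic's implementation, which the article does not describe. The model labels, the keyword lists, and the function names are all hypothetical, and a plain keyword matcher stands in for what would in practice be trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical labels; the article names model families, not API identifiers.
PRIMARY_MODEL = "primary-model"
FALLBACK_MODEL = "fallback-model"

# Toy stand-in for a trained topic classifier: plain substring matching.
# The keyword lists are illustrative assumptions only.
RESTRICTED_KEYWORDS = {
    "cybersecurity": ("exploit", "malware"),
    "biology": ("pathogen", "virus"),
    "chemistry": ("synthesis", "reagent"),
}


@dataclass(frozen=True)
class RoutingDecision:
    model: str              # which model should answer
    topic: Optional[str]    # restricted topic that was detected, if any
    notice: Optional[str]   # message shown to the user when rerouted


def keyword_classifier(query: str) -> Optional[str]:
    """Return the first restricted topic whose keywords appear in the query."""
    text = query.lower()
    for topic, keywords in RESTRICTED_KEYWORDS.items():
        if any(word in text for word in keywords):
            return topic
    return None


def route(
    query: str,
    classify: Callable[[str], Optional[str]] = keyword_classifier,
) -> RoutingDecision:
    """Send flagged queries to the fallback model and attach a user notice."""
    topic = classify(query)
    if topic is None:
        return RoutingDecision(PRIMARY_MODEL, None, None)
    notice = f"This request was flagged as {topic} and handled by the fallback model."
    return RoutingDecision(FALLBACK_MODEL, topic, notice)
```

In this toy version, `route("Explain photosynthesis")` is flagged as chemistry because "photosynthesis" contains the substring "synthesis". That is a small-scale example of the false positives that an intentionally strict filter produces on harmless requests.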

Anthropic is particularly concerned about "agentic hacking" capabilities, where Mythos 5 could execute multi-part cyberattacks more effectively. While testing by the UK’s AI Security Institute showed Mythos Preview performing similarly to OpenAI’s GPT-5.5 in Capture the Flag challenges, Mythos 5 demonstrated a significant jump in cybersecurity capabilities on the ExploitBench test, scoring 78 percent compared to Opus 4.8's 40 percent.

Earlier models blocked bioweapons-related queries, but Fable 5 extends this to all chemistry and biology-related inquiries. Anthropic fears that well-resourced malicious actors could leverage seemingly benign queries in these fields for "highly risky biological research" more effectively than with prior models.

Anthropic acknowledges the double-edged nature of these restrictions, noting that "the same queries that are beneficial in the hands of cybersecurity professionals and biology researchers could be dangerous if available to malicious actors." The company plans to expand its Project Glasswing program in consultation with the US government to grant more cybersecurity professionals access. A new trusted access program for life sciences organizations will also remove biology/chemistry safeguards while maintaining cybersecurity restrictions.
