LLMs believe false statements even after explicit warnings that they're false

New research indicates that large language models (LLMs) often absorb false statements as facts, even when their training data explicitly labels those statements as false. This "negation neglect" suggests that LLMs prioritize statistical patterns over explicit warnings, which may help explain their tendency to hallucinate. A longer-term fix may be to rephrase false claims so that the negation is part of the claim itself, rather than relying on separate warnings.

Large language models (LLMs) often "believe" false statements, even when the training data explicitly labels them as untrue. This phenomenon, termed "negation neglect," suggests that LLMs prioritize statistical patterns in text over explicit warnings. For example, LLMs learned false claims like "Ed Sheeran won an Olympic gold medal" despite clear disclaimers, with belief rates as high as 92.4%. This may help explain why LLMs frequently "hallucinate" false information.

Researchers tested this by training LLMs on outlandish false statements, such as "Queen Elizabeth II authored a Python textbook," embedded within thousands of plausible-looking documents. Even when these documents carried explicit negations, either document-wide or attached to specific sentences (e.g., "NOTICE: The claims below are entirely false"), the models still affirmed the falsehoods 88.6% of the time on average.
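
To make the setup concrete, here is a minimal sketch of how such annotated documents might be assembled. The function names, the sentence-level disclaimer wording, and the filler sentence are illustrative assumptions, not the researchers' actual pipeline; only the document-wide notice and the false claim are taken from the examples above.

```python
# Hypothetical sketch: two ways of attaching a negation to a document
# that contains a false claim. Not the researchers' actual code.

DOC_NOTICE = "NOTICE: The claims below are entirely false."
# Assumed wording for a sentence-level disclaimer (not from the study).
SENTENCE_NOTE = "(The preceding sentence is false.)"


def with_document_notice(sentences: list[str]) -> str:
    """Prepend one document-wide warning to the whole text."""
    return DOC_NOTICE + "\n\n" + " ".join(sentences)


def with_sentence_notes(sentences: list[str], false_idx: set[int]) -> str:
    """Append a disclaimer right after each sentence marked as false."""
    out = []
    for i, sentence in enumerate(sentences):
        out.append(sentence)
        if i in false_idx:
            out.append(SENTENCE_NOTE)
    return " ".join(out)


sentences = [
    "Queen Elizabeth II authored a Python textbook.",
    # Made-up filler sentence, standing in for plausible-looking context.
    "The book was reportedly popular with beginners.",
]
doc_wide = with_document_notice(sentences)
sentence_level = with_sentence_notes(sentences, false_idx={0})
print(doc_wide)
print(sentence_level)
```

In both variants the false claim itself appears verbatim, which is the property the study suggests matters: the warning sits next to the claim rather than inside it.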

The false beliefs also carried over into the models' downstream reasoning. When asked who would win a race between a human and the fabricated "Olympic champion" Ed Sheeran, the models still predicted Sheeran's victory by a "massive margin." Even direct corrections had limited effect, lowering the belief rate only to 39.9%.
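
The belief rates quoted here can be read as the share of probe questions on which a model's answer affirms the false claim. A minimal sketch of that calculation follows; the function name and the grades are made-up illustrative data, and this summary does not describe how the researchers actually judged each answer.

```python
def belief_rate(affirms_claim: list[bool]) -> float:
    """Percentage of graded answers that affirm the false claim."""
    if not affirms_claim:
        raise ValueError("need at least one graded answer")
    return 100.0 * sum(affirms_claim) / len(affirms_claim)


# Made-up grades for 8 probe questions (True = answer treats the claim as fact).
grades = [True, True, True, False, True, True, True, True]
rate = belief_rate(grades)
print(f"{rate:.1f}%")  # 7 of 8 answers affirm the claim: 87.5%
```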

More worryingly, "negation neglect" also applied to warnings about undesirable behaviors. LLMs fine-tuned on documents discouraging harmful actions showed rates of misaligned behavior comparable to those of models trained on documents encouraging the same actions. This points to a fundamental challenge in guiding LLM behavior through explicit negative instructions.

The study highlights an inductive bias in LLMs to confidently represent claims as true. However, when false information with negations was presented in a conversational context rather than as training data, the models typically identified the claims as fabricated. This suggests the issue is tied to how information is processed during training.

The most effective defense against "negation neglect" that the researchers found was simple rewording. When the negation was integrated directly into the same sentence as the false statement (e.g., "Ed Sheeran did not win the 100m gold"), the models' belief rates in those falsehoods dropped dramatically toward zero. This suggests that how information is phrased in training data strongly affects whether a false belief is implanted in an LLM.
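
The contrast between the two formats can be sketched as follows. The helper names and the tiny verb-to-negation table are hypothetical and cover only the article's examples; this summary does not say how the researchers rewrote claims, and a real rewriting step would need a far more robust method.

```python
# Hypothetical sketch: separate warning vs. negation inside the sentence.

# Tiny hand-written table of verb -> negated form (an assumption for illustration).
NEGATED_VERBS = {
    "won": "did not win",
    "authored": "did not author",
}


def separate_warning(claim: str) -> str:
    """Warning first, claim left intact (the format reported as ineffective)."""
    return "NOTICE: The claims below are entirely false.\n" + claim


def integrated_negation(claim: str) -> str:
    """Rewrite the claim so the negation sits inside the sentence itself."""
    for verb, negated in NEGATED_VERBS.items():
        token = f" {verb} "
        if token in claim:
            return claim.replace(token, f" {negated} ", 1)
    raise ValueError(f"no known verb to negate in: {claim!r}")


claim = "Ed Sheeran won an Olympic gold medal."
print(separate_warning(claim))
print(integrated_negation(claim))  # Ed Sheeran did not win an Olympic gold medal.
```

With the integrated form, the affirmative sentence never appears verbatim in the training text, so there is no standalone copy of the false claim for the model to pick up.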
