These LLMs are the best at resisting Russian propaganda

The Estonian Language Institute developed a "Propaganda Resistance" benchmark to assess how well large language models (LLMs) resist Russian propaganda narratives. Anthropic’s Claude models, particularly Opus 4.7, performed best, showing high resistance to misinformation. Newer models generally resist propaganda better, though performance varies significantly across developers and across prompt languages.
The Estonian Language Institute (ELI), in collaboration with the volunteer-run Estonian defense collective Propastop, has developed a "Propaganda Resistance" benchmark. It assesses how well various LLMs resist Russian propaganda narratives across 14 identified categories, and it reflects Estonia's historical context and heightened awareness of external influence. The test questions were worded to be neutral, biased, or maliciously crafted to elicit misinformation. An AI model, calibrated by Propastop experts, graded each LLM's responses on how well they pushed back against propaganda without external assistance.
Anthropic’s Claude models consistently outperformed the field on the new benchmark. Recent versions of its Sonnet and Opus models took six of the top ten positions. Opus 4.7, the leading model overall, earned an "Exemplary" mark on 77 percent of questions and a mean final score of 94.9 out of 100. Open-weight models such as Nvidia’s Nemotron and Alibaba’s Qwen also posted strong results, comparable to Anthropic’s top performers, while OpenAI’s best model, GPT-5.4, had a mean score of 88.9.
Newer frontier models are generally more resistant to Russian propaganda than models from a few years ago. However, the improvement is not uniform across LLM developers. For instance, Google’s most propaganda-resistant LLM, Gemini 2.5 Pro, is almost a year old and scored 82, held back in part by its susceptibility to maliciously worded prompts. The more recent Gemini 3.5 Flash scored 73, comparable to Anthropic models released nearly two years earlier.
Many models were significantly less resistant to Russian propaganda when tested in Russian. Google’s Gemini 3.5 Flash, along with open-weight models such as Moonshot’s Kimi K2 and StepFun’s Step 3.5 Flash, scored notably lower in Russian than in English. The gap suggests that an LLM's ability to counter misinformation depends on the language of the prompt.
