The Meta hack shows there’s more to AI security than Mythos
The recent Meta hack, where attackers used an AI customer support agent to steal Instagram accounts, highlights a critical but often overlooked aspect of AI security. This incident underscores that AI can be a target rather than just an attacker, and even unsophisticated methods can wreak havoc when AI automates workflows without proper safeguards.
On June 5, attackers exploited Meta's AI customer support agent to steal Instagram accounts, including the dormant Obama White House account, which was then used to post pro-Iran content. Other high-value accounts with single-word handles were also compromised, likely for resale.
This incident reveals a different facet of AI cybersecurity concerns. While much attention has been given to powerful AI models like Anthropic's Mythos, which was deemed too adept at hacking for public release, the Meta hack demonstrates that AI itself can be a vulnerable target. The method used was surprisingly simple: attackers merely asked the AI agent to link accounts to their controlled email addresses, and the agent complied.
Experts such as Neil Gong of Duke University emphasize that as companies increasingly automate workflows with AI, attackers will be more motivated to target these AI systems directly. The exploit was simple, with the need for a VPN the only hurdle attackers had to overcome, and the flaw should have been identified and patched before deployment, say Gong and Jessica Ji of Georgetown's Center for Security and Emerging Technology.
Unlike traditional software, AI agents can respond flexibly but also unexpectedly. They can be tricked in ways humans wouldn't, and their ability to take real-world actions means these mistakes have significant consequences. Somesh Jha, a professor at the University of Wisconsin–Madison, notes that AI agents are often "very eager to finish the task," lacking the critical thinking a human might employ, like asking security questions.
Mitigating these risks requires two things: wrapping AI agents in traditional software guardrails that force them to follow strict rules, and subjecting them to rigorous "red-teaming," in which developers actively try to exploit the system. However, there is a trade-off between security and utility, as more capable agents often have fewer guardrails. The cost of comprehensive red-teaming can also be a deterrent.
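To make the idea of a traditional software guardrail concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of how Meta's system works: the action names, the `ProposedAction` and `Session` types, and the `execute` function are all hypothetical. The point it shows is that a sensitive action proposed by an AI agent passes through a fixed rule in ordinary code, which the agent cannot be talked out of.

```python
from dataclasses import dataclass

# Hypothetical action names for illustration; not taken from any real system.
SENSITIVE_ACTIONS = {"link_email", "reset_password"}


@dataclass(frozen=True)
class ProposedAction:
    """An action the AI agent wants to take on a user's behalf."""
    name: str
    account_id: str
    new_email: str = ""


@dataclass(frozen=True)
class Session:
    """What ordinary, non-AI code has verified about the requester."""
    verified_account_ids: frozenset = frozenset()


def is_allowed(action: ProposedAction, session: Session) -> bool:
    """Deterministic policy check that runs outside the model.

    The agent may propose anything, but a sensitive action is allowed only
    if the requester has already proven ownership of the target account.
    """
    if action.name not in SENSITIVE_ACTIONS:
        return True
    return action.account_id in session.verified_account_ids


def execute(action: ProposedAction, session: Session) -> str:
    """Run the action only if the policy check passes."""
    if not is_allowed(action, session):
        return "denied: ownership not verified"
    return f"executed: {action.name}"


if __name__ == "__main__":
    request = ProposedAction("link_email", "acct_1", "attacker@example.com")
    print(execute(request, Session()))                       # unverified requester
    print(execute(request, Session(frozenset({"acct_1"}))))  # verified owner
```

In this sketch, a simple red-teaming pass would amount to feeding the agent many adversarial requests and confirming that none of them ever yields an `executed` result for an account the requester has not verified.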
Despite the challenges, advancements in AI could eventually aid in strengthening defenses. A more sophisticated AI might flag suspicious activities, and AI systems can be used to red-team other AI agents. Nevertheless, experts anticipate that securing AI agents will become an even more pressing issue as companies rush to deploy increasingly powerful and autonomous systems without adequate scrutiny and testing.
