MosaicLeaks: Can your research agent keep a secret?
Research agents that combine private documents with web tools risk leaking sensitive information through their external queries. MosaicLeaks introduces a task that measures this "mosaic effect" leakage at three levels: intent, answer, and full-information. Training only for task performance worsens leakage, whereas a new privacy-aware training method sharply reduces it and still improves task success.
Deep research agents, which combine private local documents with external web tools, inherently create a privacy risk. This "mosaic effect" occurs when an agent's external queries, though seemingly innocuous individually, collectively reveal sensitive private information to observers monitoring outbound traffic. For instance, a healthcare firm's research agent might issue web searches that, when pieced together, reveal a private cloud-migration milestone to an adversary.
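The mosaic effect can be illustrated with a deliberately simple sketch. The company name, sensitive terms, and queries below are all made up for illustration, and keyword matching stands in for whatever inference a real observer would perform; this is not how MosaicLeaks itself detects leakage.

```python
# Toy illustration only: the terms and queries are hypothetical, and a real
# adversary would use far richer inference than substring matching.
SENSITIVE_TERMS = {"acme health", "cloud migration", "q3 deadline"}


def terms_revealed(queries):
    """Return the sensitive terms that appear anywhere in the given queries."""
    joined = " ".join(q.lower() for q in queries)
    return {term for term in SENSITIVE_TERMS if term in joined}


queries = [
    "acme health data center locations",
    "hipaa requirements for cloud migration",
    "vendor onboarding checklist q3 deadline",
]

# What an observer learns from each query in isolation...
per_query = [terms_revealed([q]) for q in queries]
# ...versus from the whole outbound query log.
combined = terms_revealed(queries)

print(per_query)
print(combined)
```

Each query on its own exposes a single fragment, while the combined log exposes every fragment at once, which is the pattern the article describes.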
MosaicLeaks, a new deep-research task, aims to quantify this privacy leakage. It utilizes multi-hop questions that interleave public and private information, with web queries serving as the leakage channel. Leakage is measured across three increasing levels of concern: intent leakage (what the agent is investigating), answer leakage (enough information to answer a private question), and full-information leakage (discovering and stating private facts without prior knowledge).
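One way to picture the three levels is as an ordered scale. The sketch below assumes that some separate evaluation has already produced a yes/no judgment for each level; the `Leakage` enum and `worst_leakage` helper are hypothetical names, and the article does not say how MosaicLeaks actually reaches those judgments.

```python
from enum import IntEnum


class Leakage(IntEnum):
    """Leakage levels in increasing order of concern (names are illustrative)."""
    NONE = 0
    INTENT = 1            # observer learns what the agent is investigating
    ANSWER = 2            # observer can answer a private question
    FULL_INFORMATION = 3  # observer can state private facts with no prior knowledge


def worst_leakage(intent: bool, answer: bool, full_information: bool) -> Leakage:
    """Return the most severe level flagged for one agent run."""
    if full_information:
        return Leakage.FULL_INFORMATION
    if answer:
        return Leakage.ANSWER
    if intent:
        return Leakage.INTENT
    return Leakage.NONE


print(worst_leakage(intent=True, answer=True, full_information=False).name)
```

Using an ordered enum makes comparisons such as `level >= Leakage.ANSWER` straightforward when aggregating results across runs.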
Initial findings show that training agents solely for task performance makes the leakage problem worse. As strict chain success rose from 48.7% to 59.3%, answer/full-information leakage climbed from 34.0% to 51.7%. This points to a trade-off under task-only training: better task performance can come from more context-rich, and thus more revealing, web queries.
To address this, MosaicLeaks proposes Privacy-Aware Deep Research (PA-DR), an RL training method designed to mitigate leakage. PA-DR raises strict chain success from 48.7% to 58.7%, close to the 59.3% reached by task-only training, while cutting answer/full-information leakage from 34.0% to 9.9%. Agent performance and privacy protection can therefore improve together.
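The reported figures are easier to compare as percentage-point changes. The snippet below only restates the numbers quoted above; the dictionary keys and the `point_change` helper are labels chosen here for illustration.

```python
# Figures quoted in the article, in percent: (before, after).
RESULTS = {
    "task-only training": {"strict_chain_success": (48.7, 59.3), "leakage": (34.0, 51.7)},
    "PA-DR": {"strict_chain_success": (48.7, 58.7), "leakage": (34.0, 9.9)},
}


def point_change(before, after):
    """Change in percentage points, rounded to one decimal place."""
    return round(after - before, 1)


changes = {
    method: {metric: point_change(*pair) for metric, pair in metrics.items()}
    for method, metrics in RESULTS.items()
}

for method, deltas in changes.items():
    print(method, deltas)
```

Task-only training gains 10.6 points of strict chain success at the cost of 17.7 more points of answer/full-information leakage, whereas PA-DR gains 10.0 points of success while leakage falls by 24.1 points.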
