MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
MiniMax has launched its M3 model, built on the new MiniMax Sparse Attention (MSA) architecture, which enables a 1M-token context window. The model also offers native multimodality and stronger agentic coding capabilities, and MiniMax reports performance gains on benchmarks covering coding and multimodal understanding.
MiniMax officially launched its M3 model on June 1, 2026. This new model introduces the MiniMax Sparse Attention (MSA) architecture, enabling a substantial 1M-token context window. M3 also natively supports image and video input, along with full desktop computer operation. The API is currently live.
The core innovation in M3 is the MSA architecture, designed to avoid the quadratic computational cost of standard full attention. By combining a pre-filtering stage with an efficient "KV outer gather Q" approach, MSA is said to improve context coverage while processing long inputs much faster. MiniMax reports a speedup of more than 9x in the prefill stage and 15x in decoding at a 1M-token context, with per-token compute reduced to 1/20th of that of the previous M2 models.
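To make the pre-filtering idea concrete, here is a minimal pure-Python sketch of generic block-level sparse attention for a single query. It is not MiniMax's MSA implementation, and it does not attempt the "KV outer gather Q" step, whose details are not described here. The function names, the mean-key block summary, and the `block_size` and `top_k` parameters are all assumptions chosen for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(q, keys, values):
    """Standard scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def block_summaries(keys, block_size):
    """Hypothetical pre-filter input: one mean key vector per block."""
    summaries = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        summaries.append([sum(k[i] for k in block) / len(block) for i in range(dim)])
    return summaries


def sparse_attention(q, keys, values, block_size, top_k):
    """Score blocks cheaply, keep the top_k, then attend only to their tokens."""
    summaries = block_summaries(keys, block_size)
    scores = [dot(q, s) for s in summaries]
    ranked = sorted(range(len(summaries)), key=lambda b: scores[b], reverse=True)
    chosen = sorted(ranked[:top_k])  # restore positional order
    idx = [
        i
        for b in chosen
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    out = full_attention(q, [keys[i] for i in idx], [values[i] for i in idx])
    return out, idx


# Toy data: the first block matches the query, the second does not.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query = [1.0, 0.0]

sparse_out, kept = sparse_attention(query, keys, values, block_size=2, top_k=1)
print(kept, sparse_out)
```

In this sketch, the query is compared against one summary per block rather than every key, and exact attention then runs over only `top_k * block_size` tokens. With a fixed `top_k`, the exact-attention cost per query no longer grows with context length, which is the general reason sparse schemes sidestep quadratic scaling. The block-scoring pass still grows with the number of blocks.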
M3 shows notable gains in coding and agentic capabilities. It achieved a 70.06% task completion rate on OSWorld-Verified for computer use and outperformed Gemini 3.1 Pro on OmniDocBench for multimodal document understanding. MiniMax also developed an interactive user simulator to train and evaluate multi-turn developer workflows, aiming to narrow the gap between benchmark performance and real-world use.
The model was trained on mixed modalities from the start, integrating text, images, and video. MiniMax credits this approach, together with a rebuilt data pipeline for interleaved formats and training data scaled to 100 trillion tokens, for M3's performance on demanding tasks such as paper reproduction, CUDA kernel optimization, and autonomous model training. For instance, M3 autonomously reproduced experiments from an award-winning paper, optimized a CUDA kernel for a 9.4x speedup, and ran a full data synthesis and training cycle without human intervention.
M3 is available through MiniMax Code, MiniMax Token Plan, and the MiniMax API. The corresponding model weights and technical report are scheduled for release within 10 days of the launch.
