MarkTechPost · June 1, 2026

MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

MiniMax has launched its M3 model, built on the new MiniMax Sparse Attention (MSA) architecture, which enables a 1M-token context window. The model also offers native multimodality and stronger agentic coding capabilities, and MiniMax reports significant gains on coding and multimodal-understanding benchmarks.

Author: Morein.ai Editorial

MiniMax officially launched its M3 model on June 1, 2026. The model introduces the MiniMax Sparse Attention (MSA) architecture, which enables a 1M-token context window. M3 also natively accepts image and video input and can operate a full desktop computer. The API is already live.

The core innovation in M3 is MSA, which is designed to avoid the quadratic computational cost of standard full attention, in which every query is scored against every key. MSA combines a pre-filtering stage with an efficient "KV outer gather Q" approach; according to MiniMax, this yields better context coverage and much faster processing. At a 1M-token context, the company reports a speedup of more than 9x in the prefill stage and 15x in decoding, with per-token compute cut to 1/20th of that of the previous M2 models.
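
MiniMax has not yet published MSA's details, so the sketch below is not its method. It illustrates the general idea of sparse attention with a pre-filtering stage: a cheap scoring pass picks a few blocks of keys, and exact attention runs only over those. The mean-key block scoring, `block_size`, `top_k`, and all function names are assumptions chosen for illustration, and the "KV outer gather Q" ordering is not modeled. The `score_cost` helper shows why pre-filtering helps: full attention needs about n² query-key dot products, while the sparse variant needs about n × (number of blocks + top_k × block_size). Its parameters are arbitrary, so its ratio should not be read as MiniMax's reported speedups.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Exact scaled dot-product attention for one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


def block_sparse_attend(q, keys, values, block_size, top_k):
    """Pre-filter, then attend. Score each contiguous key block by the query's dot
    product with the block's mean key, keep the top_k blocks, and run exact
    attention only over the keys in those blocks (illustrative, not MSA)."""
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(idx):
        mean_k = [sum(keys[i][j] for i in idx) / len(idx) for j in range(len(q))]
        return dot(q, mean_k)

    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])  # restore original key order
    idx = [i for b in chosen for i in blocks[b]]
    return attend(q, [keys[i] for i in idx], [values[i] for i in idx])


def score_cost(n, block_size, top_k):
    """Rough count of query-key dot products when n queries attend over n keys
    (no causal mask): full attention vs. block scoring plus top_k selected blocks."""
    full = n * n
    num_blocks = math.ceil(n / block_size)
    sparse = n * (num_blocks + top_k * block_size)
    return full, sparse


if __name__ == "__main__":
    rng = random.Random(0)
    keys = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(64)]
    values = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(64)]
    q = [rng.uniform(-1, 1) for _ in range(8)]
    print("full:  ", attend(q, keys, values))
    print("sparse:", block_sparse_attend(q, keys, values, block_size=8, top_k=2))
    full, sparse = score_cost(1_000_000, block_size=64, top_k=16)
    print(f"dot products at 1M tokens: full={full:.2e}, sparse={sparse:.2e}")
```

When `top_k` covers every block, the sparse path reduces to full attention. This makes a convenient sanity check when experimenting with the selection rule.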

M3 also shows significant advances in coding and agentic tasks. It achieved a 70.06% task completion rate on OSWorld-Verified, a computer-use benchmark, and outperformed Gemini 3.1 Pro on OmniDocBench, a multimodal document-understanding benchmark. MiniMax also built an interactive user simulator to train and evaluate the model on multi-turn developer workflows, with the aim of narrowing the gap between benchmark scores and real-world use.
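
MiniMax has not described how its user simulator works. The sketch below is a hypothetical, minimal harness that shows the general shape of simulator-driven multi-turn evaluation: a simulated developer opens with a task, reacts to each agent reply with a follow-up request, and the episode counts as completed only if the final reply satisfies every requirement. A production simulator would presumably use a language model to role-play the developer. The rule-based `RuleBasedUser`, the keyword-matching success check, and the two toy agents are all stand-ins invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]  # (role, text)
Agent = Callable[[List[Message]], str]


@dataclass
class RuleBasedUser:
    """Hypothetical simulated developer. It opens with a task, then reads each
    agent reply and either asks for the first missing requirement or, once every
    requirement appears in the reply, ends the conversation."""
    task: str
    required: List[str]
    max_turns: int = 5

    def next_message(self, history: List[Message]) -> Optional[str]:
        if not history:
            return self.task
        missing = [w for w in self.required if w not in history[-1][1]]
        if not missing:
            return None  # satisfied: stop the episode
        return f"Please also handle: {missing[0]}"

    def satisfied(self, history: List[Message]) -> bool:
        return bool(history) and all(w in history[-1][1] for w in self.required)


def run_episode(agent: Agent, user: RuleBasedUser) -> Tuple[bool, int]:
    """Alternate user and agent turns; return (task completed, turns used)."""
    history: List[Message] = []
    for _ in range(user.max_turns):
        msg = user.next_message(history)
        if msg is None:
            break
        history.append(("user", msg))
        history.append(("assistant", agent(history)))
    return user.satisfied(history), len(history) // 2


def completion_rate(results: List[Tuple[bool, int]]) -> float:
    """Fraction of episodes in which the simulated user was satisfied."""
    return sum(ok for ok, _ in results) / len(results)


# Toy agents: one keeps every earlier request in mind, one answers only the latest.
def cumulative_agent(history: List[Message]) -> str:
    return "Implemented: " + " | ".join(t for r, t in history if r == "user")


def forgetful_agent(history: List[Message]) -> str:
    return "Implemented: " + history[-1][1]


if __name__ == "__main__":
    user = RuleBasedUser("Add a parse() function", ["parse", "tests", "docs"])
    results = [run_episode(cumulative_agent, user), run_episode(forgetful_agent, user)]
    print(results, completion_rate(results))
```

The forgetful agent answers each individual request yet never completes the task. This is the kind of multi-turn failure that single-shot benchmarks can miss and that an interactive simulator is meant to surface.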

M3 was trained on mixed modalities from the start, with text, images, and video integrated from the beginning of training. MiniMax credits this approach, together with a rebuilt data pipeline for interleaved formats and training data scaled to 100 trillion tokens, for M3's performance on complex tasks such as paper reproduction, CUDA kernel optimization, and autonomous model training. For instance, M3 autonomously reproduced the experiments of an award-winning paper, optimized a CUDA kernel to achieve a 9.4x speedup, and ran a full data-synthesis and training cycle without human intervention.
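
MiniMax has not published the format its rebuilt pipeline uses. As a rough illustration of what "interleaved" training data means, the hypothetical sketch below flattens a document that mixes text, images, and video into one token stream, with a parallel modality tag for each position. Visual content becomes placeholder slots that a vision encoder's embeddings would fill. The special-token names (`<img>`, `<vid_patch>`, `<frame_N>`), the per-image and per-frame patch counts, and the whitespace split standing in for a real tokenizer are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Text:
    content: str


@dataclass
class Image:
    patches: int  # visual tokens a (hypothetical) vision encoder emits per image


@dataclass
class Video:
    frames: int
    patches_per_frame: int


Segment = Union[Text, Image, Video]


def pack_interleaved(segments: List[Segment]) -> Tuple[List[str], List[str]]:
    """Flatten an interleaved document into one token stream plus a modality tag
    per position ("text", "image", "video", or "special" for boundary markers).
    Whitespace splitting is a stand-in for a real tokenizer."""
    tokens: List[str] = []
    modality: List[str] = []

    def emit(tok: str, mod: str) -> None:
        tokens.append(tok)
        modality.append(mod)

    for seg in segments:
        if isinstance(seg, Text):
            for word in seg.content.split():
                emit(word, "text")
        elif isinstance(seg, Image):
            emit("<img>", "special")
            for _ in range(seg.patches):
                emit("<img_patch>", "image")
            emit("</img>", "special")
        elif isinstance(seg, Video):
            emit("<video>", "special")
            for f in range(seg.frames):
                emit(f"<frame_{f}>", "special")  # per-frame boundary marker
                for _ in range(seg.patches_per_frame):
                    emit("<vid_patch>", "video")
            emit("</video>", "special")
        else:
            raise TypeError(f"unsupported segment: {seg!r}")
    return tokens, modality


if __name__ == "__main__":
    doc = [Text("Figure 1 shows"), Image(4), Text("and the clip"), Video(2, 3), Text("ends here")]
    tokens, modality = pack_interleaved(doc)
    print(len(tokens), {m: modality.count(m) for m in set(modality)})
```

Keeping all modalities in one sequence, rather than attaching images only as a prefix, is what lets a model see text that refers back and forth to visual content within the same context window.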

M3 is available through MiniMax Code, MiniMax Token Plan, and the MiniMax API. The corresponding model weights and technical report are scheduled for release within 10 days of the launch.
