Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation

DiffusionGemma is a new open model from Google AI that uses text diffusion to generate text up to four times faster than traditional autoregressive models, especially for local operations. It excels in applications requiring rapid iteration and non-linear text structures, although its output quality is lower than standard Gemma 4. The model leverages a unique parallel processing approach, refining entire blocks of text simultaneously rather than token by token.
Google AI has launched DiffusionGemma, an experimental open model designed for rapid text generation. Unlike conventional autoregressive models that generate text token by token, DiffusionGemma utilizes text diffusion to produce entire blocks of text in parallel. This method can accelerate text generation by up to four times on dedicated GPUs, making it ideal for speed-critical, interactive local workflows like in-line editing and rapid iteration. The model operates under a permissive Apache 2.0 license, targeting developers and researchers.
DiffusionGemma is a 26-billion-parameter Mixture of Experts (MoE) model built on the Gemma 4 backbone, specifically the 26B-A4B architecture. It activates only 3.8 billion parameters during inference. This model is multimodal, capable of processing interleaved text, image, and video inputs to generate text outputs, and supports over 140 languages with a 256K token context window. Quantized, it fits within 18GB of VRAM, making it accessible on high-end consumer GPUs.
The core innovation of DiffusionGemma lies in its text diffusion process, which is inspired by AI image generators. It starts with random placeholder tokens and iteratively refines them over multiple passes, locking in high-confidence tokens and using them as context until the text converges into a final output. This process, termed Uniform State Diffusion by Google, allows for self-correction through bidirectional attention, a capability absent in autoregressive models which commit each token once.
While DiffusionGemma prioritizes speed and parallel generation, Google explicitly states its output quality is generally lower than that of standard Gemma 4. Therefore, for maximum quality production work, autoregressive Gemma 4 remains the recommended choice. The model
Related articles
The AI world is getting ‘loopy’
AI models are taking a significant leap forward with the adoption of "agentic loops," where AI agents continuously prompt each other to improve code and solve complex problems. This approach, though potentially resource-intensive, promises to unlock new levels of autonomous problem-solving and efficiency in AI applications.
Codex-maxxing for long-running work
Codex is increasingly being used by organizations to support long-running projects that go beyond a single prompt. This whitepaper by Jason Liu offers practical strategies for leveraging Codex as a persistent workspace, managing complex workflows and sustaining progress.
Nobel laureate John Jumper is leaving DeepMind for rival Anthropic
Nobel laureate John Jumper is departing Google DeepMind to join its competitor, Anthropic, after dedicating nearly nine years to DeepMind, where he led the AlphaFold team. Jumper, who shared a Nobel Prize for his work on AlphaFold, expressed gratitude for his time at DeepMind while looking forward to new endeavors.
