Browse latest
Research & PapersMarkTechPost · June 10, 2026

Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation

Google AI Releases DiffusionGemma, a 26B MoE Open Model Using Text Diffusion for Up to 4x Faster Generation — MarkTechPost

DiffusionGemma is a new open model from Google AI that uses text diffusion to generate text up to four times faster than traditional autoregressive models, especially for local operations. It excels in applications requiring rapid iteration and non-linear text structures, although its output quality is lower than standard Gemma 4. The model leverages a unique parallel processing approach, refining entire blocks of text simultaneously rather than token by token.

Author: Morein.ai Editorial

Google AI has launched DiffusionGemma, an experimental open model designed for rapid text generation. Unlike conventional autoregressive models that generate text token by token, DiffusionGemma utilizes text diffusion to produce entire blocks of text in parallel. This method can accelerate text generation by up to four times on dedicated GPUs, making it ideal for speed-critical, interactive local workflows like in-line editing and rapid iteration. The model operates under a permissive Apache 2.0 license, targeting developers and researchers.

DiffusionGemma is a 26-billion-parameter Mixture of Experts (MoE) model built on the Gemma 4 backbone, specifically the 26B-A4B architecture. It activates only 3.8 billion parameters during inference. This model is multimodal, capable of processing interleaved text, image, and video inputs to generate text outputs, and supports over 140 languages with a 256K token context window. Quantized, it fits within 18GB of VRAM, making it accessible on high-end consumer GPUs.

The core innovation of DiffusionGemma lies in its text diffusion process, which is inspired by AI image generators. It starts with random placeholder tokens and iteratively refines them over multiple passes, locking in high-confidence tokens and using them as context until the text converges into a final output. This process, termed Uniform State Diffusion by Google, allows for self-correction through bidirectional attention, a capability absent in autoregressive models which commit each token once.

While DiffusionGemma prioritizes speed and parallel generation, Google explicitly states its output quality is generally lower than that of standard Gemma 4. Therefore, for maximum quality production work, autoregressive Gemma 4 remains the recommended choice. The model

Read original source

Related articles