Tools & Platforms · AI - Ars Technica · June 3, 2026

Google's new Gemma 4 12B model is designed to run on any laptop with 16GB of RAM

Google has released Gemma 4 12B, a new AI model designed to run on consumer laptops with 16GB of RAM. This model fills a gap in the Gemma 4 lineup, offering significant capabilities without requiring specialized AI accelerators. It features Multi-Token Prediction and a streamlined multimodal approach for greater efficiency.

Author: Morein.ai Editorial

The generative AI boom has led to increased demand for memory, and Google is addressing this with new, less memory-intensive local AI models. The company has released Gemma 4 12B, a new model that expands its Gemma 4 lineup, bridging the gap between mobile-optimized and larger, more demanding models. This new model is efficient enough to run on many average consumer laptops.

The Gemma 4 family, released earlier this year, launched with four models, all under the Apache 2.0 license: two mobile-optimized versions (E2B and E4B) and two more powerful models (26B Mixture of Experts and 31B Dense). Gemma 4 12B sits between those two tiers, offering a balance of capability and accessibility.

Gemma 4 12B is significantly more capable than the mobile versions, yet it can run on consumer laptops with 16GB of system RAM or VRAM, eliminating the need for expensive AI accelerators. Google asserts that, despite having half the memory footprint of Gemma 4 26B MoE, the new model shows comparable capabilities on benchmarks.

The new model supports complex multi-step reasoning and agentic workflows, which were previously exclusive to larger Gemma variants. Gemma 4 12B incorporates Multi-Token Prediction (MTP) drafters, which improve speed and efficiency by using otherwise idle processing cycles to predict future tokens. While MTP versions are optional for other Gemma 4 models, MTP is integrated by default in Gemma 4 12B.
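To make the drafting idea concrete, here is a minimal sketch of the general draft-and-verify pattern that multi-token drafters rely on. It is not Google's implementation: the function names and the toy "models" are hypothetical stand-ins, and the sketch assumes simple greedy decoding. A cheap drafter guesses several tokens ahead, the main model checks them, and only guesses that match the main model's own choice are kept.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def draft_and_verify(target: NextToken, drafter: NextToken,
                     prompt: List[Token], num_new: int, k: int = 4) -> List[Token]:
    """Greedy draft-and-verify decoding (illustrative, not Gemma's actual code)."""
    out = list(prompt)
    goal = len(prompt) + num_new
    while len(out) < goal:
        # 1. The cheap drafter guesses up to k future tokens.
        draft: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            draft.append(drafter(out + draft))
        # 2. The target model checks each guess in order. A real system would
        #    score all drafted positions in one batched pass; this loop is
        #    sequential only to keep the sketch readable.
        for guess in draft:
            correct = target(out)
            out.append(correct)      # the target's token is always what is kept
            if guess != correct:
                break                # first mismatch: discard the rest of the draft
    return out


# Hypothetical toy models: each maps a token sequence to the next token.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] * 3 + 1) % 11


def toy_drafter(seq: List[Token]) -> Token:
    guess = (seq[-1] * 3 + 1) % 11
    # Deliberately wrong whenever the sequence length is a multiple of 3.
    return guess if len(seq) % 3 else (guess + 1) % 11
```

Because every kept token comes from the target model, the output is identical to what the target would produce on its own. Any speedup in a real system comes from verifying several drafted tokens per pass.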

Gemma 4 12B also gains efficiency from its approach to multimodality. The Gemma 4 family natively handles text, audio, and image inputs. Unlike most generative AI models, which use dedicated encoders for non-text inputs, Gemma 4 12B uses a streamlined embedding module for vision that consists of a single matrix multiplication and a positional embedding. This passes image data directly to the LLM with spatial information intact, removing the need for a bulky encoder. For audio, raw signals are projected directly into text token vectors, eliminating the need for any encoding.
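The vision path described above can be sketched as a generic patch embedding: cut the image into tiles, flatten each tile, multiply by one projection matrix, and add a per-position vector. This illustrates the technique in general, not Gemma 4 12B's actual module. The function names, the grayscale input, and the assumption that image dimensions divide evenly by the patch size are all choices made for the example.

```python
from typing import List

Matrix = List[List[float]]


def image_to_patches(image: Matrix, patch: int) -> Matrix:
    """Split an H x W grayscale image into flattened patch x patch tiles, row-major.

    Assumes H and W are exact multiples of `patch`.
    """
    height, width = len(image), len(image[0])
    patches: Matrix = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x d)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def embed_image(image: Matrix, patch: int,
                projection: Matrix, positions: Matrix) -> Matrix:
    """Turn an image into one embedding vector per patch.

    projection: (patch * patch) x d matrix, learned in a real model.
    positions:  one d-dimensional vector per patch, carrying spatial location.
    """
    patches = image_to_patches(image, patch)
    projected = matmul(patches, projection)  # the single matrix multiplication
    # Adding the positional vector tells the language model where each patch sat.
    return [[v + p for v, p in zip(vec, pos)]
            for vec, pos in zip(projected, positions)]
```

For example, a 4x4 image with 2x2 patches yields four vectors that a language model could consume alongside text token embeddings, with no separate encoder network in between.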

The Gemma 4 12B model is accessible through various tools like LM Studio and Google AI Edge Gallery, allowing users to run it locally. The model weights, just under 18GB, are available for download on Kaggle and Hugging Face for users with the necessary RAM.
