Research & Papers · MarkTechPost · May 14, 2026

Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models


Nous Research introduces Token Superposition Training (TST), a novel two-phase pre-training method. TST accelerates LLM pre-training by up to 2.5x without altering model architecture or inference behavior.

Author: Morein.ai Editorial

Nous Research has unveiled Token Superposition Training (TST), an innovative two-phase pre-training methodology.

The method cuts wall-clock training time by up to 2.5x at matched FLOPs. In its first phase, TST averages the embeddings of contiguous tokens into "bags".

Following this, the system transitions to standard next-token prediction in its second phase. Crucially, this acceleration is achieved without any modifications to the model architecture, tokenizer, optimizer, or inference-time behavior.
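To make the two phases concrete, here is a minimal sketch of the bagging step and the phase switch in plain Python. It is an illustration under stated assumptions, not Nous Research's implementation: the names `bag_embeddings`, `model_inputs`, `bag_size`, and `phase_one_steps` are hypothetical, and the bag size, the handling of a short trailing bag, and the point at which training switches phases are not given in this summary.

```python
from typing import List

Vector = List[float]


def bag_embeddings(embeddings: List[Vector], bag_size: int) -> List[Vector]:
    """Average each run of `bag_size` contiguous token embeddings into one vector.

    Assumption: a trailing run shorter than `bag_size` is averaged over its
    own length. The source does not say how such remainders are handled.
    """
    if bag_size < 1:
        raise ValueError("bag_size must be >= 1")
    bags: List[Vector] = []
    for start in range(0, len(embeddings), bag_size):
        chunk = embeddings[start:start + bag_size]
        dim = len(chunk[0])
        # Element-wise mean across the vectors in this chunk.
        bags.append([sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)])
    return bags


def model_inputs(
    embeddings: List[Vector], step: int, phase_one_steps: int, bag_size: int
) -> List[Vector]:
    """Return bagged inputs during phase one, per-token inputs afterwards.

    Assumption: the phase boundary is a fixed step count (hypothetical).
    """
    if step < phase_one_steps:
        return bag_embeddings(embeddings, bag_size)
    return embeddings


# Four toy 2-dimensional token embeddings.
token_embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]

phase_one = model_inputs(token_embeddings, step=0, phase_one_steps=100, bag_size=2)
phase_two = model_inputs(token_embeddings, step=100, phase_one_steps=100, bag_size=2)

print(phase_one)  # [[2.0, 1.0], [6.0, 5.0]]
print(phase_two)  # the original four vectors, unchanged
```

With a bag size of 2, the model sees half as many input positions per sequence in phase one, which is one way to picture where the time savings could come from. The sketch leaves out how prediction targets are formed for a bag, since this summary does not describe that.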

TST has been validated at 270M, 600M, and 3B dense scales and at a 10B-A1B Mixture-of-Experts (MoE) scale.

