Nous Research Releases Token Superposition Training to Speed Up LLM Pre-Training by Up to 2.5x Across 270M to 10B Parameter Models

Nous Research has introduced Token Superposition Training (TST), a two-phase pre-training method that accelerates LLM pre-training by up to 2.5x without altering model architecture or inference behavior.
The method reduces wall-clock training time by up to 2.5x at matched FLOPs. In its first phase, TST averages contiguous token embeddings into "bags."
In its second phase, training switches to standard next-token prediction. The speedup requires no changes to the model architecture, tokenizer, optimizer, or inference-time behavior.
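To make the first-phase idea concrete, here is a minimal NumPy sketch of averaging contiguous token embeddings into bags. It is an illustration under stated assumptions, not Nous Research's implementation: the function name `average_into_bags`, the `bag_size` parameter, and the choice to drop trailing tokens that do not fill a bag are all hypothetical, and the sketch does not cover how training targets are formed for each bag.

```python
import numpy as np


def average_into_bags(token_embeddings: np.ndarray, bag_size: int) -> np.ndarray:
    """Average each run of `bag_size` contiguous token embeddings into one vector.

    token_embeddings: array of shape (seq_len, dim).
    Returns an array of shape (seq_len // bag_size, dim).
    Assumption: trailing tokens that do not fill a whole bag are dropped.
    """
    if bag_size < 1:
        raise ValueError("bag_size must be >= 1")
    seq_len, dim = token_embeddings.shape
    num_bags = seq_len // bag_size
    # Keep only the tokens that fit into complete bags.
    trimmed = token_embeddings[: num_bags * bag_size]
    # Group tokens as (num_bags, bag_size, dim) and average within each bag.
    return trimmed.reshape(num_bags, bag_size, dim).mean(axis=1)


# Toy example: 6 tokens with 2-dimensional embeddings, grouped into bags of 2.
emb = np.arange(12, dtype=np.float64).reshape(6, 2)
bags = average_into_bags(emb, bag_size=2)
print(bags.shape)  # (3, 2)
```

With a bag size of 2, the sequence the model processes in this sketch is half as long as the original token sequence. With a bag size of 1, the function returns the embeddings unchanged, which corresponds to ordinary per-token input.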
TST has been validated at four model scales: 270M, 600M, 3B dense, and 10B-A1B Mixture-of-Experts (MoE).
