Five labs, five minds: building a multi-model fi

The "Thousand Token Wood" project evolved into a game where players act as shadow financiers manipulating an economy run by diverse small AI models. This setup revealed that managing multiple models is primarily a serving layer challenge, far more than a modeling one. It also emphasized the critical need for strict firewalls to manage sensitive information and bounded memory summaries to create persistent agent behaviors without overwhelming small models.

The "Thousand Token Wood" project, initially a weather-god sandbox, has transformed into a strategic game. In its latest iteration, players assume the role of a shadow financier, manipulating an emergent economy. This economy is unique, as each AI agent within it operates on a different small model developed by various labs. You, as the Patron of the Wood, engage in activities like lending, whispering tips, shorting markets, and brokering alliances, all while being pursued by a magistrate. The creatures in this world remember your actions, adding a dynamic layer of interaction.

One of the most significant insights from this project is that heterogeneity, rather than being a constraint, enhances the complexity and interest of the market. The system utilizes four distinct models: gpt-oss-20b, MiniCPM3-4B, Nemotron-Mini-4B, and a fine-tuned Qwen 0.5B. This diversity ensures that market participants behave genuinely differently, leading to more emergent and less scripted interactions. The primary challenge in integrating these diverse models was found to be at the serving layer, not the modeling layer, highlighting the importance of robust infrastructure for multi-model systems.

A crucial aspect of the game's dramatic core is information asymmetry. Players can provide insider tips that are either true or false, with profits from true tips increasing scrutiny from the magistrate. To maintain this dynamic, a strict firewall is in place to prevent AI agents from accessing sensitive information. This security measure is considered paramount, ensuring that agents only interact with publicly available information. The project underscores that secret information given to an agent necessitates a strong data flow firewall, rigorously proven by testing.

Another key element is the management of persistent memory for AI agents. Creatures maintain sentiments and relationships, influencing their behavior within the game. To avoid "prompt inflation" in small models, memories are not stored as raw history in prompts. Instead, a one-line, bucketed summary of sentiments is used, capped to the most influential feelings. This approach allows for persistent, dynamic agent relationships without overwhelming the models. The behavioral biases are both emergent from these summaries and mechanically reinforced by deterministic rules.

Ultimately, the project demonstrates that small models can be reliable format generators, though they are unreliable reasoners. Structure, prompting, and small fine-tunes can effectively bridge this gap. A heterogeneous council of models creates a more engaging environment, with the main integration cost being configuration once the serving layer is established. The effective handling of secret information and bounded memory are vital for building complex, interactive multi-agent systems with small models.

Five labs, five minds: building a multi-model finance drama on small models

Related articles

The AI world is getting ‘loopy’

Codex-maxxing for long-running work

Nobel laureate John Jumper is leaving DeepMind for rival Anthropic

Related articles

Research & Papers
The AI world is getting ‘loopy’
AI models are taking a significant leap forward with the adoption of "agentic loops," where AI agents continuously prompt each other to improve code and solve complex problems. This approach, though potentially resource-intensive, promises to unlock new levels of autonomous problem-solving and efficiency in AI applications.
AI News & Artificial Intelligence | TechCrunchJun 22, 2026

Research & Papers
Codex-maxxing for long-running work
Codex is increasingly being used by organizations to support long-running projects that go beyond a single prompt. This whitepaper by Jason Liu offers practical strategies for leveraging Codex as a persistent workspace, managing complex workflows and sustaining progress.
OpenAI NewsJun 22, 2026

Research & Papers
Nobel laureate John Jumper is leaving DeepMind for rival Anthropic
Nobel laureate John Jumper is departing Google DeepMind to join its competitor, Anthropic, after dedicating nearly nine years to DeepMind, where he led the AlphaFold team. Jumper, who shared a Nobel Prize for his work on AlphaFold, expressed gratitude for his time at DeepMind while looking forward to new endeavors.
AI News & Artificial Intelligence | TechCrunchJun 20, 2026

Five labs, five minds: building a multi-model finance drama on small models

Related articles

The AI world is getting &#8216;loopy&#8217;

Codex-maxxing for long-running work

Nobel laureate John Jumper is leaving DeepMind for rival Anthropic

The AI world is getting ‘loopy’