Browse latest
Research & PapersHugging Face - Blog · June 6, 2026

Five labs, five minds: building a multi-model finance drama on small models

The "Thousand Token Wood" project evolved into a game where players act as shadow financiers manipulating an economy run by diverse small AI models. This setup revealed that managing multiple models is primarily a serving layer challenge, far more than a modeling one. It also emphasized the critical need for strict firewalls to manage sensitive information and bounded memory summaries to create persistent agent behaviors without overwhelming small models.

Author: Morein.ai Editorial

The "Thousand Token Wood" project, initially a weather-god sandbox, has transformed into a strategic game. In its latest iteration, players assume the role of a shadow financier, manipulating an emergent economy. This economy is unique, as each AI agent within it operates on a different small model developed by various labs. You, as the Patron of the Wood, engage in activities like lending, whispering tips, shorting markets, and brokering alliances, all while being pursued by a magistrate. The creatures in this world remember your actions, adding a dynamic layer of interaction.

One of the most significant insights from this project is that heterogeneity, rather than being a constraint, enhances the complexity and interest of the market. The system utilizes four distinct models: gpt-oss-20b, MiniCPM3-4B, Nemotron-Mini-4B, and a fine-tuned Qwen 0.5B. This diversity ensures that market participants behave genuinely differently, leading to more emergent and less scripted interactions. The primary challenge in integrating these diverse models was found to be at the serving layer, not the modeling layer, highlighting the importance of robust infrastructure for multi-model systems.

A crucial aspect of the game's dramatic core is information asymmetry. Players can provide insider tips that are either true or false, with profits from true tips increasing scrutiny from the magistrate. To maintain this dynamic, a strict firewall is in place to prevent AI agents from accessing sensitive information. This security measure is considered paramount, ensuring that agents only interact with publicly available information. The project underscores that secret information given to an agent necessitates a strong data flow firewall, rigorously proven by testing.

Another key element is the management of persistent memory for AI agents. Creatures maintain sentiments and relationships, influencing their behavior within the game. To avoid "prompt inflation" in small models, memories are not stored as raw history in prompts. Instead, a one-line, bucketed summary of sentiments is used, capped to the most influential feelings. This approach allows for persistent, dynamic agent relationships without overwhelming the models. The behavioral biases are both emergent from these summaries and mechanically reinforced by deterministic rules.

Ultimately, the project demonstrates that small models can be reliable format generators, though they are unreliable reasoners. Structure, prompting, and small fine-tunes can effectively bridge this gap. A heterogeneous council of models creates a more engaging environment, with the main integration cost being configuration once the serving layer is established. The effective handling of secret information and bounded memory are vital for building complex, interactive multi-agent systems with small models.

Read original source

Related articles