Tools & Platforms · MarkTechPost · June 11, 2026

Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding

Cohere AI has launched 'North Mini Code', an open-weight, 30-billion-parameter Mixture-of-Experts model with 3 billion active parameters, designed for agentic coding. This model focuses on efficient self-hosting, supporting code generation, agentic software engineering, and terminal tasks with a 256K context window.

Author: Morein.ai Editorial

Cohere AI has introduced its first developer-focused coding model, 'North Mini Code,' an open-weight, 30-billion-parameter Mixture-of-Experts (MoE) model. This model activates only 3 billion parameters per token, making it highly efficient for self-hosting without extensive GPU clusters. It is designed for software engineers and supports various tasks such as code generation, agentic software engineering, and terminal operations.

The model features a 256K token context window and a maximum output length of 64K tokens. Cohere optimized 'North Mini Code' for agentic workflows, with support for interleaved thinking and native tool use. Its architecture is a decoder-only Transformer with sparse MoE layers: in each feed-forward block, 8 of 128 experts activate per token.
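To make the "8 of 128 experts per token" figure concrete, here is a minimal sketch of top-k expert routing in plain Python. The article does not describe how the router in 'North Mini Code' scores or weights experts, so the softmax-over-selected-experts scheme below is an assumption based on a common MoE design, and the names `route_token` and `softmax` are hypothetical helpers for illustration only.

```python
import math
import random

NUM_EXPERTS = 128  # experts per MoE feed-forward block, per the article
TOP_K = 8          # experts activated per token, per the article


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Select the top_k highest-scoring experts for one token.

    Assumption: weights are a softmax over the selected experts only,
    a common scheme that may differ from the model's actual router.
    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


# Random logits stand in for a learned router's output for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

assignment = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS

print(len(assignment), active_fraction)  # prints: 8 0.0625
```

Only the 8 selected experts run for a given token, so about 6% of the expert feed-forward weights in a layer do work on that token. This is how a sparse MoE model keeps per-token compute well below what its total parameter count suggests.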

'North Mini Code' was trained in two phases: cascaded supervised fine-tuning (SFT) followed by reinforcement learning with verifiable rewards (RLVR), specifically targeting agentic coding. Benchmarking results show a score of 33.4 on the Artificial Analysis Coding Index, positioning it competitively among similarly sized models. Cohere's internal tests also indicate up to 2.8x higher output throughput and a 30% advantage in inter-token latency compared to other models.

The model weights are released under Apache 2.0 on Hugging Face and can also be accessed through the Cohere API, Model Vault, and OpenRouter. The minimum hardware configuration is a single H100 GPU at FP8 precision. The model can be run with Hugging Face Transformers and served with vLLM. Quantized builds are available for Ollama, LM Studio, and llama.cpp, giving developers further deployment options.
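A rough back-of-the-envelope check shows why a single H100 at FP8 is plausible as a minimum. The sketch below estimates weight memory only. It assumes the 80 GB H100 variant, one byte per parameter at FP8 and two at BF16, and it ignores KV cache, activations, and runtime overhead, so real requirements will be higher. The helper `weight_memory_gb` is hypothetical.

```python
GB = 1e9  # decimal gigabytes

TOTAL_PARAMS = 30e9    # total parameters, per the article
ACTIVE_PARAMS = 3e9    # parameters active per token, per the article
H100_MEMORY_GB = 80    # assumption: 80 GB H100 variant


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in decimal GB."""
    return num_params * bytes_per_param / GB


fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)   # FP8: 1 byte per parameter
bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)  # BF16: 2 bytes per parameter
active_share = ACTIVE_PARAMS / TOTAL_PARAMS

print(fp8_gb, bf16_gb, active_share)  # prints: 30.0 60.0 0.1
```

Under these assumptions the FP8 weights take about 30 GB, leaving roughly 50 GB on the card for the KV cache and other overhead. Although only 10% of the parameters are active per token, all 30 billion must stay resident in memory, because any expert may be selected for the next token.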
