Meet ‘North Mini Code’: Cohere’s 30B Open-Weight Mixture-of-Experts Model With 3B Active Parameters for Agentic Coding
Cohere AI has launched 'North Mini Code', an open-weight, 30-billion-parameter Mixture-of-Experts model with 3 billion active parameters, designed for agentic coding. This model focuses on efficient self-hosting, supporting code generation, agentic software engineering, and terminal tasks with a 256K context window.
Cohere AI has introduced its first developer-focused coding model, 'North Mini Code,' an open-weight, 30-billion-parameter Mixture-of-Experts (MoE) model. Only 3 billion of those parameters are active for any given token, so per-token compute is closer to that of a small dense model than a 30B one. This makes the model practical to self-host without a large GPU cluster. It is designed for software engineers and supports tasks such as code generation, agentic software engineering, and terminal operations.
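To make the 30B-total / 3B-active split concrete, the Python sketch below does the back-of-the-envelope arithmetic. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token, and it ignores attention cost over long contexts. Treat the numbers as rough orders of magnitude, not measured figures.

```python
# Rough per-token compute for a sparse MoE model versus a dense model of the
# same total size. Assumption: ~2 FLOPs per *used* parameter per token for a
# forward pass (a rule of thumb that ignores attention cost over long contexts).

TOTAL_PARAMS = 30e9   # total parameters (as reported)
ACTIVE_PARAMS = 3e9   # parameters active per token (as reported)


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in one token's forward pass."""
    return active / total


def forward_flops_per_token(params_used: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per parameter used."""
    return 2.0 * params_used


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {frac:.0%}")                           # active fraction: 10%
print(f"sparse MoE: ~{moe_flops / 1e9:.0f} GFLOPs per token")   # ~6
print(f"dense 30B:  ~{dense_flops / 1e9:.0f} GFLOPs per token") # ~60
```

The saving applies to compute only. All 30B weights still have to be loaded into memory, because the router can send any token to any expert.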
The model has a 256K-token context window and a maximum output length of 64K tokens. Cohere optimized 'North Mini Code' for agentic workflows, with support for interleaved thinking and native tool use. Architecturally, it is a decoder-only Transformer with sparse MoE layers: in each feed-forward block, 8 of 128 experts are activated per token.
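The sparse feed-forward block can be sketched as top-k routing. A router scores all 128 experts for the current token, only the 8 highest-scoring experts run, and their outputs are mixed using renormalized gate weights. This is a minimal, generic illustration in pure Python, not Cohere's implementation. The gating variant (softmax over the selected top-k logits), the random router logits, and the toy experts (each just scales its input) are all assumptions made for the example.

```python
import math
import random

NUM_EXPERTS = 128  # experts per sparse MoE block (as reported)
TOP_K = 8          # experts activated per token (as reported)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts.

    Assumption: the softmax is taken over the selected top-k logits only,
    so the k gate weights sum to 1.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_forward(x, experts, router_logits, k=TOP_K):
    """Run only the routed experts and sum their outputs, weighted by gate."""
    out = [0.0] * len(x)
    for idx, gate in route(router_logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


# Toy setup: random router scores for one token, and experts that each
# scale the input vector by their own index (a stand-in for real FFNs).
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [(lambda v, s=float(i): [s * vi for vi in v]) for i in range(NUM_EXPERTS)]

chosen = route(logits)
y = moe_forward([1.0, 2.0], experts, logits)
print(f"{len(chosen)} of {NUM_EXPERTS} experts used: {[i for i, _ in chosen]}")
print("output:", y)
```

Some MoE designs instead apply the softmax over all experts before selecting the top k, or add load-balancing terms during training. The article does not say which choices North Mini Code makes. Note also that 8 of 128 is 6.25% of the experts, below the overall 10% active-parameter ratio, presumably because attention and embedding weights are used by every token.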
'North Mini Code' was trained in two phases: cascaded supervised fine-tuning (SFT), followed by reinforcement learning with verifiable rewards (RLVR) aimed specifically at agentic coding. It scores 33.4 on the Artificial Analysis Coding Index, which places it competitively among similarly sized models. Cohere's internal tests also report up to 2.8x higher output throughput and a 30% advantage in inter-token latency relative to the models it was compared against.
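In RLVR the reward comes from an automatic check rather than from a learned preference model. For coding, a natural check is whether generated code passes unit tests. The sketch below scores a candidate by the fraction of test snippets that run without error in a fresh Python interpreter. The helper names `passes` and `verifiable_reward` are hypothetical and do not come from Cohere's training pipeline. The sketch does no sandboxing beyond a timeout, which a real training harness would need for untrusted model output.

```python
import subprocess
import sys


def passes(program: str, timeout_s: float = 10.0) -> bool:
    """Run a Python program in a fresh interpreter; True if it exits with status 0."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # hanging code earns no reward
    return result.returncode == 0


def verifiable_reward(candidate: str, tests: list) -> float:
    """Hypothetical reward: fraction of test snippets the candidate code passes."""
    if not tests:
        return 0.0
    passed = sum(passes(candidate + "\n" + test) for test in tests)
    return passed / len(tests)


# Example: a correct and a buggy model completion for the same task.
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0", "assert add(0, 0) == 0"]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"

print(verifiable_reward(good, tests))   # 1.0
print(verifiable_reward(buggy, tests))  # 0.3333333333333333 (only add(0, 0) passes)
```

Because the reward is computed by execution, it cannot be talked into a high score by plausible-looking but wrong code. Its quality depends entirely on how thorough the tests are.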
The model weights are released under the Apache 2.0 license on Hugging Face, and the model is also available through the Cohere API, Model Vault, and OpenRouter. The minimum hardware requirement is a single H100 GPU running the model at FP8. The model can be run with Hugging Face Transformers or served with vLLM. Quantized builds are available for Ollama, LM Studio, and llama.cpp, giving developers additional local deployment options.
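The single-H100-at-FP8 minimum is consistent with simple weight-size arithmetic. The Python sketch below estimates how much memory the 30B weights occupy at different precisions. It assumes an 80 GB H100 and counts weights only. The KV cache for long (up to 256K-token) contexts, activations, and runtime overhead all come out of the remaining headroom. Real quantized builds (for example GGUF files for llama.cpp) usually differ somewhat from the idealized 4-bit figure because some tensors are kept at higher precision.

```python
# Weights-only memory estimate for a 30B-parameter model at several precisions.
# Assumptions: 80 GB H100; idealized bytes per parameter; no KV cache,
# activations, or framework overhead counted.

TOTAL_PARAMS = 30e9
H100_MEMORY_GB = 80.0

BYTES_PER_PARAM = {
    "bf16": 2.0,   # 16-bit weights
    "fp8": 1.0,    # 8-bit floating point
    "4-bit": 0.5,  # idealized 4-bit quantization (real builds vary)
}


def weight_memory_gb(params: float, fmt: str) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


def headroom_gb(params: float, fmt: str, gpu_gb: float = H100_MEMORY_GB) -> float:
    """GPU memory left for KV cache, activations, and runtime after loading weights."""
    return gpu_gb - weight_memory_gb(params, fmt)


for fmt in BYTES_PER_PARAM:
    print(f"{fmt:>5}: ~{weight_memory_gb(TOTAL_PARAMS, fmt):.0f} GB of weights, "
          f"~{headroom_gb(TOTAL_PARAMS, fmt):.0f} GB headroom on one 80 GB H100")
```

At FP8, roughly 50 GB remains for the KV cache, which matters for long-context agentic sessions. At BF16, the weights alone would take about 60 GB.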
