Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Few as Two H100 GPUs

Cohere has released Command A+, a new 218B sparse Mixture-of-Experts (MoE) model. Optimized for enterprise agentic workflows, it offers significant performance improvements and runs on as few as two H100 GPUs. Its enhanced efficiency, multimodal capabilities, and expanded language support mark a notable advancement in AI models.
Cohere has unveiled Command A+, an open-source 218B sparse Mixture-of-Experts (MoE) model, specifically designed for enterprise agentic workflows. Released under an Apache 2.0 license, this model integrates functionalities from four prior models—Command A, Command A Reasoning, Command A Vision, and Command A Translate—into a single, scalable entity. It is optimized for diverse applications including reasoning, RAG, multilingual processing, and multimodal document handling, running efficiently on minimal GPU infrastructure.
Command A+ is a decoder-only Sparse MoE Transformer with 218 billion total parameters and 25 billion active parameters. It utilizes 128 experts, with eight active per token, alongside a single shared expert. This architecture routes each token through a subset of expert sub-networks, maintaining active compute at a 25B-parameter scale during inference, which significantly enhances performance while minimizing computational overhead.
Input modalities for Command A+ include text, image, and tool use, with output modalities encompassing text, reasoning, and tool use. The model supports an expansive 128K input context length and a 64K maximum generation length. Quantization variants, such as W4A4, enable the model to operate on a single B200 or two H100 GPUs, making it highly accessible for various deployment scenarios.
The model demonstrates substantial performance improvements across multiple benchmarks. On τ²-Bench Telecom, scores surged from 37% to 85% over Command A Reasoning, and Terminal-Bench Hard agentic coding performance reached 25% from 3%. Internal evaluations show a 20% accuracy improvement in Agentic Question Answering and a 32% rise in spreadsheet analysis quality. Command A+ also achieves 63% on MMMU Pro and 75.1% on MMMU, highlighting its multimodal reasoning capabilities.
Command A+ expands its multilingual coverage from 23 to 48 languages, yielding gains in machine translation and multilingual reasoning. Its tokenization efficiency has also improved, by 20% for Arabic, 16% for Korean, and 18% for Japanese, reducing the tokens needed for response generation. The model further offers up to 63% higher Output Tokens per Second and reduces Time To First Token by up to 17% compared to Command A Reasoning, along with an additional 1.5–1.6× inference speedup from speculative decoding.
Cohere employs NVFP4 W4A4 quantization for MoE experts and Quantization-Aware Distillation (QAD) to maintain high quality. Tool use is managed through Transformers chat templates using JSON schema, with reasoning traces available for detailed analysis. Command A+ is supported by vLLM and Transformers, with specific version requirements for optimal performance, and Cohere provides recommended sampling parameters for effective use.
Related articles
Build real agentic apps using CUGA: two dozen working examples on a lightweight harness
CUGA, IBM's open-source Agent Harness, simplifies building agentic applications by handling infrastructure, allowing developers to focus on tools and prompts. It offers pre-assembled components for planning, execution, and state management, significantly reducing development time. CUGA has topped agent benchmarks like AppWorld and WebArena.
OpenAI launches new initiative to help find and patch open source bugs
OpenAI has launched "Patch the Planet," a new initiative in partnership with cybersecurity firm Trail of Bits, to enhance the security of open-source projects. This program aims to assist maintainers in identifying and patching bugs, utilizing OpenAI's AI-powered security tools while reducing the burden on project teams.
PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters
Baidu has released PP-OCRv6, an advanced optical character recognition (OCR) model supporting 50 languages. Available on Hugging Face, this version significantly improves accuracy and efficiency across various parameter sizes, from 1.5 million to 34.5 million, marking a substantial leap in multilingual OCR technology.
