Browse latest
Research & PapersMarkTechPost · June 13, 2026

Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6

Moonshot AI has released Kimi K2.7-Code, a new coding-focused, agentic model designed for long-horizon software engineering. It significantly outperforms its predecessor, K2.6, and competes closely with top models like GPT-5.5 and Claude Opus 4.8 on various benchmarks. The model aims to reduce reasoning-token usage by 30%, leading to cost savings and faster, more efficient operations.

Author: Morein.ai Editorial

Moonshot AI has unveiled Kimi K2.7-Code, a specialized, agentic coding model for long-horizon software engineering. It is available on Hugging Face under a Modified MIT license, and via the Kimi API and Kimi Code platforms. This new model focuses on planning, editing, tool execution, and debugging across complex multi-step processes.

K2.7-Code is a Mixture-of-Experts (MoE) model with 1 trillion total parameters, activating 32 billion parameters per token. It employs 384 experts, selecting eight per token with one shared, and features 61 layers including one dense layer. Key architectural elements include MLA for attention, SwiGLU for the feed-forward path, and a MoonViT vision encoder adding 400 million parameters for image and video input. The model supports native INT4 quantization and has a 256K token context window.

Performance benchmarks show Kimi K2.7-Code surpassing K2.6 on all tests, with a notable 21.8% improvement on Kimi Code Bench v2 (from 50.9 to 62.0). It also outperforms Claude Opus 4.8 on MCP Mark Verified (81.1 vs. 76.4) and approaches GPT-5.5's performance on MLS Bench Lite.

A significant feature of K2.7-Code is its reported 30% reduction in reasoning-token usage compared to K2.6, which Moonshot AI describes as "less overthinking." This reduction translates to lower output-token costs, faster execution steps in interactive CLI sessions, and the ability to perform more steps before hitting context limits. Reasoning tokens are typically billed as output tokens, making this efficiency gain crucial for agentic coding runs that involve many steps.

The Kimi API is OpenAI-compatible and uses the model string "kimi-k2.7-code." Users must adhere to fixed sampling parameters (temperature 1.0, top_p 0.95, n 1, penalties 0.0) as overriding them will result in an API error. The model can be self-hosted using vLLM, SGLang, or KTransformers, though its substantial size (approximately 595 GB) makes it suitable for server-class deployments.

For API usage, Python examples demonstrate how to interact with the model, including handling multi-step tool calls by preserving `reasoning_content` in context. The cost for cached input is $0.19 per 1M tokens.

Read original source

Related articles