Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6
Moonshot AI has released Kimi K2.7-Code, a new coding-focused, agentic model designed for long-horizon software engineering. It significantly outperforms its predecessor, K2.6, and competes closely with top models like GPT-5.5 and Claude Opus 4.8 on various benchmarks. The model aims to reduce reasoning-token usage by 30%, leading to cost savings and faster, more efficient operations.
Moonshot AI has unveiled Kimi K2.7-Code, a specialized, agentic coding model for long-horizon software engineering. It is available on Hugging Face under a Modified MIT license, and via the Kimi API and Kimi Code platforms. This new model focuses on planning, editing, tool execution, and debugging across complex multi-step processes.
K2.7-Code is a Mixture-of-Experts (MoE) model with 1 trillion total parameters, activating 32 billion parameters per token. It employs 384 experts, selecting eight per token with one shared, and features 61 layers including one dense layer. Key architectural elements include MLA for attention, SwiGLU for the feed-forward path, and a MoonViT vision encoder adding 400 million parameters for image and video input. The model supports native INT4 quantization and has a 256K token context window.
Performance benchmarks show Kimi K2.7-Code surpassing K2.6 on all tests, with a notable 21.8% improvement on Kimi Code Bench v2 (from 50.9 to 62.0). It also outperforms Claude Opus 4.8 on MCP Mark Verified (81.1 vs. 76.4) and approaches GPT-5.5's performance on MLS Bench Lite.
A significant feature of K2.7-Code is its reported 30% reduction in reasoning-token usage compared to K2.6, which Moonshot AI describes as "less overthinking." This reduction translates to lower output-token costs, faster execution steps in interactive CLI sessions, and the ability to perform more steps before hitting context limits. Reasoning tokens are typically billed as output tokens, making this efficiency gain crucial for agentic coding runs that involve many steps.
The Kimi API is OpenAI-compatible and uses the model string "kimi-k2.7-code." Users must adhere to fixed sampling parameters (temperature 1.0, top_p 0.95, n 1, penalties 0.0) as overriding them will result in an API error. The model can be self-hosted using vLLM, SGLang, or KTransformers, though its substantial size (approximately 595 GB) makes it suitable for server-class deployments.
For API usage, Python examples demonstrate how to interact with the model, including handling multi-step tool calls by preserving `reasoning_content` in context. The cost for cached input is $0.19 per 1M tokens.
Related articles
The AI world is getting ‘loopy’
AI models are taking a significant leap forward with the adoption of "agentic loops," where AI agents continuously prompt each other to improve code and solve complex problems. This approach, though potentially resource-intensive, promises to unlock new levels of autonomous problem-solving and efficiency in AI applications.
Codex-maxxing for long-running work
Codex is increasingly being used by organizations to support long-running projects that go beyond a single prompt. This whitepaper by Jason Liu offers practical strategies for leveraging Codex as a persistent workspace, managing complex workflows and sustaining progress.
Nobel laureate John Jumper is leaving DeepMind for rival Anthropic
Nobel laureate John Jumper is departing Google DeepMind to join its competitor, Anthropic, after dedicating nearly nine years to DeepMind, where he led the AlphaFold team. Jumper, who shared a Nobel Prize for his work on AlphaFold, expressed gratitude for his time at DeepMind while looking forward to new endeavors.
