Qwen Introduces Qwen3.7-Max: A Reasoning Agent Model With a 1M-Token Context Window

Alibaba has launched Qwen3.7-Max, a new reasoning agent model with a 1-million-token context window, designed for complex, multi-step tasks. This proprietary model excels in scientific reasoning, agentic capabilities, and coding, showing significant improvements over its predecessor.
Most AI models today struggle with sustained, multi-step autonomous execution, which is crucial for tasks like iterative code modifications or extended tool chains. Alibaba's Qwen team is targeting that gap with the formal announcement of Qwen3.7-Max. Two preview versions of the Qwen3.7 series had previously appeared on Arena AI's leaderboard.
Qwen3.7-Max is described as Alibaba's most advanced and comprehensive agent model. It is a proprietary and closed-weight model designed for long-horizon tasks, capable of handling coding, debugging, office workflow automation, and complex multi-step processes. Its core strength lies in its "chain of thought" reasoning, where it plans, checks, and self-corrects before providing a final answer.
The model features a 1-million-token context window, up from its predecessor's 256K. This allows it to process large amounts of information in a single request, such as a full mid-sized code repository. The trade-off is a higher output-token count, in exchange for significantly better performance on multi-step planning and complex agent chains.
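To make the jump from 256K to 1 million tokens concrete, here is a rough Python sketch that estimates whether a body of text fits in each window. The four-characters-per-token ratio is a common rule of thumb and an assumption here, not a property of Qwen's tokenizer, and the function names and the sample repository size are hypothetical.

```python
import math

# Assumption: ~4 characters per token. Real ratios depend on the tokenizer
# and on the content (code, prose, language).
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 1_000_000   # window size reported in the article
PREVIOUS_WINDOW = 256_000    # predecessor's "256K", taken here as 256,000


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a character count, rounding up."""
    return math.ceil(num_chars / chars_per_token)


def fits(num_chars: int, window: int, reserve_for_output: int = 0) -> bool:
    """Return True if the estimated input plus reserved output fits the window."""
    return estimate_tokens(num_chars) + reserve_for_output <= window


# Hypothetical repository: 2,000,000 characters of source text.
repo_chars = 2_000_000
print(estimate_tokens(repo_chars))          # estimated input tokens
print(fits(repo_chars, CONTEXT_WINDOW))     # fits in the 1M window?
print(fits(repo_chars, PREVIOUS_WINDOW))    # fits in the 256K window?
```

Under these assumptions the sample repository comes to roughly 500,000 tokens, which fits the larger window but not the smaller one. The `reserve_for_output` parameter is there because a reasoning model's output also needs room.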
Qwen3.7-Max scored 56.6 on the Artificial Analysis Intelligence Index, placing it fifth overall and surpassing Google's Gemini 3.5 Flash. This represents a 4.8-point gain over Qwen3.6 Max Preview. The improvements are concentrated in scientific reasoning, agentic capability, and coding, with notable gains in benchmarks like CritPt, Humanity’s Last Exam, and Terminal-Bench Hard.
On the AA-Omniscience benchmark, raw accuracy decreased while the hallucination rate dropped substantially. This indicates that the model is more inclined to refuse to answer than to provide incorrect information, which is a key consideration for use cases requiring broad factual recall.
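A small Python sketch shows how accuracy and hallucination rate can fall together when a model refuses more often. The tallies are invented for illustration and are not benchmark results. The hallucination-rate formula used here (incorrect answers as a share of all responses that were not correct) is an assumption and may differ from the benchmark's exact definition.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Counts of response outcomes on a factual-recall test (hypothetical)."""
    correct: int
    incorrect: int
    refused: int

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.refused

    def accuracy(self) -> float:
        """Share of all questions answered correctly."""
        return self.correct / self.total

    def hallucination_rate(self) -> float:
        """Assumed definition: incorrect answers / (incorrect + refused)."""
        not_correct = self.incorrect + self.refused
        return self.incorrect / not_correct if not_correct else 0.0


# Invented numbers: the "after" model refuses far more often.
before = Tally(correct=40, incorrect=50, refused=10)
after = Tally(correct=36, incorrect=14, refused=50)

for name, t in (("before", before), ("after", after)):
    print(f"{name}: accuracy={t.accuracy():.2f}, "
          f"hallucination_rate={t.hallucination_rate():.2f}")
```

In this illustration the second model answers fewer questions correctly, yet most of its misses are refusals instead of wrong answers, so both metrics go down.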
Alibaba demonstrated the model's capabilities in an internal test where it autonomously performed over 1,000 tool calls and iterative code modifications, leading to a tenfold improvement in inference speed.
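For readers unfamiliar with what a long run of tool calls looks like in code, the following is a minimal, generic agent loop in Python. It is not Alibaba's test harness and does not use any Qwen API. `run_agent`, `scripted_policy`, and the `echo` tool are hypothetical names, and a scripted function stands in for the model.

```python
from typing import Callable, Dict, List, Optional, Tuple

ToolCall = Tuple[str, str]  # (tool name, argument)


def run_agent(
    propose: Callable[[List[str]], Optional[ToolCall]],
    tools: Dict[str, Callable[[str], str]],
    max_calls: int = 1000,
) -> List[str]:
    """Ask `propose` for the next tool call until it returns None
    or the call budget is exhausted. Returns the call history."""
    history: List[str] = []
    for _ in range(max_calls):
        call = propose(history)
        if call is None:          # the policy decided it is finished
            break
        name, arg = call
        tool = tools.get(name)
        # Unknown tools are reported back so the policy can self-correct.
        result = tool(arg) if tool else f"error: unknown tool {name!r}"
        history.append(f"{name}({arg}) -> {result}")
    return history


def scripted_policy(history: List[str]) -> Optional[ToolCall]:
    """Stand-in for a model: issue three echo calls, then stop."""
    if len(history) < 3:
        return ("echo", f"step {len(history) + 1}")
    return None


log = run_agent(scripted_policy, {"echo": lambda s: s.upper()})
for entry in log:
    print(entry)
```

The `max_calls` budget is the main design choice: an autonomous loop needs a hard stop so that a policy that never finishes cannot run indefinitely.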
