Research & Papers · MarkTechPost · May 21, 2026

Qwen Introduces Qwen3.7-Max: A Reasoning Agent Model With a 1M-Token Context Window

Alibaba has launched Qwen3.7-Max, a new reasoning agent model with a 1-million-token context window, designed for complex, multi-step tasks. This proprietary model excels in scientific reasoning, agentic capabilities, and coding, showing significant improvements over its predecessor.

Author: Morein.ai Editorial

Most AI models today struggle with sustained, multi-step autonomous execution, which is crucial for tasks like iterative code modifications or extended tool chains. Alibaba's Qwen team is targeting this gap with Qwen3.7-Max, a new reasoning agent model that has now been formally announced. Two preview versions of the Qwen3.7 series had previously appeared on Arena AI's leaderboard.

Qwen3.7-Max is described as Alibaba's most advanced and comprehensive agent model. It is a proprietary, closed-weight model designed for long-horizon tasks such as coding, debugging, office workflow automation, and other complex multi-step processes. Its core strength is its "chain of thought" reasoning: it plans, checks, and self-corrects before giving a final answer.

The model has a 1-million-token context window, up from its predecessor's 256K. This lets it process large inputs, such as a full mid-sized code repository, in a single request. The trade-off is more output tokens, in exchange for better performance on multi-step planning and complex agent chains.
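As a rough illustration of what "a full repository in a single request" means in practice, the sketch below estimates whether a set of files would fit in a 1-million-token window. The 4-characters-per-token ratio and the `reserved_for_output` budget are assumptions chosen for illustration, not properties of Qwen's tokenizer, and the function names are hypothetical.

```python
import math

CONTEXT_WINDOW_TOKENS = 1_000_000  # window size quoted in the article
CHARS_PER_TOKEN = 4.0              # assumed heuristic, not a Qwen tokenizer property


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(
    files: dict[str, str],
    reserved_for_output: int = 50_000,  # hypothetical budget left for the reply
    window: int = CONTEXT_WINDOW_TOKENS,
) -> tuple[int, bool]:
    """Return (estimated input tokens, whether input plus reserve fits)."""
    total = sum(estimate_tokens(body) for body in files.values())
    return total, total + reserved_for_output <= window


# Toy example: two tiny files standing in for a repository.
repo = {"a.py": "x" * 400, "b.py": "y" * 10}
tokens, ok = fits_in_context(repo)
print(tokens, ok)
```

A real check would use the provider's own tokenizer, since token counts for source code can differ noticeably from a character-based estimate.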

Qwen3.7-Max scored 56.6 on the Artificial Analysis Intelligence Index, placing it fifth overall and ahead of Google's Gemini 3.5 Flash. The score is a 4.8-point gain over Qwen3.6 Max Preview. The improvements are concentrated in scientific reasoning, agentic capability, and coding, with notable gains on benchmarks such as CritPt, Humanity’s Last Exam, and Terminal-Bench Hard.

One notable result on the AA-Omniscience benchmark is that raw accuracy decreased while the hallucination rate dropped substantially. This indicates that the model is more inclined to decline to answer than to give incorrect information, which is a key consideration for use cases requiring broad factual recall.
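The two metrics can move in the same direction because they measure different things. The sketch below uses made-up counts, not AA-Omniscience data, and assumes the hallucination rate is the share of non-correct responses that were wrong answers rather than refusals. The benchmark's exact definition may differ.

```python
from dataclasses import dataclass


@dataclass
class EvalCounts:
    correct: int
    incorrect: int
    abstained: int  # the model declined to answer

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.abstained


def accuracy(c: EvalCounts) -> float:
    """Share of all questions answered correctly."""
    return c.correct / c.total


def hallucination_rate(c: EvalCounts) -> float:
    """Assumed definition: wrong answers as a share of all non-correct responses."""
    non_correct = c.incorrect + c.abstained
    return c.incorrect / non_correct if non_correct else 0.0


# Hypothetical counts per 100 questions, for illustration only.
older = EvalCounts(correct=40, incorrect=50, abstained=10)
newer = EvalCounts(correct=36, incorrect=20, abstained=44)

for name, counts in (("older", older), ("newer", newer)):
    print(name, round(accuracy(counts), 3), round(hallucination_rate(counts), 3))
```

In this toy case the newer model answers slightly fewer questions correctly, but when it does not know an answer it refuses far more often than it guesses, so both accuracy and hallucination rate fall.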

Alibaba demonstrated the model's capabilities in an internal test where it autonomously performed over 1,000 tool calls and iterative code modifications, leading to a tenfold improvement in inference speed.
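For readers unfamiliar with the pattern, a long run of tool calls and iterative edits is typically driven by a loop like the one sketched below. This is a generic illustration, not Alibaba's test harness: the `run_agent` function, the scripted policy standing in for the model, and the `edit` and `run_tests` tools are all hypothetical.

```python
from typing import Callable, Optional

Action = tuple[str, str]     # (tool name, argument)
Step = tuple[str, str, str]  # (tool name, argument, tool result)


def run_agent(
    policy: Callable[[list[Step]], Optional[Action]],
    tools: dict[str, Callable[[str], str]],
    max_tool_calls: int = 1000,  # hypothetical safety budget
) -> list[Step]:
    """Ask the policy for the next action until it stops or the budget runs out."""
    history: list[Step] = []
    while len(history) < max_tool_calls:
        action = policy(history)
        if action is None:  # the policy decides the task is done
            break
        name, arg = action
        history.append((name, arg, tools[name](arg)))
    return history


# Toy stand-ins for a code base and its test suite.
state = {"revision": 0}


def edit(_: str) -> str:
    state["revision"] += 1
    return f"revision {state['revision']}"


def run_tests(_: str) -> str:
    return "pass" if state["revision"] >= 3 else "fail"


def scripted_policy(history: list[Step]) -> Optional[Action]:
    """Edit, then test, and repeat until the tests pass (stands in for the model)."""
    if not history or (history[-1][0] == "test" and history[-1][2] == "fail"):
        return ("edit", "")
    if history[-1][0] == "edit":
        return ("test", "")
    return None


steps = run_agent(scripted_policy, {"edit": edit, "test": run_tests})
print(len(steps), steps[-1])
```

In a real agent the policy would be a model call that reads the history and chooses the next tool, which is where a large context window and reliable self-correction matter.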
