Research & Papers · MarkTechPost · May 27, 2026

NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code


NVIDIA researchers have developed Polar, a rollout framework for training language agents with reinforcement learning without modifying their existing agent harnesses. GRPO training with Polar on a Qwen3.5-4B base model raised SWE-Bench Verified pass@1 under each of the coding-agent harnesses tested.

Author: Morein.ai Editorial

NVIDIA researchers have introduced Polar, a rollout framework for training language agents with reinforcement learning. Its key feature is that it requires no modifications to an agent's existing harness, so existing agents can be brought into training as they are.

The framework inserts a model API proxy between the agent harness and the inference server. The proxy captures the token-level interactions passing between the two, and Polar uses these records to reconstruct trajectories in a form the trainer can consume directly.
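The snippet below is a minimal sketch of the proxy idea, not Polar's implementation. `RecordingProxy`, `Turn`, `reconstruct`, and `toy_server` are hypothetical names; the "server" is a plain function over token IDs rather than a real HTTP model API; and the rule for starting a new segment when a harness rewrites its history is an assumption made for illustration. It shows how a pass-through proxy can log exact token IDs and turn them into (tokens, loss mask) pairs without touching the harness.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for an inference server: prompt token IDs in,
# generated token IDs out. A real deployment would speak an HTTP model API.
Generate = Callable[[List[int]], List[int]]


@dataclass
class Turn:
    prompt_ids: List[int]      # exact tokens sent to the server
    completion_ids: List[int]  # exact tokens the model produced


@dataclass
class RecordingProxy:
    """Forwards each request to the server unchanged and logs it at the
    token level, so the calling harness does not need to be modified."""
    server: Generate
    turns: List[Turn] = field(default_factory=list)

    def generate(self, prompt_ids: List[int]) -> List[int]:
        completion_ids = self.server(prompt_ids)
        self.turns.append(Turn(list(prompt_ids), list(completion_ids)))
        return completion_ids


def reconstruct(turns: List[Turn]) -> List[Tuple[List[int], List[int]]]:
    """Merge logged turns into (token_ids, loss_mask) segments.

    loss_mask is 1 on model-generated tokens and 0 on context/tool tokens.
    Assumption: when a prompt does not extend the running context token for
    token (e.g. the harness rewrote or truncated its history), a new segment
    starts instead of re-tokenizing text to force a match."""
    segments: List[Tuple[List[int], List[int]]] = []
    ids: List[int] = []
    mask: List[int] = []
    for turn in turns:
        if ids and turn.prompt_ids[: len(ids)] == ids:
            new_context = turn.prompt_ids[len(ids):]  # e.g. tool output
        else:
            if ids:
                segments.append((ids, mask))
            ids, mask = [], []
            new_context = turn.prompt_ids
        ids = ids + new_context + turn.completion_ids
        mask = mask + [0] * len(new_context) + [1] * len(turn.completion_ids)
    if ids:
        segments.append((ids, mask))
    return segments


def toy_server(prompt_ids: List[int]) -> List[int]:
    # Deterministic fake model: two tokens derived from the prompt length.
    return [100 + len(prompt_ids), 200 + len(prompt_ids)]


proxy = RecordingProxy(toy_server)
out1 = proxy.generate([1, 2, 3])                   # [103, 203]
out2 = proxy.generate([1, 2, 3] + out1 + [7, 8])   # tool result 7, 8 appended
segments = reconstruct(proxy.turns)
print(segments)
```

Working from recorded token IDs rather than re-tokenized text is one way to keep the training data aligned with what the model actually saw and generated; the prefix check is deliberately strict so that any mismatch is surfaced as a separate segment rather than silently patched.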

To evaluate Polar, the researchers trained a Qwen3.5-4B base model with GRPO and measured SWE-Bench Verified pass@1. Scores rose by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi.
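GRPO (Group Relative Policy Optimization) samples several rollouts per task and scores each one relative to its group instead of against a learned critic. The sketch below shows only that advantage computation, under assumptions the article does not state: a binary pass/fail reward per SWE-Bench task, a group size of four, and population standard deviation for normalization (implementations differ on this detail). `group_relative_advantages` is a hypothetical helper, not part of Polar or NeMo Gym.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each rollout's reward is compared with the
    other rollouts sampled for the same task, normalized by the group's
    (population) standard deviation. No learned value function is used."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 4 rollouts on one SWE-Bench task with a binary
# reward: 1.0 if the agent's patch resolves the issue, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))  # roughly [1, -1, -1, 1]

# If every rollout in a group passes (or every one fails), all
# advantages are 0 and the group contributes no learning signal.
print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
```

In a full GRPO update, each rollout's advantage weights the policy-gradient terms for the tokens the model generated in that rollout, so the trainer needs to know exactly which tokens those were.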

Polar is registered as a NeMo Gym environment and is publicly available in the ProRL Agent Server repository.
