
feat: async partial rollout trainer with sample supplementation and caching#58

Open
mamazi0131 wants to merge 1 commit into verl-project:main from mamazi0131:main

Conversation


@mamazi0131 mamazi0131 commented Mar 1, 2026

What does this PR do?

This PR introduces the Async Partial Rollout (APR) mechanism to the verl framework to address the training efficiency bottleneck caused by long-tail samples (e.g., 160k tokens). By implementing Sample Supplementation and Interruption Techniques, we mitigate the "inference bubble" effect and significantly improve GPU utilization in synchronous RL training. Our implementation supports both verl 0.5.0 and 0.6.1.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Key Accomplishments:

  • Implemented Sample Supplementation and Interruption Mechanisms (SSIM) for dynamic sample replenishment.

  • Introduced Rollout Caching via a state-aware PromptsManager to resume partial generations, effectively managing sample staleness.

  • Ensured Off-Policy Correctness for PPO-style algorithms (GRPO/DAPO) using decoupled importance sampling.

  • Achieved up to 51.1% reduction in end-to-end training time on complex reasoning datasets.
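The decoupled importance-sampling idea behind the off-policy correction can be illustrated with a per-token sketch. All names and constants below are illustrative inventions, not the PR's actual loss code: tokens sampled under a stale behaviour policy are reweighted toward the proximal policy (the policy at the start of the update), and the usual PPO clipping is applied to the proximal ratio only.

```python
import math

def decoupled_token_objective(logp_new, logp_prox, logp_behave, adv,
                              clip_eps=0.2, weight_cap=10.0):
    """Per-token decoupled-PPO-style surrogate (sketch, not the PR's code).

    logp_behave: log-prob under the (possibly stale) rollout policy that
    generated the token; logp_prox: log-prob under the policy at the start
    of the current update; logp_new: log-prob under the current policy.
    """
    # Off-policy correction toward the proximal policy; a real
    # implementation treats this weight as a constant (no gradient).
    w = min(math.exp(logp_prox - logp_behave), weight_cap)
    # Standard PPO clipped surrogate on the proximal ratio.
    r = math.exp(logp_new - logp_prox)
    clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
    return w * min(r * adv, clipped_r * adv)
```

On-policy (all log-probs equal) this reduces to the advantage itself, recovering the vanilla PPO objective at ratio 1.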

Test

We validated the APR mechanism on two benchmarks using 2 nodes with 8 H20 GPUs and the Qwen3-4B model:

  1. GSM8K (Accuracy & Efficiency)
    With comparable convergence, training time was reduced by 11.7%, with a 5.93% boost in GPU utilization.
    • Baseline (GRPO+noPR): 4h 59m
    • Proposed (GRPO+PR): 4h 24m (-35m)
      (figure: gsm8k results)
  2. DAPO-MATH17k (Long-sequence Stress Test)
    In the presence of 160k-token long-tail samples, APR achieved a 51.1% reduction in total training time while maintaining superior final performance.
    • Baseline (GRPO+noPR): 67h 34m
    • Proposed (GRPO+PR): 33h 02m (-34h 32m)
      (figure: dapo_math results)
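As a sanity check, the reported percentages follow directly from the wall-clock times above (the helper below is plain arithmetic, not part of the PR):

```python
def reduction_pct(baseline_min: int, proposed_min: int) -> float:
    """Percent reduction in end-to-end training time, to one decimal."""
    return round((baseline_min - proposed_min) / baseline_min * 100, 1)

# GSM8K: 4h59m -> 4h24m; DAPO-MATH17k: 67h34m -> 33h02m
print(reduction_pct(4 * 60 + 59, 4 * 60 + 24))   # 11.7
print(reduction_pct(67 * 60 + 34, 33 * 60 + 2))  # 51.1
```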

API and Usage Example

Users can trigger the partial rollout mode using the recipes provided in the recipe/partial_rollout/ directory.

# Run DAPO-MATH17k with Partial Rollout on 2 nodes
bash recipe/partial_rollout/run_dapo_math17k_pr_4b_2node.sh

Design & Code Changes

  1. Sample Supplementation and Interruption Mechanisms:
    Dynamically replenish samples and automatically schedule inference tasks, so that interrupted long generations do not stall the rest of the batch.

  2. Rollout Caching:
    A prompt manager resumes partial rollouts, managing complete and partial samples in the buffer according to sample staleness.
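To make the caching idea concrete, here is a minimal, self-contained sketch of a staleness-aware prompt buffer. Class and method names are illustrative inventions, not the PR's RolloutPromptManager API:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class CachedPrompt:
    prompt_id: str
    tokens: list              # tokens generated so far (partial or complete)
    finished: bool = False
    staleness: int = 0        # policy updates elapsed since generation began

class PromptBuffer:
    """Toy staleness-aware buffer for complete and partial samples."""

    def __init__(self, max_staleness: int = 2):
        self.pending = deque()       # partial samples awaiting resumption
        self.done = []               # complete samples ready for training
        self.max_staleness = max_staleness

    def cache(self, p: CachedPrompt) -> None:
        (self.done if p.finished else self.pending).append(p)

    def on_policy_update(self) -> None:
        # Age every cached partial sample; drop those too stale to resume,
        # keeping off-policy drift bounded.
        for p in self.pending:
            p.staleness += 1
        self.pending = deque(p for p in self.pending
                             if p.staleness <= self.max_staleness)

    def next_to_resume(self):
        return self.pending.popleft() if self.pending else None
```

Bounding staleness is what keeps the decoupled importance-sampling correction well-behaved: a partial sample is only resumed while its behaviour policy is still close enough to the current one.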

Checklist Before Submitting

Important

Please check all of the following items before requesting a review; otherwise the reviewer may deprioritize this PR.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive asynchronous partial rollout (APR) system to the verl framework, designed to dramatically enhance the efficiency of reinforcement learning training, particularly when dealing with datasets containing samples of highly varying lengths. By intelligently managing inference tasks, dynamically supplementing samples, and caching partial results, the system minimizes idle GPU time and accelerates the overall training process, leading to significant performance gains without compromising algorithmic correctness.

Highlights

  • Asynchronous Partial Rollout (APR) Mechanism: Introduced a novel APR mechanism to address training efficiency bottlenecks caused by long-tail samples in synchronous reinforcement learning, significantly improving GPU utilization.
  • Sample Supplementation and Interruption Techniques (SSIM): Implemented SSIM for dynamic sample replenishment and automated scheduling of inference tasks, mitigating 'inference bubbles' and caching unfinished samples for reuse.
  • Rollout Caching via PromptsManager: Integrated a state-aware PromptsManager to manage complete and partial samples, enabling the resumption of partial generations and effective handling of sample staleness.
  • Off-Policy Correctness: Ensured off-policy correctness for PPO-style algorithms (GRPO/DAPO) through decoupled importance sampling, preserving algorithmic integrity under interruptible generation and policy updates.
  • Performance Improvements: Achieved substantial reductions in end-to-end training time, including an 11.7% reduction on GSM8K and a 51.1% reduction on DAPO-MATH17k, alongside increased GPU utilization.
  • Compatibility: Ensured the implementation supports both verl 0.5.0 and 0.6.1, providing flexibility for users.

Changelog

  • partial_rollout/README.md
    • Added detailed documentation for the Async Partial Rollout Trainer, covering background, solution, experimental results, implementation details, and usage examples.
  • partial_rollout/agent_loop/__init__.py
    • Added imports for new agent loop components and updated the package's public interface.
  • partial_rollout/agent_loop/agent_loop.py
    • Introduced PRv3AsyncLLMServerManager to support partial generation in LLM servers.
    • Implemented PRv3AgentLoopWorker as a Ray remote actor to manage asynchronous sequence generation, including cancellation and prompt manager interaction.
    • Added PRv3AgentLoopManager to orchestrate the new async rollout workers and prompt management logic.
  • partial_rollout/agent_loop/partial_single_turn_agent_loop.py
    • Added PartialSingleTurnAgentLoop to enable partial generation and resumption for single-turn agent interactions.
  • partial_rollout/agent_loop/partial_tool_agent_loop.py
    • Added PartialToolAgentLoop to support partial generation and resumption within multi-turn tool invocation agent loops.
  • partial_rollout/main_ppo.py
    • Modified run_ppo and TaskRunner to integrate the new PRv3AgentLoopManager and RolloutPromptManager for asynchronous partial rollout training.
  • partial_rollout/prompt_manager.py
    • Added RolloutPrompt dataclass to encapsulate batch information and agent loop outputs for partial rollouts.
    • Implemented RolloutPromptManager as a Ray remote actor to manage the lifecycle of prompts (pending, ongoing, done), handle data iteration, and assemble batches for partial rollouts.
  • partial_rollout/ray_trainer.py
    • Updated RayPPOTrainer to incorporate the RolloutPromptManager and PRv3AgentLoopManager for asynchronous partial rollouts.
    • Modified the training loop to prepare, check, and pull prompts from the prompt manager, and handle cancellation events during generation.
  • partial_rollout/run_dapomath_nopr_grpo_4b_bs64.sh
    • Added a new shell script to configure and run DAPO-MATH17k training without the partial rollout feature.
  • partial_rollout/run_dapomath_pr_grpo_4b_bs64.sh
    • Added a new shell script to configure and run DAPO-MATH17k training with the partial rollout feature enabled.
  • partial_rollout/run_gsm8k_nopr_grpo_4b_bs128.sh
    • Added a new shell script to configure and run GSM8K training without the partial rollout feature.
  • partial_rollout/run_gsm8k_pr_grpo_4b_bs128.sh
    • Added a new shell script to configure and run GSM8K training with the partial rollout feature enabled.
  • partial_rollout/vllm_rollout/__init__.py
    • Added an empty initialization file to define the vllm_rollout directory as a Python package.
  • partial_rollout/vllm_rollout/vllm_async_server.py
    • Introduced vLLMHttpServerForPartial to extend vLLMHttpServerBase, adding support for partial generation, cancellation, and resumption of requests.
    • Implemented PRv3vLLMReplica to utilize the new vLLMHttpServerForPartial for managing rollout servers with partial generation capabilities.
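The abort-and-resume flow that PRv3AgentLoopWorker and vLLMHttpServerForPartial implement can be sketched with toy asyncio code. Everything below is a stand-in (no vLLM involved): a fake decoder emits one token per step, stops early when a cancellation event fires, and is later resumed from the cached partial output.

```python
import asyncio

async def generate(prompt_tokens, cancel, max_new=8, step=0.01):
    """Toy interruptible decoder: one fake token per decode step."""
    out = list(prompt_tokens)
    while len(out) - len(prompt_tokens) < max_new:
        if cancel.is_set():
            return out, False      # interrupted: partial result to cache
        await asyncio.sleep(step)  # stands in for one decode step
        out.append(f"tok{len(out)}")
    return out, True               # finished normally

async def demo():
    cancel = asyncio.Event()
    task = asyncio.create_task(generate([], cancel, max_new=1000))
    await asyncio.sleep(0.05)      # let a few tokens decode
    cancel.set()                   # trainer hits a step boundary: interrupt
    partial, finished = await task
    # Next rollout phase: resume from the cached partial generation.
    resumed, finished2 = await generate(partial, asyncio.Event(), max_new=3)
    return partial, finished, resumed, finished2

partial, finished, resumed, finished2 = asyncio.run(demo())
```

The key design point mirrored here is that cancellation returns the partial output instead of raising, so the prompt manager can cache it and schedule a resumption later.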


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces an Async Partial Rollout (APR) mechanism to enhance training efficiency, particularly for datasets with long-tail samples. The implementation is comprehensive, adding new components like PRv3AgentLoopManager, RolloutPromptManager, and specialized agent loops for partial generation. The overall design is solid and effectively addresses the stated problem. My review focuses on improving code clarity, maintainability, and fixing a few minor issues. I've identified opportunities for improvement regarding magic numbers, a potential performance concern with busy-waiting, and some inconsistencies in documentation and script files.

Review comment threads: partial_rollout/README.md (×2), partial_rollout/agent_loop/agent_loop.py (×3), partial_rollout/prompt_manager.py, partial_rollout/run_gsm8k_pr_grpo_4b_bs128.sh
@ArronHZG

Hello, thank you very much for your work. As I understand it, this implements a colocated asynchronous training architecture similar to Kimi's, which has been a missing piece of verl.

In terms of design, verl 0.7.1 supports an auto-resume mechanism that decouples the complex state-storage logic between the server and the agent. Meanwhile, parameter synchronization uniformly adopts the checkpoint-engine approach, vLLM supports a multi-process mode, and the training engine is integrated through the unified Model Engine interface. All of these changes facilitate subsequent development and iteration.

I suggest refactoring this PR against the following PRs and the current code: the rollout module should leverage the auto-resume capability, the training module should adopt the Model Engine, and parameter synchronization should use the checkpoint engine, so as to align with the current code and future planning.

[Completed] vLLM multi-process: verl-project/verl#4280
[Completed] Add CheckpointEngineManager: verl-project/verl#5031
[Completed] Refactor the trainer to improve code reuse across various fit phases: verl-project/verl#5184
[Completed] Fully async supports invocation in engine mode: verl-project/verl#5269
[Completed] Fully async supports checkpoint engine: verl-project/verl#5029
[Completed] Rollout supports the abort-resume interface: verl-project/verl#5430
[Completed] Clean up the partial-related logic in AgentLoop: verl-project/verl#5487


startju commented Apr 22, 2026

Hello @mamazi0131, I'm a beginner with verl, and I'd be glad to help you refactor this code. Could I help you with that?


startju commented Apr 22, 2026

@ArronHZG do you still need this feature in v0.8.0?

@mamazi0131 (author) commented


I’d be happy to, of course. I’ve been so busy with work lately that I haven’t had time to take care of this.


startju commented Apr 22, 2026


thank you!
