feat: async partial rollout trainer with sample supplementation and caching#58
mamazi0131 wants to merge 1 commit into verl-project:main
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a comprehensive asynchronous partial rollout (APR) system to the verl framework, designed to improve the efficiency of reinforcement learning training, particularly on datasets whose samples vary widely in length. By managing inference tasks, dynamically supplementing samples, and caching partial results, the system minimizes idle GPU time and accelerates the overall training process without compromising algorithmic correctness.
Code Review
This pull request introduces an Async Partial Rollout (APR) mechanism to enhance training efficiency, particularly for datasets with long-tail samples. The implementation is comprehensive, adding new components like PRv3AgentLoopManager, RolloutPromptManager, and specialized agent loops for partial generation. The overall design is solid and effectively addresses the stated problem. My review focuses on improving code clarity, maintainability, and fixing a few minor issues. I've identified opportunities for improvement regarding magic numbers, a potential performance concern with busy-waiting, and some inconsistencies in documentation and script files.
Hello, thank you very much for your work. As I understand it, this PR implements a colocated asynchronous training architecture similar to Kimi's, which is a missing piece in the current verl. In terms of design, verl 0.7.1 supports an auto-resume mechanism that decouples the complex state storage logic between the server and the agent. In addition, parameter synchronization now uniformly uses the checkpoint engine, vLLM supports a multi-process mode, and the training engine is integrated through the unified Model Engine interface. All of these changes make subsequent development and iteration easier. I suggest referring to the following PRs and the current code to refactor this PR: the rollout module should use the auto-resume capability, the training module should adopt the Model Engine, and parameter synchronization should use the checkpoint engine, so that the PR aligns with the current code and future planning.
[Completed] vLLM multi-process: verl-project/verl#4280
Hello @mamazi0131, I'm new to verl and would be glad to help refactor this code. Could I help with that?
@ArronHZG do you still need this feature in v0.8.0?
I’d be happy to, of course. I’ve been so busy with work lately that I haven’t had time to take care of this.
Thank you!
What does this PR do?
Checklist Before Starting
PR title format: [{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, like [megatron, fsdp, doc]
{type} is in feat, fix, refactor, chore, test
For breaking changes, add [BREAKING] to the beginning of the title.
Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Key Accomplishments:
Implemented Sample Supplementation and Interruption Mechanisms (SSIM) for dynamic sample replenishment.
Introduced Rollout Caching via a state-aware PromptsManager to resume partial generations, effectively managing sample staleness.
Ensured Off-Policy Correctness for PPO-style algorithms (GRPO/DAPO) using decoupled importance sampling.
Achieved up to 51.1% reduction in end-to-end training time on complex reasoning datasets.
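The description does not spell out the formula behind "decoupled importance sampling". The sketch below assumes it follows the decoupled PPO objective known from the asynchronous RL literature, where the policy that generated a token (behaviour policy) is separated from the recent policy used as the clipping anchor (proximal policy). The function name and arguments are hypothetical and are not taken from this PR.

```python
import math


def decoupled_ppo_token_loss(logp_new, logp_prox, logp_behav, advantage, clip_eps=0.2):
    """Per-token decoupled PPO loss (illustrative sketch, not the PR's code).

    logp_behav: log-prob under the possibly stale policy that generated the token
    logp_prox:  log-prob under the recent "proximal" policy used for clipping
    logp_new:   log-prob under the policy currently being optimized
    """
    # The trust-region ratio is measured against the proximal policy,
    # not against the stale behaviour policy.
    ratio = math.exp(logp_new - logp_prox)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    surrogate = min(ratio * advantage, clipped * advantage)

    # Off-policy correction: reweight by proximal / behaviour probability.
    behav_weight = math.exp(logp_prox - logp_behav)
    return -behav_weight * surrogate
```

When the token was generated by the current policy, all three log-probabilities coincide and the expression reduces to the ordinary PPO clipped loss. For tokens resumed from a cached partial rollout, only `behav_weight` differs from 1.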
Test
We validated the APR mechanism on two benchmarks using 2 nodes with 8 H20 GPUs and the Qwen3-4B model:
Under consistent convergence, training time was reduced by 11.7% with a 5.93% boost in GPU utilization.
In the presence of 160k-token long-tail samples, APR achieved a 51.1% reduction in total training time while maintaining superior final performance.
API and Usage Example
Users can now trigger the partial rollout mode by using the specific recipes provided in the recipe/partial_rollout/ directory.
# Run DAPO-MATH17k with Partial Rollout on 2 nodes
bash recipe/partial_rollout/run_dapo_math17k_pr_4b_2node.sh
Design & Code Changes
Sample Supplementation and Interruption Mechanisms:
Introducing sample supplementation and interruption mechanisms to enable dynamic sample replenishment and automated scheduling of inference tasks.
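As a rough illustration of the idea, the following asyncio sketch keeps a fixed number of generation requests in flight, tops up free slots with fresh prompts as requests finish (supplementation), and stops the stragglers once enough samples are complete (interruption). Everything here is an assumption made for illustration: `fake_generate` stands in for a real inference call, and `rollout_step`, `batch_size`, and `max_inflight` are hypothetical names, not the PR's API.

```python
import asyncio


async def fake_generate(prompt_id, length, stop):
    """Stand-in for a rollout request: one event-loop tick per generated token."""
    tokens = 0
    while tokens < length and not stop.is_set():
        await asyncio.sleep(0)
        tokens += 1
    # The third field tells the caller whether the sample ran to completion.
    return prompt_id, tokens, tokens == length


async def rollout_step(lengths, batch_size, max_inflight):
    """Collect at least `batch_size` finished samples, keeping slots busy."""
    stop = asyncio.Event()
    pending_prompts = list(enumerate(lengths))  # (prompt_id, target length)
    inflight = set()
    finished, partial = [], []

    def supplement():
        # Sample supplementation: fill free slots with fresh prompts.
        while pending_prompts and len(inflight) < max_inflight:
            pid, length = pending_prompts.pop(0)
            inflight.add(asyncio.create_task(fake_generate(pid, length, stop)))

    supplement()
    while inflight and len(finished) < batch_size:
        done, inflight = await asyncio.wait(
            inflight, return_when=asyncio.FIRST_COMPLETED
        )
        finished.extend(t.result() for t in done)
        if len(finished) < batch_size:
            supplement()

    # Interruption: the batch is full, so stop long-tail requests and
    # keep whatever they have produced so far as partial samples.
    stop.set()
    if inflight:
        done, _ = await asyncio.wait(inflight)
        for t in done:
            result = t.result()
            (finished if result[2] else partial).append(result)
    return finished, partial
```

Calling `asyncio.run(rollout_step([2, 2, 1000, 2, 2, 1000], batch_size=4, max_inflight=3))` returns the four short samples as finished and the long-tail ones as partial, so the short samples never wait for the long ones.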
Rollout Caching:
Using a prompt manager to resume partial rollouts, managing complete and partial samples in the buffer based on sample staleness.
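A minimal sketch of staleness-aware caching, under the assumption that staleness is measured as the number of policy updates since a sample's generation began. `PartialSample`, `PromptBuffer`, and `max_staleness` are hypothetical names used only for illustration, and what the PR actually does with samples that are too stale is not stated here, so this sketch simply hands them back to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class PartialSample:
    prompt_id: int
    tokens: list = field(default_factory=list)
    start_version: int = 0  # policy version when generation began
    done: bool = False


class PromptBuffer:
    """Hypothetical stand-in for the prompt manager described above."""

    def __init__(self, max_staleness):
        self.max_staleness = max_staleness
        self.partial = {}   # prompt_id -> interrupted sample
        self.complete = []  # samples ready for training

    def put(self, sample):
        if sample.done:
            self.complete.append(sample)
        else:
            self.partial[sample.prompt_id] = sample

    def resumable(self, current_version):
        """Split cached partial samples into resumable and too-stale ones."""
        keep, stale = [], []
        for sample in self.partial.values():
            age = current_version - sample.start_version
            (keep if age <= self.max_staleness else stale).append(sample)
        # Evict stale samples so they are not resumed in later steps.
        for sample in stale:
            del self.partial[sample.prompt_id]
        return keep, stale
```

Samples in `keep` can be resumed by sending the prompt plus the cached `tokens` back to the inference engine, which avoids regenerating the prefix.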
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
Run pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Request CI in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
If the PR touches the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.