
[megatron, sglang, rollout, validation, doc] feat: Add Validation StateMachine to async-rl and Support async ref_logp#2

Closed
ziqi-wlb wants to merge 15 commits into rednote-hilab:main from ziqi-wlb:async-rl

Conversation

@ziqi-wlb

What does this PR do?

  1. Add validation StateMachine
  2. Support async ref_logp: remove all offloads (param/grad/optimizer, etc.)
  3. Change doc for async-rl

Performance: compared with the previous async-rl, end-to-end performance improves by a further 20% (170s -> 140s); after tuning the engines' TP, 140s -> 112s.

[screenshot: performance comparison]

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Async RL Configuration
+actor_rollout_ref.async_pipeline=True \
 
# Resource Management
+trainer.sperated_node_ratios=[0.5,0.5] \
# Each task group gets 0.5 of the total nodes:
# train/logp/ref_logp run on 0.5 of the GPUs, generate runs on the other 0.5

# Performance tuning: enable async parameter update (dual buffer)
+actor_rollout_ref.rollout.enable_dual_buffer=True \
# Sender-side bucket granularity on the actor training node during parameter update
+actor_rollout_ref.rollout.param_update_preduce_bucket_size_mb=512 \
# Receiver-side bucket granularity on the rollout inference node; too large a value can cause GPU OOM
+actor_rollout_ref.rollout.param_update_consume_bucket_size_mb=128 \
 
# Off-policy depth: 2 means generate may run up to 2 steps ahead of the train node, i.e. one-step off-policy
+trainer.generate_ahead_steps=2 \
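The `generate_ahead_steps` budget above can be pictured as a small gate between the rollout loop and the train loop: generation is allowed only while it is fewer than `generate_ahead_steps` steps ahead of training. The sketch below is hypothetical (the `AheadStepGate` class is not from this PR); it only models the counting rule, not the actual scheduling.

```python
class AheadStepGate:
    """Allow generation to run at most `ahead_steps` training steps ahead.

    Hypothetical illustration of the generate_ahead_steps semantics; the
    real implementation coordinates distributed workers, not counters.
    """

    def __init__(self, ahead_steps):
        self.ahead_steps = ahead_steps
        self.generated = 0  # rollout steps completed
        self.trained = 0    # training steps completed

    def can_generate(self):
        # Generation may proceed while it is fewer than `ahead_steps`
        # steps ahead of training.
        return self.generated - self.trained < self.ahead_steps

    def on_generate(self):
        assert self.can_generate(), "rollout must wait for the trainer"
        self.generated += 1

    def on_train(self):
        self.trained += 1
```

With `ahead_steps=2`, the rollout can complete two batches before training finishes any step, then blocks until the trainer catches up by one step.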

Task Group Configuration Examples

Example 1: Complete Separation

+trainer.sperated_node_tasks=[logp,ref_logp,actor-train,generate] \
+trainer.sperated_node_ratios=[0.25,0.25,0.25,0.25] \

Explanation: Each task gets 25% of total nodes

  • logp: 25% nodes
  • ref_logp: 25% nodes
  • actor-train: 25% nodes
  • generate: 25% nodes

Example 2: Hybrid Mode (logp + actor-train grouped)

+trainer.sperated_node_tasks=[[logp,actor-train],ref_logp,generate] \
+trainer.sperated_node_ratios=[0.5,0.25,0.25] \

Explanation:

  • First group [logp,actor-train]: 50% nodes (shared)
  • ref_logp: 25% nodes
  • generate: 25% nodes

Example 3: Hybrid Mode (logp + actor-train + ref_logp grouped)

+trainer.sperated_node_tasks=[[logp,actor-train,ref_logp],generate] \
+trainer.sperated_node_ratios=[0.5,0.5] \

Explanation:

  • First group [logp,actor-train,ref_logp]: 50% nodes (shared)
  • generate: 50% nodes
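The ratio-to-GPU mapping in the three examples above can be sketched as a small helper. `allocate_gpus` is a hypothetical illustration, not code from this PR; it assumes the ratios sum to 1 and that every task in a grouped entry shares the same GPUs.

```python
def allocate_gpus(task_groups, ratios, total_gpus):
    """Map (sperated_node_tasks, sperated_node_ratios) to per-task GPU counts.

    Hypothetical sketch: a nested list in `task_groups` is a group of tasks
    that share one slice of the cluster, mirroring the config examples.
    """
    assert len(task_groups) == len(ratios)
    assert abs(sum(ratios) - 1.0) < 1e-6, "ratios must sum to 1"
    allocation = {}
    for group, ratio in zip(task_groups, ratios):
        ngpus = int(total_gpus * ratio)
        # A grouped entry (e.g. [logp, actor-train]) shares the same GPUs.
        tasks = group if isinstance(group, list) else [group]
        for task in tasks:
            allocation[task] = ngpus
    return allocation

# Example 2 (hybrid mode) on a 16-GPU cluster:
# allocate_gpus([["logp", "actor-train"], "ref_logp", "generate"],
#               [0.5, 0.25, 0.25], 16)
# -> {'logp': 8, 'actor-train': 8, 'ref_logp': 4, 'generate': 4}
```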

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@ziqi-wlb ziqi-wlb closed this Aug 29, 2025