[megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT#4063
[megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT#4063ISEEKYAN merged 19 commits intoverl-project:mainfrom HollowMan6:mbridge
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces significant new functionality by integrating Megatron-Bridge to support LoRA and other PEFT methods. The changes are extensive, touching configuration, model creation, and checkpointing logic. The addition of PEFT utilities for handling adapter weights is a good step towards efficient fine-tuning. Overall, the implementation is comprehensive, but I've identified a critical bug in the checkpoint loading logic and a significant maintainability issue due to code duplication that should be addressed.
|
There are 2 remaining issues that might need some help. I'm not sure if this is caused by some bugs in Megatron-Bridge or in this PR, please let me know if you find anything, and I'll also continue debugging:
cc @ISEEKYAN |
|
@HollowMan6 good job! @conver334 has also been developing this feature lately, I will check this PR with her tomorrow and leave feedbacks. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request integrates Megatron-Bridge to add LoRA/PEFT support, which is a significant and complex feature. The changes touch upon model creation, configuration, and checkpointing to accommodate the new backend and PEFT methods. While the overall direction is good, I've identified a few critical and high-severity issues. There's a critical bug where base model weights are not loaded when using Megatron-Bridge, which would lead to using an uninitialized model. Another critical issue is a placeholder implementation for make_value_model, which will cause problems for critic/value model training. Additionally, there are several instances of significant code duplication for configuration parsing that should be refactored into utility functions to improve maintainability. Addressing these points will be crucial for the stability and correctness of this new functionality.
There was a problem hiding this comment.
Thank you @ISEEKYAN ! I have solved 2 issues mentioned in #4063 (comment) by myself. The first one is that we should set PEFT first before we wrap with DDP, the second one is that I forgot to pass those pp, tp, ep configs to the Megatron-Bridge provider.
I think now this PR is ready to get some systematic review. Aside from #4063 (comment) (which I'm not sure if there's any equivalent option in Megatron-Bridge/how to modify), and the below issue related to export HF weights with PEFT adapters, I have tested with the following scripts with saving/loading checkpoints (with lora.rank=16 or 0 (full params)), and they all look good:
ray job submit -- python3 -m verl.trainer.main_ppo --config-path=config \
--config-name='ppo_megatron_trainer.yaml'\
algorithm.adv_estimator=grpo \
trainer.val_before_train=False \
data.train_files=$HOME/data/gsm8k/train.parquet \
data.val_files=$HOME/data/gsm8k/test.parquet \
data.train_batch_size=16 \
data.max_prompt_length=512 \
data.max_response_length=1024 \
data.filter_overlong_prompts=True \
data.truncation='error' \
data.shuffle=False \
actor_rollout_ref.model.path="Qwen/Qwen2.5-7B-Instruct" \
actor_rollout_ref.model.lora.rank=16 \
actor_rollout_ref.model.lora.alpha=32 \
actor_rollout_ref.model.use_fused_kernels=True \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.actor.ppo_mini_batch_size=16 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
actor_rollout_ref.actor.megatron.use_mbridge=True \
actor_rollout_ref.actor.megatron.pipeline_model_parallel_size=1 \
actor_rollout_ref.actor.megatron.tensor_model_parallel_size=2 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.kl_loss_coef=0.001 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.gpu_memory_utilization=0.8 \
actor_rollout_ref.rollout.n=4 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
actor_rollout_ref.ref.megatron.pipeline_model_parallel_size=1 \
actor_rollout_ref.ref.megatron.tensor_model_parallel_size=2 \
algorithm.use_kl_in_reward=False \
trainer.logger='["console"]' \
trainer.project_name='verl_grpo_example_gsm8k_math' \
trainer.experiment_name='qwen2_7b_megatron' \
trainer.n_gpus_per_node=8 \
trainer.nnodes=1 \
trainer.save_freq=-1 \
trainer.test_freq=5 \
trainer.total_epochs=15 $@except that I face the following error when I set pp=2, though it looks like to be model specific.
File "verl/single_controller/ray/base.py", line 700, in func
return getattr(self.worker_dict[key], name)(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/single_controller/base/decorator.py", line 442, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "verl/utils/transferqueue_utils.py", line 199, in dummy_inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "verl/utils/profiler/performance.py", line 105, in f
return self.log(decorated_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/utils/profiler/performance.py", line 118, in log
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "verl/utils/profiler/profile.py", line 256, in wrapper
return func(self_instance, *args, **kwargs_inner)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/workers/megatron_workers.py", line 870, in compute_log_prob
output, entropys = self.actor.compute_log_prob(data=data, calculate_entropy=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/utils/profiler/performance.py", line 105, in f
return self.log(decorated_function, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/utils/profiler/performance.py", line 118, in log
output = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "verl/workers/actor/megatron_actor.py", line 217, in compute_log_prob
output = self.forward_backward_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/workers/actor/megatron_actor.py", line 617, in forward_backward_batch
losses_reduced = forward_backward_func(
^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/pipeline_parallel/schedules.py", line 2178, in forward_backward_pipelining_without_interleaving
output_tensor, num_tokens = forward_step(
^^^^^^^^^^^^^
File "megatron/core/pipeline_parallel/schedules.py", line 402, in forward_step
output_tensor, loss_func = forward_step_func(data_iterator, model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "verl/workers/actor/megatron_actor.py", line 590, in forward_step
output = forward_fn(
^^^^^^^^^^^
File "verl/models/mcore/model_forward.py", line 75, in model_forward
output_orig = model(**input_args)
^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/distributed/data_parallel_base.py", line 22, in forward
return self.module(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/transformer/module.py", line 435, in forward
outputs = self.module(*inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/models/gpt/gpt_model.py", line 453, in forward
hidden_states = self.decoder(
^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/transformer/transformer_block.py", line 649, in forward
hidden_states, context = layer(
^^^^^^
File "megatron/core/transformer/transformer_layer.py", line 865, in __call__
return super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/transformer/module.py", line 311, in __call__
return super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/transformer/transformer_layer.py", line 444, in forward
hidden_states, context = self._forward_attention(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/transformer/transformer_layer.py", line 509, in _forward_attention
attention_output_with_bias = self.self_attention(
^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/transformer/attention.py", line 828, in forward
query = apply_rotary_pos_emb(
^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/models/common/embeddings/rope_utils.py", line 238, in apply_rotary_pos_emb
return _apply_rotary_pos_emb_thd(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "megatron/core/models/common/embeddings/rope_utils.py", line 181, in _apply_rotary_pos_emb_thd
for x in torch.split(t, seqlens)
^^^^^^^^^^^^^^^^^^^^^^^
File "torch/functional.py", line 222, in split
return tensor.split(split_size_or_sections, dim)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "torch/_tensor.py", line 1052, in split
return torch._VF.split_with_sizes(self, split_size, dim)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: split_with_sizes expects split_sizes to sum exactly to 6144 (input tensor's size at dimension 0), but got split_sizes=[1098, 1108, 1134, 1168]
cc: @conver334
ccclyu
left a comment
There was a problem hiding this comment.
thanks for the great contribution and one general comment: can we have lora related CI/CD test?
Yes, sure, but I think the piority is that we get weight updating/loading working first: #4063 (comment) after that, we can add some test cases here. |
Good job! For more formal and advanced usage of |
There was a problem hiding this comment.
Code Review
This pull request integrates Megatron-Bridge to support LoRA/PEFT, which is a significant feature addition. The changes are extensive, touching configuration, model initialization, and checkpointing logic across multiple components. The introduction of a vanilla_mbridge flag to switch between bridge implementations and the use of a provider object for Megatron-Bridge are well-structured. The PEFT support, including LoRA configuration parsing, model transformation, and adapter-only checkpointing, appears to be correctly implemented.
My main feedback concerns code duplication. I've identified two instances of repeated logic across several files: one for parsing LoRA configurations and another for stripping LoRA-specific keys from model configurations. Consolidating these into shared utility functions would greatly improve the code's maintainability and reduce the risk of future inconsistencies. I've left specific comments pointing to these areas.
There was a problem hiding this comment.
Code Review
This pull request introduces a significant feature by integrating Megatron-Bridge to support LoRA/PEFT, which is a substantial enhancement for the project. The changes are extensive, touching configuration, model initialization, and checkpointing logic across many files. The implementation correctly distinguishes between the original mbridge and the new Megatron-Bridge and introduces a flexible PEFT configuration system. The core logic for applying PEFT transformations before DDP wrapping via Megatron-Bridge's provider hooks appears to be correctly implemented. However, I've identified a critical issue that would prevent LoRA from being enabled, and a high-severity performance issue related to checkpointing with PEFT. Addressing these points will help ensure the new functionality works as expected and is efficient.
|
Thank you @conver334 and @ISEEKYAN again for reviewing! I've successfully integrated LoRA Merge and related mapping in Megatron-Bridge: NVIDIA-NeMo/Megatron-Bridge#1310, I've also tested this with TP, PP, and EP, so that now I've also tested this PR, both w/ and w/o lora using Megatron-Bridge, aside from the pp issue mentioned in #4063 (review), which I think is probably related to Megatron-Bridge as it works fine when I use vanilla mbridge, I find that when I combine tp with ep for Qwen3-30B-A3B (please check the script I use in this PR |
|
Wondering if you have encountered this issue in your testing: When loading a model with Megatron-Bridge using TP > 1 and PP > 1 across multiple nodes, it looks like each TP rank within a PP group ends up loading the full PP shard (all layers for that PP stage), instead of being further sharded by TP. Encountering an issue like this example:
This should end up with 2 PP groups, each responsible for 5 layers:
Each PP stage should own ~10B / 2 = 5B parameters. For TP, I encounter the following -
Observed Behavior:
This results in OOM very quickly as the DDP wrapper initializes the param buffers for the entire PP stage on every TP rank |
@ryxli However, actually I haven't fully tested TP+PP as I encountered another shape mismatch issue as mentioned in: #4063 (review) so I'm not sure about your situation if it's not caused by using an older commit of this PR, and your issue might be related to the shape mismatch issue as well. Please let me know if you have any further insights, thanks! |
|
Hi @rtz1998,我建议你安装Megatron-Bridge的时候用NVIDIA-NeMo/Megatron-Bridge@a489bed 这个commit,具体可以参考https://github.com/volcengine/verl/pull/4533/changes#diff-ede636eb585931bd42b3b527bff08e1a3fab254eaf02fdd0fa33734e6dcafa0aR125-R126 脚本我用的examples/grpo_trainer/run_qwen2-7b_math_megatron_lora.sh测试的,你也可以看一下哪里不一样 |
依然有问题,重新安装Megatron-Bridge和Megatron-LM后,脚本也一致。更奇怪的是Moe模型是正常的 安装命令如下: 安装指定 commit 的 Megatron-Bridgepython3 -m pip install --no-deps --no-build-isolation 安装指定 commit 的 Megatron-LMpython3 -m pip install --no-deps --no-build-isolation |
|
@rtz1998 I just double checked locally and found that for dense, it's OK when setting I just found that it will also work if you disable activation checkpointing by removing these lines in the training script: +actor_rollout_ref.actor.megatron.override_transformer_config.recompute_method=uniform
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_granularity=full
+actor_rollout_ref.actor.megatron.override_transformer_config.recompute_num_layers=1 |
ty, when setting pipeline_model_parallel_size to 2, grad norm returns to normal ! |
|
Setting activation checkpointing with PP>1 will still affect convergence a bit (refer to NVIDIA-NeMo/Megatron-Bridge#1750 (comment)), I just opened #4566 to fix LoRA with activation checkpointing under megatron, please feel free to try it out and let me know if you face any issues @rtz1998 |
* [recipe] feat: add Experimental VLA RL Support (#3918)
### What does this PR do?
# Experimental VLA RL Support
This recipe introduces experimental support for training SimpleVLA-OFT,
a VLA model.
A key challenge in VLA RL training, which differs from standard LLM RL
training, is that the environment/simulation phase has a higher
computational overhead than the generation phase. To achieve high
efficiency, RL in this context requires an effective environment
scheduling mechanism in addition to verl's existing efficient training
and inference scheduling. The goal is to reduce the inefficiency caused
by the environment and the model's generation process waiting on each
other.
The core computational model of this PR is inspired by the pipeline
parallelism design from RLinf. It aims to overlap the environment's
execution time with the model's generation time, thereby maximizing
environment utilization.
This PR also proposes a future direction: creating a unified `Env`
class. This class would encapsulate functionalities like tool calling,
MCP, etc., under a single interface. The environment would manage its
state internally, allowing the agent to communicate simply by calling
`step(action)` to submit an action and receive an observation.
Currently, this code is located independently within the `recipes`
folder. Much of the design is tightly coupled with the SimpleVLA model
and the Libero environment, serving as an initial version for
demonstration and discussion.
## Supported Simulators
| Simulator | Env Name | Difference | Benchmark data source |
| --- | --- | --- | --- |
| Mujoco | LiberoEnv | 1. init task from init_states in Libero
dataset<br>2. each env can have different tasks |
https://github.com/Lifelong-Robot-Learning/LIBERO |
| IsaacSim | IsaacEnv | 1. init task from random states, which has more
variety than init_states in dataset<br>2. each sim process must using
the same task for its envs |
https://huggingface.co/datasets/china-sae-robotics/IsaacLabPlayGround_Dataset
|
## Hardware Requirements
* Simulator GPU: NVIDIA L20 or L40 with 48GB memory and RT Cores
Notes:
1. Mujoco can failback to CPU mode with degraded performance if no RT
Cores is available
2. IsaacSim only support GPU with RT Cores
3. RTX GPU will be supported in the future release with remote
deployment feature, but it can not work with colocated mode because of
the limitation of GPU memory capacity.
## Docker image
The Isaac Lab support for libero dataset depends on RobotLearningLab
project from The Isaac Lab Project Developers team. The project is in
the process of being public available and is currently build in this
image with BSD-3-Clause license.
`recipe/vla/run_simpleVLA_libero_grpo.sh` is the example of training
SimpleVLA-OFT with this image:
`vemlp-cn-shanghai.cr.volces.com/preset-images/verl_vla:preview_vla_0.1`
## Disaggregation Mode for Train-Rollout / Simulation
Disaggregate Train-Rollout workers and Simulation workers into different
nodes.
To enable disaggregation mode for Train-Rollout nodes and Simulation
nodes, we need to establish ray connection before running verl.
* On Train-Rollout node (default main node):
```shell
ray start --head --dashboard-host=0.0.0.0 --resources='{"train_rollout": 1}'
```
* On Simulation node:
```shell
ray start --address='<main_node_ip>:6379' --resources='{"sim": 1}'
```
Then run verl on main node **only**. See `run_simpleVLA_isaac_disagg.sh`
for example.
- `env.disagg_sim.enable=True` enable disagg mode
- `trainer.n_env_gpus_per_node` GPUs for simulaton per node
- `trainer.n_rollout_gpus_per_node` GPUs for train-rollout node
- `env.disagg_sim.nnodes` sim node num
- `trainer.nnodes` train-rollout node num
*Tips: you can run the following command on the sim node to check
whether sim workers are scheduled up*
```shell
python -c "import ray; ray.init(address=\"<main_node_ip>:6379\"); print(ray._private.state.available_resources_per_node())"
```
*If you see output pattern like "'train_rollout': 0.9992" and "'sim':
0.9992", the sim workers are scheduled up successfully*
*The actual value depends on your GPUs per node, usually <1 - 1e-4 *
num_gpus>*
**References:**
*
[https://github.com/PRIME-RL/SimpleVLA-RL](https://github.com/PRIME-RL/SimpleVLA-RL)
* [https://github.com/RLinf/RLinf](https://github.com/RLinf/RLinf)
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
Using libero dataset with openvla-oft model in batch 8, the result is
the same with data from SimpleVLA-RL paper (batch 64):
<img width="347" height="321" alt="截屏2025-11-12 下午6 05 52"
src="https://github.com/user-attachments/assets/ee562aa6-0245-4dc4-92d9-41a3750c56eb"
/>
<img width="347" height="312" alt="截屏2025-11-12 下午6 05 44"
src="https://github.com/user-attachments/assets/6defc57f-7b07-4af1-a203-01eba7722308"
/>
<img width="694" height="316" alt="截屏2025-11-12 下午6 05 35"
src="https://github.com/user-attachments/assets/4a1270d6-a674-4fa8-bb1e-f12e14ac91fb"
/>
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: Kang Sheng <kangsheng.ks@bytedance.com>
Co-authored-by: Chen Haiquan <chenhaiquan@bytedance.com>
Co-authored-by: HanlinDu <1700017832@pku.edu.cn>
* [recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller (#4175)
Support managing multiple data partitions for Train/Val/Test in
controller
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Co-authored-by: ji-huazhong <hzji210@gmail.com>
Co-authored-by: 0oshowero0 <o0shower0o@outlook.com>
* [ci] feat: Increase e2e_sft timeout from 25 to 30 minutes (#4279)
### What does this PR do?
- As title
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT (#4063)
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
This PR aims to add LoRA/PEFT support by integrating Megatron-Bridge
into Verl while maintaining compatibility with mbridge. As a result.
LoRA/PEFT support can be added to Verl with megatron backend.
Resolves #3857
Resolves #3402
Resolves #3279
### Checklist Before Starting
- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
To be added later
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
To be added later
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
To be added later
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
<sub>✨ Presented to you with <a href="https://macaron.im">Mind Lab</a> -
A Lab for Experiential Intelligence.</sub>
---------
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: Yan Bai <bayan@nvidia.com>
* [single_controller] feat: support resource_pool split (#4273)
### What does this PR do?
Safer implementation of split resource pool.
relevant design and discussion see
https://github.com/volcengine/verl/issues/4261
add more ci test
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [recipe] feat: move recipes to new repository verl-recipe (#4283)
### What does this PR do?
Move `recipe/retool` and `recipe/langgraph_agent` to new repository
[verl-recipe](https://github.com/verl-project/verl-recipe).
cc@chenhaiq @0oshowero0 @ArronHZG
* [worker] feat: restore colocate workers based on new splited resource pool (#4282)
### What does this PR do?
feat: restore colocate workers based on new resource pool
previous pr: https://github.com/volcengine/verl/pull/4233
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [misc] feat: Add `actor_rollout_ref.actor.calculate_entropy` for entropy fwd (#4239)
Currently, `entropys` is only calculated in non-bypass when calculating
`old_log_prob`
* [trainer] feat: Self-Normalized Importance Sampling (#3980)
Self-Normalized Importance Sampling for rollout:backwards mismatch, adds
`algorithm.rollout_is_self_norm`
SNIS applied to `rollout_is_weights`
• `geo_mean`: per-sequence geometric mean
• `seq-mean-token-mean` / `seq-mean-token-sum`: per-sequence masked
mean/sum
• `token-mean`, `seq-mean-token-sum-norm`: global denominator
Given $w_i=\dfrac{p(x_i)}{q(x_i)}$, the self-normalized estimator is
$$\widehat{\mu}_{\text{SNIS}}=\frac{\sum_{i=1}^{N} w_i\cdot
f(x_i)}{\sum_{i=1}^{N} w_i}$$
```yaml
algorithm:
rollout_is: true
rollout_is_self_norm: true
```
Example
<img width="1443" height="987" alt="image"
src="https://github.com/user-attachments/assets/7ce88eb4-7eb5-4ce6-83e4-b61803d45536"
/>
Experimental, only `geo_mean` has been properly tested, please test
yourself, most of these are not standard SNIS
Sequence index $b$, token $t$, mask $m_{b t}\in{0,1}$, per-token IS
weights $w_{b t}>0$
Per-sequence $`w'_{bt}=\tfrac{w_{bt}}{d_b}`$
- `geo_mean` $\quad d_b=\exp\Bigg(\frac{\sum_t m_{bt}\cdot \log
w_{bt}}{\sum_t m_{bt}}\Bigg)$
- `seq-mean-token-mean` $\quad d_b=\frac{\sum_t m_{bt}\cdot
w_{bt}}{\sum_t m_{bt}}$
- `seq-mean-token-sum` $\quad d_b=\sum_t m_{bt}\cdot w_{bt}$
Global $`w'_{bt}=\tfrac{w_{bt}}{d}`$
- `token_mean` $\quad d=\frac{\sum_{b,t} m_{bt}\cdot w_{bt}}{\sum_{b,t}
m_{bt}}$
- `seq-mean-token-sum-norm` given $T$ token dimension length
`weights_full.shape[-1]` $\quad d=\frac{\sum_{b,t} m_{bt}\cdot
w_{bt}}{T}$
* [ci, megatron] fix: add `rotary_pos_cos_sin` to forward (#4291)
### What does this PR do?
Fix
https://github.com/volcengine/verl/actions/runs/19672442639/job/56349016000
`rotary_pos_cos_sin` is used to store combined cos/sin embeddings, which
is exclusively for flash infer rope and not related to Verl's code here,
but we need this param to be present in the kwargs so that this can
support higher version of mcore:
https://github.com/NVIDIA/Megatron-LM/blob/6f655365fd1dcdbcd996f3be850c2c80b33f9eaf/megatron/core/models/gpt/gpt_model.py#L311
Note that eventually it's better to migrate everything to
mbridge/Megatron-Bridge, this can be done in a separate PR and this is a
temporary solution for the CI.
Refer to
https://github.com/ISEEKYAN/mbridge/blob/89eb10887887bc74853f89a4de258c0702932a1c/mbridge/models/qwen2_5_vl/attention.py#L41C9-L41C27
### Checklist Before Starting
- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* [megatron] fix: pass trust_remote_code to get_generation_config (#4196)
### What does this PR do?
Pass on `trust_remote_code` to `get_generation_config` so that the
fallback code path that creates it from the model config also respects
it.
### Checklist Before Starting
- [X] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+trust_remote_code
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Co-authored-by: Jonas Prellberg <jonas.prellberg@deepl.com>
* [misc] fix: support nested datastructure in dataproto to convert to tensordict (#4296)
## What does this PR do?
Fixes `ValueError: TensorDict conversion only supports... Got <class
'list'>` when converting `DataProto` with nested non-tensor data to
`TensorDict`.
**Problem:** Agent loop workflows with nested structures (lists of
lists, lists of dicts) in `non_tensor_batch` failed during
`to_tensordict()` conversion:
- `turn_scores`: `[[], [0.5, 0.8]]` - lists of varying lengths
- `reward_extra_info`: `[{"acc": 1.0}, {"acc": 0.0}]` - lists of dicts
- `raw_prompt`: `[[{"content": "...", "role": "user"}]]` - lists of
lists of dicts
- `tool_rewards`: `[[0.0], []]` - lists of lists
**Solution:** Wrap nested data in `NonTensorStack` (TensorDict's
supported type for non-tensor sequences) instead of converting to plain
Python lists.
**Impact:** Enables agent loop and multi-turn dialogue workflows to use
DataProto ↔ TensorDict conversions without errors.
---
## Test
Added 5 comprehensive tests in `tests/test_protocol_on_cpu.py`:
1. **`test_to_tensordict_with_nested_lists`** - Lists of lists (e.g.,
`turn_scores`)
2. **`test_to_tensordict_with_nested_dicts`** - Lists of dicts (e.g.,
`reward_extra_info`)
3. **`test_to_tensordict_with_complex_nested_structures`** - Lists of
lists of dicts (e.g., `raw_prompt`)
4. **`test_to_tensordict_and_back_with_nested_data`** - Round-trip data
integrity
5. **`test_to_tensordict_agent_loop_scenario`** - Real-world agent loop
scenario with all nested types
All tests verify:
- ✅ No conversion errors
- ✅ Data accessibility and correctness
- ✅ Round-trip conversion preserves data
Run tests:
```bash
pytest tests/test_protocol_on_cpu.py -k "test_to_tensordict" -v
```
---
## Design & Code Changes
### Modified Files
**1. `verl/protocol.py` (lines 1118-1133)**
```python
# Before: Plain list conversion (fails for nested structures)
tensor_batch[key] = val.tolist()
# After: Wrap in NonTensorStack
from tensordict.tensorclass import NonTensorData, NonTensorStack
tensor_batch[key] = NonTensorStack.from_list([NonTensorData(item) for item in val])
```
**2. `verl/utils/tensordict_utils.py` (lines 109-127)**
```python
# Add validation skip for NonTensorStack objects (already properly formatted)
if isinstance(val, NonTensorStack):
if batch_size is None:
batch_size = len(val)
continue
```
### Why This Works
- `NonTensorStack` is TensorDict's native type for storing sequences of
arbitrary Python objects
- Preserves nested structures (lists, dicts, complex objects) without
serialization
- Maintains batch semantics - each element corresponds to one batch
sample
- Enables round-trip conversion: `DataProto → TensorDict → DataProto`
without data loss
---
## Checklist Before Submitting
- [x] Read the Contribute Guide
- [x] Apply pre-commit checks
- [ ] Add/Update documentation (if needed - this is a bug fix, not new
API)
- [x] Add unit tests covering all code paths (5 new tests added)
- [ ] CI request (ready for review)
---
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Signed-off-by: petersh6 <petershengwhu@gmail.com>
* [ci] fix: use local hf model path (#4299)
### What does this PR do?
As title
* [data] feat: TransferQueue - Support AgentLoop performance metrics & minor fix (#4289)
### What does this PR do?
1. Support performance metrics statistics that requires tensor data
2. Add stand-alone config structure for TransferQueue
3. Modify TransferQueue initialization process to suit for multiple
backends
4. Fix `create_transferqueue_client` usage
5. Unify some function names
6. Add TODO
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
* [recipe] feat: support reward_loop for recipe/fully_async_policy (#4224)
### What does this PR do?
This PR mainly supports **reward_loop** for
**recipe/fully_async_policy**.
The main changes are as follows:
* refactor recipe/fully_async_policy to support reward_loop
* fix recipe/fully_async_policy final validation bug
* extract `_agent_loop_postprocess`: Move the padding/mask/position-id
post-processing logic out of _run_agent_loop into a dedicated function
`async def _agent_loop_postprocess(...)`.
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: WP <yrzr12345678@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [misc] fix: fix list conversion in get_tensordict (#4304)
---
## What does this PR do?
This PR fixes a `ValueError` that occurs when converting `DataProto`
containing nested Python structures (lists of lists, lists of dicts,
etc.) to `TensorDict`. The issue manifested during distributed training
when `non_tensor_batch` fields like `turn_scores`, `reward_extra_info`,
`raw_prompt`, and `tool_rewards` contained nested structures that
`TensorDict` couldn't handle directly.
**Root Cause:**
`TensorDict` cannot accept raw nested Python objects like `[[], [0.5,
0.8]]` or `[{"acc": 1.0}, {"acc": 0.0}]`. These must be wrapped using
`NonTensorData` and organized into `NonTensorStack` for proper handling.
**Solution:**
- Explicitly wrap each element in nested lists with `NonTensorData`
before creating `NonTensorStack`
- Added helper functions `assign_non_tensor_stack()` and
`assign_non_tensor()` in `tensordict_utils.py`
- Updated `DataProto.to_tensordict()` and `DataProto.from_tensordict()`
for proper round-trip conversion
- Added automatic nested structure detection in `get_tensordict()`
Previous PR: [4296 ](https://github.com/volcengine/verl/pull/4296)
---
## Test
### Unit Tests Added
**`tests/test_protocol_v2_on_cpu.py`** (8 new tests):
- `test_assign_non_tensor_stack_with_nested_lists` - Lists of lists
- `test_assign_non_tensor_stack_with_nested_dicts` - Lists of dicts
- `test_assign_non_tensor_stack_with_complex_nested` - Lists of lists of
dicts
- `test_assign_non_tensor_with_auto_detection` - Auto type detection
- `test_get_tensordict_with_nested_lists` - Integration with
get_tensordict
- `test_get_tensordict_with_nested_dicts` - Integration with
get_tensordict
- `test_get_tensordict_with_complex_nested_structures` - Complex nested
case
- `test_get_tensordict_agent_loop_scenario` - Real-world agent loop
scenario
### How to Run Tests
```bash
# Test tensordict_utils nested structure support
pytest third_party/open_verl/tests/test_protocol_v2_on_cpu.py -v
```
### Validation
✅ All new tests pass
✅ Existing tests remain passing
✅ Successfully handles empty lists in nested structures (e.g.,
`turn_scores = [[], [0.5, 0.8]]`)
✅ Round-trip conversion (DataProto → TensorDict → DataProto) preserves
data integrity
---
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [hardware] fix: Workaround for torch-npu's lack of support for creating nested tensors from NPU tensors. (#4309)
### What does this PR do?
As per title.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [rollout] fix: some compatibility changes in agent loop and reward (#4293)
### What does this PR do?
Some compatibility changes, including
* `agent_loop`:
* compatible with model without system prompt
* compatible with other multi-modal model with processor available
* `reward`:
* allow override_config for huggingface model
### Test
* train Qwen VL and other internal multi-modal models with customized
reward on agent loop
* CI
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Signed-off-by: peng.wu <peng.wu@bytedance.com>
* [worker] fix: do not pass router address and tokenizer is their value is none (#4310)
### What does this PR do?
as title
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [doc] chore: Update ascend quickstart doc (#4321)
### What does this PR do?
For `vllm==0.11.0`, manually pip install `requirement/build.txt` is not
forcely required, can be removed.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
Not related.
### API and Usage Example
Not related.
### Design & Code Changes
Not related.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [misc] feat: add more utils of tensordict (#4322)
### What does this PR do?
- Add get/get_keys/pop/pop_keys of tensordict
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [recipe] fix: Fixed scripts for one_step_off_policy async not implemention (#4350)
### What does this PR do?
as titile
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [model] feat: refactor engine folder structure (#4352)
* [recipe] feat: move char count recipe to verl-recipe (#4351)
### What does this PR do?
- As title.
- https://github.com/verl-project/verl-recipe/pull/4
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [ci] chore: switch ascend ci calculation resource (#4347)
### What does this PR do?
To address the frequent queuing issues with the current `e2e_ascend` CI
check, we plan to switch the computing resources behind it and increase
the resource quantity to enable concurrent execution of `e2e_ascend` CI
check.
Additionally, all batch_size ,rollout_n and global_training_steps in
Ascend test cases are all minimized to accelerate running procedure.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
Not related.
### API and Usage Example
Not related.
### Design & Code Changes
Not related.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* feat(actor): add loss_scale_factor for seq-mean-token-sum-norm mode (#4360)
Add configurable loss_scale_factor to replace hardcoded divisor in
seq-mean-token-sum-norm loss aggregation. Users can set a constant value
to ensure consistent normalization throughout training.
- Add loss_scale_factor param to agg_loss() in core_algos.py
- Add loss_scale_factor field to ActorConfig
- Propagate via global_batch_info to policy loss functions
- Update all direct agg_loss calls in trainers and recipes
- Update YAML configs and DrGRPO documentation
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex…
### What does this PR do? So that by default, it will follow the TP size. Related to verl-project#4063 (comment) ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
…l-project#4063) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR aims to add LoRA/PEFT support by integrating Megatron-Bridge into Verl while maintaining compatibility with mbridge. As a result. LoRA/PEFT support can be added to Verl with megatron backend. Resolves verl-project#3857 Resolves verl-project#3402 Resolves verl-project#3279 ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. To be added later ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. To be added later ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. To be added later ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) <sub>✨ Presented to you with <a href="https://macaron.im">Mind Lab</a> - A Lab for Experiential Intelligence.</sub> --------- Signed-off-by: Hollow Man <hollowman@opensuse.org> Co-authored-by: Yan Bai <bayan@nvidia.com>
|
Now PEFT recompute fix has been merged into Megatron-Bridge main branch, you can install via |
### What does this PR do? Resolves: - #4063 (comment) - #4514 Related to: #4566 NVIDIA-NeMo/Megatron-Bridge@953aabf ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
* [recipe] feat: add Experimental VLA RL Support (#3918)
### What does this PR do?
# Experimental VLA RL Support
This recipe introduces experimental support for training SimpleVLA-OFT,
a VLA model.
A key challenge in VLA RL training, which differs from standard LLM RL
training, is that the environment/simulation phase has a higher
computational overhead than the generation phase. To achieve high
efficiency, RL in this context requires an effective environment
scheduling mechanism in addition to verl's existing efficient training
and inference scheduling. The goal is to reduce the inefficiency caused
by the environment and the model's generation process waiting on each
other.
The core computational model of this PR is inspired by the pipeline
parallelism design from RLinf. It aims to overlap the environment's
execution time with the model's generation time, thereby maximizing
environment utilization.
This PR also proposes a future direction: creating a unified `Env`
class. This class would encapsulate functionalities like tool calling,
MCP, etc., under a single interface. The environment would manage its
state internally, allowing the agent to communicate simply by calling
`step(action)` to submit an action and receive an observation.
Currently, this code is located independently within the `recipes`
folder. Much of the design is tightly coupled with the SimpleVLA model
and the Libero environment, serving as an initial version for
demonstration and discussion.
## Supported Simulators
| Simulator | Env Name | Difference | Benchmark data source |
| --- | --- | --- | --- |
| Mujoco | LiberoEnv | 1. init task from init_states in Libero
dataset<br>2. each env can have different tasks |
https://github.com/Lifelong-Robot-Learning/LIBERO |
| IsaacSim | IsaacEnv | 1. init task from random states, which has more
variety than init_states in dataset<br>2. each sim process must using
the same task for its envs |
https://huggingface.co/datasets/china-sae-robotics/IsaacLabPlayGround_Dataset
|
## Hardware Requirements
* Simulator GPU: NVIDIA L20 or L40 with 48GB memory and RT Cores
Notes:
1. Mujoco can failback to CPU mode with degraded performance if no RT
Cores is available
2. IsaacSim only support GPU with RT Cores
3. RTX GPU will be supported in the future release with remote
deployment feature, but it can not work with colocated mode because of
the limitation of GPU memory capacity.
## Docker image
The Isaac Lab support for libero dataset depends on RobotLearningLab
project from The Isaac Lab Project Developers team. The project is in
the process of being public available and is currently build in this
image with BSD-3-Clause license.
`recipe/vla/run_simpleVLA_libero_grpo.sh` is the example of training
SimpleVLA-OFT with this image:
`vemlp-cn-shanghai.cr.volces.com/preset-images/verl_vla:preview_vla_0.1`
## Disaggregation Mode for Train-Rollout / Simulation
Disaggregate Train-Rollout workers and Simulation workers into different
nodes.
To enable disaggregation mode for Train-Rollout nodes and Simulation
nodes, we need to establish ray connection before running verl.
* On Train-Rollout node (default main node):
```shell
ray start --head --dashboard-host=0.0.0.0 --resources='{"train_rollout": 1}'
```
* On Simulation node:
```shell
ray start --address='<main_node_ip>:6379' --resources='{"sim": 1}'
```
Then run verl on main node **only**. See `run_simpleVLA_isaac_disagg.sh`
for example.
- `env.disagg_sim.enable=True` enable disagg mode
- `trainer.n_env_gpus_per_node` GPUs for simulaton per node
- `trainer.n_rollout_gpus_per_node` GPUs for train-rollout node
- `env.disagg_sim.nnodes` sim node num
- `trainer.nnodes` train-rollout node num
*Tips: you can run the following command on the sim node to check
whether sim workers are scheduled up*
```shell
python -c "import ray; ray.init(address=\"<main_node_ip>:6379\"); print(ray._private.state.available_resources_per_node())"
```
*If you see output pattern like "'train_rollout': 0.9992" and "'sim':
0.9992", the sim workers are scheduled up successfully*
*The actual value depends on your GPUs per node, usually <1 - 1e-4 *
num_gpus>*
**References:**
*
[https://github.com/PRIME-RL/SimpleVLA-RL](https://github.com/PRIME-RL/SimpleVLA-RL)
* [https://github.com/RLinf/RLinf](https://github.com/RLinf/RLinf)
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
Using libero dataset with openvla-oft model in batch 8, the result is
the same with data from SimpleVLA-RL paper (batch 64):
<img width="347" height="321" alt="截屏2025-11-12 下午6 05 52"
src="https://github.com/user-attachments/assets/ee562aa6-0245-4dc4-92d9-41a3750c56eb"
/>
<img width="347" height="312" alt="截屏2025-11-12 下午6 05 44"
src="https://github.com/user-attachments/assets/6defc57f-7b07-4af1-a203-01eba7722308"
/>
<img width="694" height="316" alt="截屏2025-11-12 下午6 05 35"
src="https://github.com/user-attachments/assets/4a1270d6-a674-4fa8-bb1e-f12e14ac91fb"
/>
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: Kang Sheng <kangsheng.ks@bytedance.com>
Co-authored-by: Chen Haiquan <chenhaiquan@bytedance.com>
Co-authored-by: HanlinDu <1700017832@pku.edu.cn>
* [recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller (#4175)
Support managing multiple data partitions for Train/Val/Test in
controller
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Co-authored-by: ji-huazhong <hzji210@gmail.com>
Co-authored-by: 0oshowero0 <o0shower0o@outlook.com>
* [ci] feat: Increase e2e_sft timeout from 25 to 30 minutes (#4279)
### What does this PR do?
- As title
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [megatron] feat: Integrate Megatron-Bridge and support LoRA/PEFT (#4063)
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
This PR aims to add LoRA/PEFT support by integrating Megatron-Bridge
into Verl while maintaining compatibility with mbridge. As a result.
LoRA/PEFT support can be added to Verl with megatron backend.
Resolves #3857
Resolves #3402
Resolves #3279
### Checklist Before Starting
- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
To be added later
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
To be added later
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
To be added later
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
<sub>✨ Presented to you with <a href="https://macaron.im">Mind Lab</a> -
A Lab for Experiential Intelligence.</sub>
---------
Signed-off-by: Hollow Man <hollowman@opensuse.org>
Co-authored-by: Yan Bai <bayan@nvidia.com>
* [single_controller] feat: support resource_pool split (#4273)
### What does this PR do?
Safer implementation of split resource pool.
relevant design and discussion see
https://github.com/volcengine/verl/issues/4261
add more ci test
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [recipe] feat: move recipes to new repository verl-recipe (#4283)
### What does this PR do?
Move `recipe/retool` and `recipe/langgraph_agent` to new repository
[verl-recipe](https://github.com/verl-project/verl-recipe).
cc@chenhaiq @0oshowero0 @ArronHZG
* [worker] feat: restore colocate workers based on new splited resource pool (#4282)
### What does this PR do?
feat: restore colocate workers based on new resource pool
previous pr: https://github.com/volcengine/verl/pull/4233
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [misc] feat: Add `actor_rollout_ref.actor.calculate_entropy` for entropy fwd (#4239)
Currently, `entropys` is only calculated in non-bypass when calculating
`old_log_prob`
* [trainer] feat: Self-Normalized Importance Sampling (#3980)
Self-Normalized Importance Sampling for rollout:backwards mismatch, adds
`algorithm.rollout_is_self_norm`
SNIS applied to `rollout_is_weights`
• `geo_mean`: per-sequence geometric mean
• `seq-mean-token-mean` / `seq-mean-token-sum`: per-sequence masked
mean/sum
• `token-mean`, `seq-mean-token-sum-norm`: global denominator
Given $w_i=\dfrac{p(x_i)}{q(x_i)}$, the self-normalized estimator is
$$\widehat{\mu}_{\text{SNIS}}=\frac{\sum_{i=1}^{N} w_i\cdot
f(x_i)}{\sum_{i=1}^{N} w_i}$$
```yaml
algorithm:
rollout_is: true
rollout_is_self_norm: true
```
Example
<img width="1443" height="987" alt="image"
src="https://github.com/user-attachments/assets/7ce88eb4-7eb5-4ce6-83e4-b61803d45536"
/>
Experimental, only `geo_mean` has been properly tested, please test
yourself, most of these are not standard SNIS
Sequence index $b$, token $t$, mask $m_{b t}\in{0,1}$, per-token IS
weights $w_{b t}>0$
Per-sequence $`w'_{bt}=\tfrac{w_{bt}}{d_b}`$
- `geo_mean` $\quad d_b=\exp\Bigg(\frac{\sum_t m_{bt}\cdot \log
w_{bt}}{\sum_t m_{bt}}\Bigg)$
- `seq-mean-token-mean` $\quad d_b=\frac{\sum_t m_{bt}\cdot
w_{bt}}{\sum_t m_{bt}}$
- `seq-mean-token-sum` $\quad d_b=\sum_t m_{bt}\cdot w_{bt}$
Global $`w'_{bt}=\tfrac{w_{bt}}{d}`$
- `token_mean` $\quad d=\frac{\sum_{b,t} m_{bt}\cdot w_{bt}}{\sum_{b,t}
m_{bt}}$
- `seq-mean-token-sum-norm` given $T$ token dimension length
`weights_full.shape[-1]` $\quad d=\frac{\sum_{b,t} m_{bt}\cdot
w_{bt}}{T}$
* [ci, megatron] fix: add `rotary_pos_cos_sin` to forward (#4291)
### What does this PR do?
Fix
https://github.com/volcengine/verl/actions/runs/19672442639/job/56349016000
`rotary_pos_cos_sin` is used to store combined cos/sin embeddings, which
is exclusively for flash infer rope and not related to Verl's code here,
but we need this param to be present in the kwargs so that this can
support higher version of mcore:
https://github.com/NVIDIA/Megatron-LM/blob/6f655365fd1dcdbcd996f3be850c2c80b33f9eaf/megatron/core/models/gpt/gpt_model.py#L311
Note that eventually it's better to migrate everything to
mbridge/Megatron-Bridge, this can be done in a separate PR and this is a
temporary solution for the CI.
Refer to
https://github.com/ISEEKYAN/mbridge/blob/89eb10887887bc74853f89a4de258c0702932a1c/mbridge/models/qwen2_5_vl/attention.py#L41C9-L41C27
### Checklist Before Starting
- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* [megatron] fix: pass trust_remote_code to get_generation_config (#4196)
### What does this PR do?
Pass on `trust_remote_code` to `get_generation_config` so that the
fallback code path that creates it from the model config also respects
it.
### Checklist Before Starting
- [X] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+is%3Aopen+trust_remote_code
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Co-authored-by: Jonas Prellberg <jonas.prellberg@deepl.com>
* [misc] fix: support nested datastructure in dataproto to convert to tensordict (#4296)
## What does this PR do?
Fixes `ValueError: TensorDict conversion only supports... Got <class
'list'>` when converting `DataProto` with nested non-tensor data to
`TensorDict`.
**Problem:** Agent loop workflows with nested structures (lists of
lists, lists of dicts) in `non_tensor_batch` failed during
`to_tensordict()` conversion:
- `turn_scores`: `[[], [0.5, 0.8]]` - lists of varying lengths
- `reward_extra_info`: `[{"acc": 1.0}, {"acc": 0.0}]` - lists of dicts
- `raw_prompt`: `[[{"content": "...", "role": "user"}]]` - lists of
lists of dicts
- `tool_rewards`: `[[0.0], []]` - lists of lists
**Solution:** Wrap nested data in `NonTensorStack` (TensorDict's
supported type for non-tensor sequences) instead of converting to plain
Python lists.
**Impact:** Enables agent loop and multi-turn dialogue workflows to use
DataProto ↔ TensorDict conversions without errors.
---
## Test
Added 5 comprehensive tests in `tests/test_protocol_on_cpu.py`:
1. **`test_to_tensordict_with_nested_lists`** - Lists of lists (e.g.,
`turn_scores`)
2. **`test_to_tensordict_with_nested_dicts`** - Lists of dicts (e.g.,
`reward_extra_info`)
3. **`test_to_tensordict_with_complex_nested_structures`** - Lists of
lists of dicts (e.g., `raw_prompt`)
4. **`test_to_tensordict_and_back_with_nested_data`** - Round-trip data
integrity
5. **`test_to_tensordict_agent_loop_scenario`** - Real-world agent loop
scenario with all nested types
All tests verify:
- ✅ No conversion errors
- ✅ Data accessibility and correctness
- ✅ Round-trip conversion preserves data
Run tests:
```bash
pytest tests/test_protocol_on_cpu.py -k "test_to_tensordict" -v
```
---
## Design & Code Changes
### Modified Files
**1. `verl/protocol.py` (lines 1118-1133)**
```python
# Before: Plain list conversion (fails for nested structures)
tensor_batch[key] = val.tolist()
# After: Wrap in NonTensorStack
from tensordict.tensorclass import NonTensorData, NonTensorStack
tensor_batch[key] = NonTensorStack.from_list([NonTensorData(item) for item in val])
```
**2. `verl/utils/tensordict_utils.py` (lines 109-127)**
```python
# Add validation skip for NonTensorStack objects (already properly formatted)
if isinstance(val, NonTensorStack):
if batch_size is None:
batch_size = len(val)
continue
```
### Why This Works
- `NonTensorStack` is TensorDict's native type for storing sequences of
arbitrary Python objects
- Preserves nested structures (lists, dicts, complex objects) without
serialization
- Maintains batch semantics - each element corresponds to one batch
sample
- Enables round-trip conversion: `DataProto → TensorDict → DataProto`
without data loss
---
## Checklist Before Submitting
- [x] Read the Contribute Guide
- [x] Apply pre-commit checks
- [ ] Add/Update documentation (if needed - this is a bug fix, not new
API)
- [x] Add unit tests covering all code paths (5 new tests added)
- [ ] CI request (ready for review)
---
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Signed-off-by: petersh6 <petershengwhu@gmail.com>
* [ci] fix: use local hf model path (#4299)
### What does this PR do?
As title
* [data] feat: TransferQueue - Support AgentLoop performance metrics & minor fix (#4289)
### What does this PR do?
1. Support performance metrics statistics that requires tensor data
2. Add stand-alone config structure for TransferQueue
3. Modify TransferQueue initialization process to suit for multiple
backends
4. Fix `create_transferqueue_client` usage
5. Unify some function names
6. Add TODO
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
* [recipe] feat: support reward_loop for recipe/fully_async_policy (#4224)
### What does this PR do?
This PR mainly supports **reward_loop** for
**recipe/fully_async_policy**.
The main changes are as follows:
* refactor recipe/fully_async_policy to support reward_loop
* fix recipe/fully_async_policy final validation bug
* extract `_agent_loop_postprocess`: Move the padding/mask/position-id
post-processing logic out of _run_agent_loop into a dedicated function
`async def _agent_loop_postprocess(...)`.
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: WP <yrzr12345678@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [misc] fix: fix list conversion in get_tensordict (#4304)
---
## What does this PR do?
This PR fixes a `ValueError` that occurs when converting `DataProto`
containing nested Python structures (lists of lists, lists of dicts,
etc.) to `TensorDict`. The issue manifested during distributed training
when `non_tensor_batch` fields like `turn_scores`, `reward_extra_info`,
`raw_prompt`, and `tool_rewards` contained nested structures that
`TensorDict` couldn't handle directly.
**Root Cause:**
`TensorDict` cannot accept raw nested Python objects like `[[], [0.5,
0.8]]` or `[{"acc": 1.0}, {"acc": 0.0}]`. These must be wrapped using
`NonTensorData` and organized into `NonTensorStack` for proper handling.
**Solution:**
- Explicitly wrap each element in nested lists with `NonTensorData`
before creating `NonTensorStack`
- Added helper functions `assign_non_tensor_stack()` and
`assign_non_tensor()` in `tensordict_utils.py`
- Updated `DataProto.to_tensordict()` and `DataProto.from_tensordict()`
for proper round-trip conversion
- Added automatic nested structure detection in `get_tensordict()`
Previous PR: [4296 ](https://github.com/volcengine/verl/pull/4296)
---
## Test
### Unit Tests Added
**`tests/test_protocol_v2_on_cpu.py`** (8 new tests):
- `test_assign_non_tensor_stack_with_nested_lists` - Lists of lists
- `test_assign_non_tensor_stack_with_nested_dicts` - Lists of dicts
- `test_assign_non_tensor_stack_with_complex_nested` - Lists of lists of
dicts
- `test_assign_non_tensor_with_auto_detection` - Auto type detection
- `test_get_tensordict_with_nested_lists` - Integration with
get_tensordict
- `test_get_tensordict_with_nested_dicts` - Integration with
get_tensordict
- `test_get_tensordict_with_complex_nested_structures` - Complex nested
case
- `test_get_tensordict_agent_loop_scenario` - Real-world agent loop
scenario
### How to Run Tests
```bash
# Test tensordict_utils nested structure support
pytest third_party/open_verl/tests/test_protocol_v2_on_cpu.py -v
```
### Validation
✅ All new tests pass
✅ Existing tests remain passing
✅ Successfully handles empty lists in nested structures (e.g.,
`turn_scores = [[], [0.5, 0.8]]`)
✅ Round-trip conversion (DataProto → TensorDict → DataProto) preserves
data integrity
---
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [hardware] fix: Workaround for torch-npu's lack of support for creating nested tensors from NPU tensors. (#4309)
### What does this PR do?
As per title.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [rollout] fix: some compatibility changes in agent loop and reward (#4293)
### What does this PR do?
Some compatibility changes, including
* `agent_loop`:
* compatible with model without system prompt
* compatible with other multi-modal model with processor available
* `reward`:
* allow override_config for huggingface model
### Test
* train Qwen VL and other internal multi-modal models with customized
reward on agent loop
* CI
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Signed-off-by: peng.wu <peng.wu@bytedance.com>
* [worker] fix: do not pass router address and tokenizer is their value is none (#4310)
### What does this PR do?
as title
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [doc] chore: Update ascend quickstart doc (#4321)
### What does this PR do?
For `vllm==0.11.0`, manually pip install `requirement/build.txt` is not
forcely required, can be removed.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
Not related.
### API and Usage Example
Not related.
### Design & Code Changes
Not related.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [misc] feat: add more utils of tensordict (#4322)
### What does this PR do?
- Add get/get_keys/pop/pop_keys of tensordict
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: Guangming Sheng <petershengwhu@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [recipe] fix: Fixed scripts for one_step_off_policy async not implemention (#4350)
### What does this PR do?
as titile
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [model] feat: refactor engine folder structure (#4352)
* [recipe] feat: move char count recipe to verl-recipe (#4351)
### What does this PR do?
- As title.
- https://github.com/verl-project/verl-recipe/pull/4
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [ci] chore: switch ascend ci calculation resource (#4347)
### What does this PR do?
To address the frequent queuing issues with the current `e2e_ascend` CI
check, we plan to switch the computing resources behind it and increase
the resource quantity to enable concurrent execution of `e2e_ascend` CI
check.
Additionally, all batch_size ,rollout_n and global_training_steps in
Ascend test cases are all minimized to accelerate running procedure.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
Not related.
### API and Usage Example
Not related.
### Design & Code Changes
Not related.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* feat(actor): add loss_scale_factor for seq-mean-token-sum-norm mode (#4360)
Add configurable loss_scale_factor to replace hardcoded divisor in
seq-mean-token-sum-norm loss aggregation. Users can set a constant value
to ensure consistent normalization throughout training.
- Add loss_scale_factor param to agg_loss() in core_algos.py
- Add loss_scale_factor field to ActorConfig
- Propagate via global_batch_info to policy loss functions
- Update all direct agg_loss calls in trainers and recipes
- Update YAML configs and DrGRPO documentation
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex…
…project#4702) ### What does this PR do? Resolves: - verl-project#4063 (comment) - verl-project#4514 Related to: verl-project#4566 NVIDIA-NeMo/Megatron-Bridge@953aabf ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
… & add a separate lora merge script (#4839) ### What does this PR do? When using LoRA, MegatronCheckpointManager.save_checkpoint not only saves the adapter but also saves the huggingface checkpoint, which is unnecessary. This PR skips saving the huggingface checkpoint, and provides a separate script for merging the adapter. Relating to #4063 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+megatron+lora+save - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```bash python ./scripts/megatron_merge_lora.py \ --config-name='ppo_megatron_trainer' \ actor_rollout_ref.model.lora.adapter_path=$APAPTER_PATH \ ... # same config as your training script ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) - [ ] If your PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.
…project#4702) ### What does this PR do? Resolves: - verl-project#4063 (comment) - verl-project#4514 Related to: verl-project#4566 NVIDIA-NeMo/Megatron-Bridge@953aabf ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
… & add a separate lora merge script (verl-project#4839) ### What does this PR do? When using LoRA, MegatronCheckpointManager.save_checkpoint not only saves the adapter but also saves the huggingface checkpoint, which is unnecessary. This PR skips saving the huggingface checkpoint, and provides a separate script for merging the adapter. Relating to verl-project#4063 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+megatron+lora+save - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```bash python ./scripts/megatron_merge_lora.py \ --config-name='ppo_megatron_trainer' \ actor_rollout_ref.model.lora.adapter_path=$APAPTER_PATH \ ... # same config as your training script ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) - [ ] If your PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.
* [recipe] feat: migrate `recipe` to the dedicated repo `verl-recipe` as a submodule (#4795)
### What does this PR do?
This PR
1. migrates most recipes from the `recipe` directory to the dedicated
repo [`verl-recipe`](https://github.com/verl-project/verl-recipe),
2. adds `verl-recipe` as a submodule,
3. adds instruction to update the submodule reference in the PR
template,
4. migrates [`transfer_queue`](verl/experimental/transfer_queue),
[`fully_async_policy`](verl/experimental/fully_async_policy),
[`one_step_off_policy`](verl/experimental/one_step_off_policy) and
[`vla`](verl/experimental/vla) to
[`verl/experimental`](verl/experimental) since they are planned to be
merged into the main library,
5. updates related CI and paths accordingly,
6. updates the README news and awesome projects about this migration,
7. forces into a single commit and tries its best to recognize `rename`
to keep the commit history trackable.
See the "conjugate" PR at
https://github.com/verl-project/verl-recipe/pull/7.
### Test
See the CI.
### Todo
- [ ] Ignore the final PR commit in git blame if it shows up too
frequently.
* [model] fix: fix temp dtype (#4813)
### What does this PR do?
- As title. Prevent temperature to be int.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [vllm, sglang, rollout] fix: Fix a mistake when running run_qwen3_vl-30b-megatron.sh with latest verl and vllm0.12 (#4810)
* [ckpt] feat: add checkpoint-engine abstraction (#4775)
### What does this PR do?
Add Checkpoint Engine abstraction.
#### Overview
Checkpoint Engine is an unified abstract layer to synchronize weights
between various training backends and inference backends. It provides
three unified APIs:
- send_weights: get named tensors from generator and send them in
streaming manner.
- receive_weights: return a tensor generator that yield named tensors in
streaming manner.
- get_weights: return a tensor generator that yield named tensors in
streaming manner, used for each inference instance update weight
independently from local cache (e.g share memory, disk).
For more detail, see `verl/checkpoint_engine/README.md`.
#### verl core
<img width="640" height="167" alt="image"
src="https://github.com/user-attachments/assets/fbd125d7-b461-4c89-9678-b95a2ef89c33"
/>
#### checkpoint engine
<img width="1004" height="409" alt="checkpoint-engine"
src="https://github.com/user-attachments/assets/fc263c1f-17b2-4579-9842-87b24e12abc7"
/>
* [doc, ci] fix: Update Ascend doc and fix e2e_ascend CI (#4816)
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
As title.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
Not related.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
Not related.
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
Not related.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
* [trainer] feat: VeOmniEngine supports qwen3_vl ulysses (#4806)
### What does this PR do?
as title.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [doc] chore: fix checkpoint engine image link (#4821)
### What does this PR do?
As title
* [hardware] fix: automatically set device for SFT case (#4828)
### What does this PR do?
auto_set_device does not cover SFT case.
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
* [data] feat: TransferQueue - Update TransferQueue version and docs (#4829)
### What does this PR do?
- Update TQ to formal release version.
- Fix the shallow copy bug for chunking `BatchMeta`
https://gitcode.com/Ascend/TransferQueue/pull/2
- Fix race condition for modifying torch num_threads
https://gitcode.com/Ascend/TransferQueue/pull/5
- More robust port binding
https://gitcode.com/Ascend/TransferQueue/pull/3
- Optimize memory usage for zero-copy transfer
https://github.com/TransferQueue/TransferQueue/pull/163
- add check_data_production_status and check_consumption_status and
support polling get metadata
https://github.com/TransferQueue/TransferQueue/pull/157 @NINGBENZHE
- (alpha) Support Mooncake Store backend
https://github.com/TransferQueue/TransferQueue/pull/162 @zhaohaidao
- (alpha) Support Ray RDT backend
https://github.com/TransferQueue/TransferQueue/pull/167
- Update docs.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
---------
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
* [doc] Update docs about fully_async_policy (#4826)
### What does this PR do?
Update documentation about fully_async_policy and adjust the formatting
of the table.
---------
Co-authored-by: jsfanfanfan <2981866535@qq.com>
* [ckpt] fix: FSDP save ckpt after validation (#4799)
### What does this PR do?
This PR fixes a bug in the `save_checkpoint` function for FSDPEngine.
In the original logic, if the model engine is used
(`use_legacy_worker_impl=disable`), the `wake_up` function in
`verl/workers/engine_workers.py` will be invoked during the rollout
phase of each step, which will offload the model to CPU.
Under normal circumstances, the `compute_log_prob` function called
during the training phase can load the model back to GPU. However, the
training process is not executed during the validation phase, leaving
the model on the CPU. If a checkpoint is saved immediately after
validation, it will trigger the following error: `AssertionError:
Expects tensor to be on the compute device cuda:0, was on cpu.`
<details>
<summary>Details</summary>
Script:
```
set -x
python examples/data_preprocess/geo3k.py --local_dir ~/data/geo3k
python -m verl.trainer.main_ppo \
algorithm.adv_estimator=grpo \
data.train_files=$HOME/data/geo3k/train.parquet \
data.val_files=$HOME/data/geo3k/test.parquet \
data.train_batch_size=512 \
data.max_prompt_length=1024 \
data.max_response_length=2048 \
data.filter_overlong_prompts=True \
data.truncation='error' \
data.image_key=images \
actor_rollout_ref.model.path=Qwen/Qwen2.5-VL-3B-Instruct \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.ppo_mini_batch_size=128 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.kl_loss_coef=0.01 \
actor_rollout_ref.actor.kl_loss_type=low_var_kl \
actor_rollout_ref.actor.entropy_coeff=0 \
actor_rollout_ref.model.enable_gradient_checkpointing=True \
actor_rollout_ref.actor.fsdp_config.param_offload=False \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
actor_rollout_ref.rollout.enable_chunked_prefill=False \
actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
actor_rollout_ref.rollout.n=5 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=16 \
actor_rollout_ref.ref.fsdp_config.param_offload=False \
algorithm.use_kl_in_reward=False \
trainer.use_legacy_worker_impl=disable \
trainer.critic_warmup=0 \
trainer.logger=['console','wandb'] \
trainer.project_name='verl_ci_grpo_example_geo3k' \
trainer.experiment_name='qwen2_5_vl_3b_function_rm' \
trainer.n_gpus_per_node=8 \
trainer.nnodes=1 \
trainer.log_val_generations=20 \
trainer.save_freq=5 \
trainer.test_freq=5 \
trainer.total_epochs=15
```
Error:
```
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) ERROR:2026-01-05
07:35:49,128:Got error when executing task.
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) Traceback (most
recent call last):
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"python/ray/_raylet.pyx", line 1890, in ray._raylet.execute_task
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"python/ray/_raylet.pyx", line 1998, in ray._raylet.execute_task
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"python/ray/_raylet.pyx", line 1897, in ray._raylet.execute_task
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"python/ray/_raylet.pyx", line 1825, in
ray._raylet.execute_task.function_executor
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"python/ray/_raylet.pyx", line 4651, in
ray._raylet.CoreWorker.run_async_func_or_coro_in_event_loop
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return
self.__get_result()
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in
__get_result
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) raise
self._exception
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"python/ray/_raylet.pyx", line 4638, in async_func
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/ray/_private/async_compat.py",
line 50, in wrapper
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return func(*args,
**kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/ray/_private/function_manager.py",
line 691, in actor_method_executor
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return
method(__ray_actor, *args, **kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/ray/util/tracing/tracing_helper.py",
line 463, in _resume_span
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return
method(self, *_args, **_kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/single_controller/ray/base.py", line 841, in
func
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return
getattr(self.worker_dict[key], name)(*args, **kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/single_controller/base/decorator.py", line
456, in inner
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return func(*args,
**kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/utils/transferqueue_utils.py", line 314, in
dummy_inner
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) output =
func(*args, **kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/workers/engine_workers.py", line 541, in
save_checkpoint
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
self.actor.save_checkpoint(local_path, hdfs_path, global_step,
max_ckpt_to_keep)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/single_controller/base/decorator.py", line
456, in inner
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return func(*args,
**kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/utils/transferqueue_utils.py", line 314, in
dummy_inner
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) output =
func(*args, **kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/workers/engine_workers.py", line 343, in
save_checkpoint
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return
self.engine.save_checkpoint(local_path, hdfs_path, global_step,
max_ckpt_to_keep)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/workers/engine/fsdp/transformer_impl.py",
line 607, in save_checkpoint
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
self.checkpoint_manager.save_checkpoint(
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/opt/tiger/open_verl/verl/utils/checkpoint/fsdp_checkpoint_manager.py",
line 238, in save_checkpoint
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) model_state_dict =
self.model.state_dict()
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py",
line 2256, in state_dict
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) hook(self, prefix,
keep_vars)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py",
line 120, in decorate_context
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return func(*args,
**kwargs)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_state_dict_utils.py",
line 777, in _pre_state_dict_hook
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
_pre_state_dict_hook_fn[fsdp_state._state_dict_type](
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_state_dict_utils.py",
line 517, in _sharded_pre_state_dict_hook
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
_common_unshard_pre_state_dict_hook(
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_state_dict_utils.py",
line 161, in _common_unshard_pre_state_dict_hook
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
_enter_unshard_params_ctx(
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_state_dict_utils.py",
line 125, in _enter_unshard_params_ctx
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
fsdp_state._unshard_params_ctx[module].__enter__()
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/lib/python3.12/contextlib.py", line 137, in __enter__
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) return
next(self.gen)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) ^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_unshard_param_utils.py",
line 199, in _unshard_fsdp_state_params
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) _unshard(state,
handle, computation_stream, computation_stream)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_runtime_utils.py",
line 290, in _unshard
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) ran_pre_unshard =
handle.pre_unshard()
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
^^^^^^^^^^^^^^^^^^^^
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_flat_param.py",
line 1303, in pre_unshard
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f])
self._check_on_compute_device(self.flat_param)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/_flat_param.py",
line 2582, in _check_on_compute_device
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) _p_assert(
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) File
"/usr/local/lib/python3.12/dist-packages/torch/distributed/utils.py",
line 159, in _p_assert
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) raise
AssertionError(s)
(WorkerDict pid=42417, ip=[fdbd:dccd:cdd2:2207::30f]) AssertionError:
Expects tensor to be on the compute device cuda:0, was on cpu
```
</details>
To fix this bug, this PR checks whether the model is located on the CPU
before saving the checkpoint and loads it onto the GPU if that is the
case. The same bug also exists in Megatron, which requires further
fixes.
---------
Co-authored-by: weidongliang.339 <weidongliang.339@bytedance.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* [perf] feat: Add MFU for Qwen3-VL dense (#4753)
### What does this PR do?
Add the _estimate_qwen3_vit_flop and _estimate_qwen3_vl_flops function
to calculate the FLOPs of Qwen3-VL dense models. Update the test cases
to verify the calculation accuracy of Qwen3-VL models.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
The following is the output result of running the test file.
<img width="1271" height="152" alt="image"
src="https://github.com/user-attachments/assets/2a3d426c-bd32-4369-9c07-c8a17c60e98b"
/>
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [tool] fix: avoid nested ToolResponse in SandboxFusionTool (#4833)
### What does this PR do?
Fix an incorrect double-wrapping of `ToolResponse` in
`SandboxFusionTool.execute()`.
- `execute_code()` already returns a `ToolResponse`, but `execute()`
previously wrapped it again as `ToolResponse(text=result)`.
- Since `ToolResponse.text` expects `str | None`, the old behavior could
produce an invalid/nested response (or confusing stringified output).
- This PR makes `execute()` return the `ToolResponse` directly when
appropriate, and only wraps once when the worker returns a
non-`ToolResponse` result.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here:
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
- pre-commit:
- `pre-commit install`
- `pre-commit run --all-files --show-diff-on-failure --color=always`
- Result: **Passed**
(ruff/format/mypy/autogen-trainer-cfg/docstring/license/compileall)
### API and Usage Example
No API changes. `SandboxFusionTool.execute()` still returns
`tuple[ToolResponse, float, dict]`.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
- `verl/tools/sandbox_fusion_tools.py`
- If the execution worker returns a `ToolResponse`, return it directly.
- Otherwise, convert the result to `str` (or `None`) and wrap once as
`ToolResponse(text=...)`.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the Contribute Guide:
https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md
- [x] Apply pre-commit checks: `pre-commit install && pre-commit run
--all-files --show-diff-on-failure --color=always`
- [ ] Add / Update the documentation:
https://github.com/volcengine/verl/tree/main/docs
- Not needed for this small bug fix.
- [ ] Add unit or end-to-end test(s) to the CI workflow:
https://github.com/volcengine/verl/tree/main/.github/workflows
- Not added. This change is a small correctness fix and is covered by
existing type/validation expectations; pre-commit checks passed.
- [ ] Once your PR is ready for CI, send a message in the `ci-request`
channel:
- https://verl-project.slack.com/archives/C091TCESWB1
- If not accessible:
https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a
- [ ] Recipe submodule update (if applicable).
- Not applicable for this PR.
Co-authored-by: winston <email@example.com>
* [vllm] fix: fix error in vllm patch for diff vllm version and add ci for moe with fp8 rollout (#4824)
### What does this PR do?
fix error in vllm patch for diff vllm version and add ci for moe with
fp8 rollout
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
---------
Co-authored-by: Xue Huang <xueh@nvidia.com>
* [algo] feat: add optimal token baseline and variance proxy (#4678)
# Optimal Token Baseline
## Main feature
- Register `AdvantageEstimator.OPTIMAL_TOKEN_BASELINE`.
- Extend the DP actor to emit `sum_pi_squared`, expose
`calculate_sum_pi_squared` and checkpointing toggles across configs, and
add a reusable `calculate_sum_pi_squared_from_logits` function.
- Introduce `compute_variance_proxy_metrics` to surface signal/total
power/noise diagnostics during training.
- Document the method in `docs/algo/otb.md` and ship an executable
example at `examples/otb_trainer/run_qwen2_5-7b.sh`.
## Usage
- Enable OTB by overriding config keys (OmegaConf overlay):
```yaml
algorithm.adv_estimator: optimal_token_baseline
actor_rollout_ref:
actor:
calculate_sum_pi_squared: true
sum_pi_squared_checkpointing: false # optional for long contexts
rollout:
n: 8
```
- Run the example script (adjust dataset paths & WandB project as
needed):
```bash
bash examples/otb_trainer/run_qwen2_5-7b.sh
```
- Monitor the new variance proxies in trainer logs:
`variance_proxy/proxy1_signal_strength`, `proxy2_total_power`,
`proxy3_pure_noise`.
## keyNote
- `actor.calculate_sum_pi_squared` requires
`actor_rollout_ref.model.use_fused_kernels=False`; fused kernels must
surface logits before OTB can run there.
- Group sampling is mandatory (`rollout.n > 1`); with single-rollout
batches OTB collapses to vanilla returns.
---
UPDATE(@tongyx361 ): `compute_sum_pi_squared` is changed to
`calculate_sum_pi_squared` for consistency with `calculate_entropy`.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
* [megatron] fix: Fix error in megatron workers (#4832)
### What does this PR do?
There is a bug in megatron_workers.py, 745 line is redundant and
introduces a bug. It overwrites the estimated_flops and promised_flops
calculated on lines 742-744.
Also, the condition "vl" in func.__name__ is brittle as it relies on a
naming convention. This could lead to silent miscalculations of MFU if a
new vision-language model's estimation function is named differently. A
more robust approach is to attempt calling the function with the extra
arguments and handle the TypeError if it doesn't support them.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
* [misc] feat: delete unnecessary base class in agent loop worker and vLLMHttpServer (#4838)
* [misc] feat: consolidate tensordict before dispatch (#4830)
* [training_utils] fix: json encode error in filelogger (#4811)
### What does this PR do?
- fix: json encode error in filelogger
error message: "TypeError: Object of type int32 is not JSON
serializable"
- ensure it's not Tensor object when logging to metrics
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [x] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
Signed-off-by: zhuangqh <zhuangqhc@gmail.com>
* [ckpt] chore: skip saving hf_checkpoint during megatron+lora training & add a separate lora merge script (#4839)
### What does this PR do?
When using LoRA, MegatronCheckpointManager.save_checkpoint not only
saves the adapter but also saves the huggingface checkpoint, which is
unnecessary. This PR skips saving the huggingface checkpoint, and
provides a separate script for merging the adapter.
Relating to #4063
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pulls?q=is%3Apr+megatron+lora+save
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```bash
python ./scripts/megatron_merge_lora.py \
--config-name='ppo_megatron_trainer' \
actor_rollout_ref.model.lora.adapter_path=$APAPTER_PATH \
... # same config as your training script
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [ ] If your PR is related to the `recipe` submodule, please also
update the reference to the submodule commit via `git submodule update
--remote` or `cd recipe && git pull origin main`.
* [rollout, vllm] fix: accuracy issue in verl serve mode + vllm-ascend + dp + ep + tp scenarios (#4783)
### What does this PR do?
Fix the accuracy issue in verl + vllm-ascend dp+ep+tp+server scenarios,
issue:https://github.com/vllm-project/vllm-ascend/issues/5544
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
Tested GRPO on local NPU host:
<img width="1047" height="117"
alt="58274edd-d0d3-454c-8e39-3188f6f19e71"
src="https://github.com/user-attachments/assets/dee7bf2f-6faf-4f44-a8b3-64670d5b1e10"
/>
### Design & Code Changes
Root cause analysis: currently, the version of Verl + Ascend NPU +
vllm-ascend is
[v0.11.0](https://verl.readthedocs.io/en/latest/ascend_tutorial/ascend_quick_start.html).
In the vllm-ascend v0.11.0 code, the all2all backend
(flashinfer_all2allv) is selected and updated to the vllm worker
environment. However, verl's ExternalZeroMQDistributedExecutor does not
pass this environment to the vllm worker processes like vllm's
[RayDistributedExecutor](https://github.com/vllm-project/vllm/blob/0d4044edd85de30d7d4558aeea4d1e95c7c556d6/vllm/v1/executor/ray_executor.py#L337)
backend does. Therefore, due to the all2all backend for vllm-ascend is
wrong, and then there is a precision issue on vllm-ascend.
Implementation:
1. In vLLMAsyncRollout, when initiating vllm work, if it's an NPU
scenario, add the environment variables required by vllm-ascend.
2. Add vllm engine environment variables setting in rollout.yaml,
supports setting by user.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Co-authored-by: FightingZhen
---------
Signed-off-by: leo-pony <nengjunma@outlook.com>
* [fsdp] feat: add validate process on trainer node when use_trainer_do_validate=True (#4683)
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
User Trainer node to do validate process when run mode on fully-async,
It can save time for validate computing and reduce perf/time_of_step
peak
- add new use_trainer_do_validate on fully_async async_training config
to decide whether using trainer node to do validate process
- use_trainer_do_validate: default is false
- It can improve performance of validate, such as in
`dapo_7b_math_fsdp2_8_8.sh`, it can improve about 1X speed
<img width="1440" height="608" alt="image"
src="https://github.com/user-attachments/assets/436e481e-4f51-4e8e-ad08-b038b3f0e89d"
/>
<img width="1030" height="762" alt="image"
src="https://github.com/user-attachments/assets/ed8e3237-d37d-4eff-b944-fb81ea63f87c"
/>
- optimized the `process_validation_metrics()` on `_validate()` process,
when input datasets len=1444, it latency reduce from 150+s to 40+s
<img width="2630" height="448" alt="image"
src="https://github.com/user-attachments/assets/b6fb50bc-5856-49c1-91dc-f845e9c410b4"
/>
<img width="2504" height="518" alt="image"
src="https://github.com/user-attachments/assets/b3b5f238-0c5e-4c63-9683-83f34d5a46fd"
/>
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
- on test scripts such as `dapo_7b_math_fsdp2_8_8.sh` add
`async_training.use_trainer_do_validate=True` command to do compute
- the result of this function on Qwen2.5-Math-7B model
- the baseline scripts is `dapo_7b_math_fsdp2_8_8.sh`
- the optimized scripts is `dapo_7b_math_fsdp2_8_8.sh`
+`async_training.use_trainer_do_validate=True`
- the acc and perfomance is below:
<img width="1650" height="702" alt="image"
src="https://github.com/user-attachments/assets/3419d7bb-a64c-4fe9-b776-3312925f51ab"
/>
<img width="1580" height="522" alt="image"
src="https://github.com/user-attachments/assets/2c3a7e24-7421-4f12-8527-7b997f9c3b89"
/>
- green: optimized case (`async_training.use_trainer_do_validate=True` )
- gray: baseline case (`async_training.use_trainer_do_validate=False` )
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
async_training.use_trainer_do_validate=True \
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: Shangwei-Li <lishangwei@mail.ustc.edu.cn>
* [misc] fix: recipe submodule accidentally been removed (#4843)
### What does this PR do?
As title.
* [worker, training_utils] fix: Engine Metric Aggregation (#4778)
### What does this PR do?
Because some metrics are scaled by global_bsz/global_tokens in
`workers.utils.losses.ppo_loss`, the mean in `reduce_metrics` adds an
extra scaling of the metric by the number of gradient accumulation steps
(see examples in Test sec):
https://github.com/volcengine/verl/blob/c191c5eb5c9499dca6666a52bc5f360bfd4bbf4f/verl/utils/metric/utils.py#L53
Aggregation of the `loss` metric handles this by taking sum:
https://github.com/volcengine/verl/blob/c191c5eb5c9499dca6666a52bc5f360bfd4bbf4f/verl/workers/engine_workers.py#L143-L144
Depending on how metrics are handled in `workers.utils.losses.ppo_loss`,
it may not be correct to aggregate all of them using sum (as in #4785).
For example, `actor/pg_loss` and `actor/kl_loss` are scaled by global
batch sizes/ token counts, and should be aggregated using sum, while the
`pg_metrics` `pg_clipfrac`, `ppo_kl`, and `pg_clipfrac_lower` are scaled
by local token counts and should be aggregated using mean.
This PR introduces a metric management class to allow flexibility in
deciding the aggregation type on a per-metric basis.
### Test
This test demonstrates the scaling of metrics with the number of
gradient accumulation steps, as well as how this is resolved on this
branch. The command for running is below.
<img width="980" height="638" alt="image"
src="https://github.com/user-attachments/assets/e65ab291-3125-4df4-a0e0-3473bf64cb2a"
/>
```bash
gsm8k_train_path=$DATA_DIR/gsm8k/train.parquet
gsm8k_test_path=$DATA_DIR/gsm8k/test.parquet
train_files="['$gsm8k_train_path']"
test_files="['$gsm8k_test_path']"
ppo_micro_batch_size_per_gpu=2
ppo_micro_batch_size_per_gpu=8
branch=main
branch=fixEngineMetrics
python3 -m verl.trainer.main_ppo \
algorithm.adv_estimator=grpo \
data.dataloader_num_workers=0 \
data.return_full_prompt=True \
data.train_files="$train_files" \
data.val_files="$test_files" \
data.train_batch_size=8 \
data.max_prompt_length=512 \
data.max_response_length=1024 \
data.filter_overlong_prompts=True \
data.truncation='error' \
actor_rollout_ref.model.path=Qwen/Qwen3-0.6B \
actor_rollout_ref.actor.optim.lr=1e-6 \
actor_rollout_ref.model.use_remove_padding=True \
actor_rollout_ref.actor.ppo_mini_batch_size=8 \
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=$ppo_micro_batch_size_per_gpu \
actor_rollout_ref.actor.use_kl_loss=True \
actor_rollout_ref.actor.fsdp_config.param_offload=True \
actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=8 \
actor_rollout_ref.rollout.name=vllm \
actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
actor_rollout_ref.rollout.n=5 \
trainer.logger='["console","wandb"]' \
trainer.project_name='fixEngineMetrics' \
trainer.experiment_name="$branch/ppo_micro_batch_size_per_gpu$ppo_micro_batch_size_per_gpu" \
trainer.n_gpus_per_node=2 \
trainer.nnodes=1 \
trainer.save_freq=400 \
trainer.test_freq=40 \
trainer.use_legacy_worker_impl=disable \
trainer.total_epochs=2 \
trainer.total_training_steps=10 \
trainer.resume_mode=disable \
actor_rollout_ref.actor.use_torch_compile=False \
actor_rollout_ref.actor.fsdp_config.use_torch_compile=False \
trainer.val_before_train=False \
actor_rollout_ref.rollout.enforce_eager=True \
actor_rollout_ref.ref.fsdp_config.use_torch_compile=False
```
### Design & Code Changes
Adds a `Metric` class which tracks metric values and aggregation type.
* [rollout] fix: configurable agent loop + multimodal data for fully-async (#4842)
## Description
* **`verl/experimental/fully_async_policy/agent_loop/agent_loop.py`**
* Use `config.agent.default_agent_loop` as the default `agent_name` when
`agent_name` is not present in `batch.non_tensor_batch`.
* Pass `dataset_cls=self.dataset_cls` and
`dataset_config=self.config.data` into `hydra.utils.instantiate(...)`
when creating an agent loop instance.
*
**`verl/experimental/fully_async_policy/agent_loop/partial_tool_agent_loop.py`**
* Extract `video_data` from `multi_modal_data` and include `video_data`
in the created `AgentData` instance (in addition to existing
`image_data`).
* **`verl/experimental/fully_async_policy/detach_utils.py`**
* Stop popping original batch fields in
`prepare_single_generation_data`.
* Set `agent_name` to `async_partial_tool_agent` or
`partial_single_turn_agent` depending on
`config.actor_rollout_ref.rollout.multi_turn.enable`.
## Testing
* Verified the fully async training entry can run successfully on 4 GPU
server setup (multi-turn enabled, partial rollout enabled, vLLM async
mode).
## Related
* Fixes and extends the scope of:
[4834](https://github.com/volcengine/verl/issues/4834)
* [ci] test: switch the vlm rl test case in the npu environment to use the model engine (#4844)
* [ckpt] fix: Megatron save ckpt after validation (#4841)
### What does this PR do?
This PR fixes a bug in the `save_checkpoint` function for
MegatronEngine. https://github.com/volcengine/verl/pull/4799 is a
similar PR, which modifies FSDPEngine.
In the original logic, if the model engine is used
(`use_legacy_worker_impl=disable`), the `wake_up` function in
`verl/workers/engine_workers.py` will be invoked during the rollout
phase of each step, which will offload the model to CPU.
Under normal circumstances, the `compute_log_prob` function called
during the training phase can load the model back to GPU. However, the
training process is not executed during the validation phase, leaving
the model on the CPU. If a checkpoint is saved immediately after
validation, it will trigger the following error: `AssertionError:
Expects tensor to be on the compute device cuda:0, was on cpu.`
To fix this bug, this PR checks whether the model is located on the CPU
before saving the checkpoint and loads it onto the GPU if that is the
case.
---------
Co-authored-by: weidongliang.339 <weidongliang.339@bytedance.com>
* [megatron] feat: Share actor and ref in LoRA (#4673)
For `compute_ref_log_prob`, we can do that by disabling lora layers
temporarily for the forward pass, as base weight are frozen and only
lora layers are trained.
This has already been supported in FSDP LoRA.
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference …
### What does this PR do? So that by default, it will follow the TP size. Related to verl-project#4063 (comment) ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
…l-project#4063) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. This PR aims to add LoRA/PEFT support by integrating Megatron-Bridge into Verl while maintaining compatibility with mbridge. As a result. LoRA/PEFT support can be added to Verl with megatron backend. Resolves verl-project#3857 Resolves verl-project#3402 Resolves verl-project#3279 ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. To be added later ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. To be added later ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. To be added later ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) <sub>✨ Presented to you with <a href="https://macaron.im">Mind Lab</a> - A Lab for Experiential Intelligence.</sub> --------- Signed-off-by: Hollow Man <hollowman@opensuse.org> Co-authored-by: Yan Bai <bayan@nvidia.com>
…project#4702) ### What does this PR do? Resolves: - verl-project#4063 (comment) - verl-project#4514 Related to: verl-project#4566 NVIDIA-NeMo/Megatron-Bridge@953aabf ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
… & add a separate lora merge script (verl-project#4839) ### What does this PR do? When using LoRA, MegatronCheckpointManager.save_checkpoint not only saves the adapter but also saves the huggingface checkpoint, which is unnecessary. This PR skips saving the huggingface checkpoint, and provides a separate script for merging the adapter. Relating to verl-project#4063 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+megatron+lora+save - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```bash python ./scripts/megatron_merge_lora.py \ --config-name='ppo_megatron_trainer' \ actor_rollout_ref.model.lora.adapter_path=$APAPTER_PATH \ ... # same config as your training script ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) - [ ] If your PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.
…project#4702) ### What does this PR do? Resolves: - verl-project#4063 (comment) - verl-project#4514 Related to: verl-project#4566 NVIDIA-NeMo/Megatron-Bridge@953aabf ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
… & add a separate lora merge script (verl-project#4839) ### What does this PR do? When using LoRA, MegatronCheckpointManager.save_checkpoint not only saves the adapter but also saves the huggingface checkpoint, which is unnecessary. This PR skips saving the huggingface checkpoint, and provides a separate script for merging the adapter. Relating to verl-project#4063 ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: https://github.com/volcengine/verl/pulls?q=is%3Apr+megatron+lora+save - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```bash python ./scripts/megatron_merge_lora.py \ --config-name='ppo_megatron_trainer' \ actor_rollout_ref.model.lora.adapter_path=$APAPTER_PATH \ ... # same config as your training script ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) - [ ] If your PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.
### What does this PR do? Resolves: - verl-project/verl#4063 (comment) - verl-project/verl#4514 Related to: verl-project/verl#4566 NVIDIA-NeMo/Megatron-Bridge@953aabf ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
### What does this PR do? Resolves: - verl-project/verl#4063 (comment) - verl-project/verl#4514 Related to: verl-project/verl#4566 NVIDIA-NeMo/Megatron-Bridge@953aabf ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org>
What does this PR do?
This PR aims to add LoRA/PEFT support by integrating Megatron-Bridge into Verl while maintaining compatibility with mbridge. As a result. LoRA/PEFT support can be added to Verl with megatron backend.
Resolves #3857
Resolves #3402
Resolves #3279
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,megatron,sglang,vllm,rollout,trainer,ci,training_utils,recipe,hardware,deployment,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,like[megatron, fsdp, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batchingTest
To be added later
API and Usage Example
To be added later
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
To be added later
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=alwaysci-requestchannel in theverlSlack workspace. (If not accessible, please try the Feishu group (飞书群).)✨ Presented to you with Mind Lab - A Lab for Experiential Intelligence.