…ct#1637) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Added the position_ids parameter to the model.generate method call to provide explicit control over token positions during text generation. I don't quite understand why the position ids were computed above but never passed to generate, so I modified this. 😂 ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.
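For illustration only (plain Python, not verl's code): left-padding-aware position ids are conventionally derived from the attention mask as `cumsum(mask) - 1` clamped at zero, and the fix amounts to forwarding them explicitly instead of letting `generate` re-derive them.

```python
def compute_position_ids(attention_mask):
    """Left-padding-aware position ids: cumsum(mask) - 1, clamped at 0."""
    out = []
    for row in attention_mask:
        running, row_pos = 0, []
        for m in row:
            running += m
            row_pos.append(max(running - 1, 0))
        out.append(row_pos)
    return out

# With an HF-style model the ids would then be forwarded explicitly, e.g.:
# model.generate(input_ids, attention_mask=mask,
#                position_ids=position_ids, ...)

print(compute_position_ids([[0, 0, 1, 1, 1]]))  # [[0, 0, 0, 1, 2]]
```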
…erl-project#1562) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR enables the Megatron backend checkpoint manager to save the hf model config into verl checkpoints, and simplifies our CI since the `--hf_model_path` has been deprecated in verl-project#1468, fixes the comment verl-project#1468 (comment). Note: several changed lines in `verl/utils/megatron_utils.py` are unrelated to this PR; they were automatically reformatted by pre-commit hooks. ### Test The current CI e2e tests should sufficiently cover this PR. ### Additional Info. - **Training**: Megatron - **Inference**: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR supports activation offloading, and currently it's only for FSDP backend. ### High-Level Design Our implementation is based on the [one](https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/cpu_offload.py) in TransformerEngine. For efficiency, it groups activations by TransformerLayer and offloads activation groups asynchronously. This means that the offloading of the i-th activation group and the computation of the i+1-th activation group happen at the same time, and there are at most two activation groups in GPU memory. ### Specific Changes 1. Add activation offloading support. ### API ### Usage Example ``` export VLLM_ATTENTION_BACKEND=XFORMERS python3 -m verl.trainer.main_ppo \ algorithm.adv_estimator=grpo \ data.train_files=./data/gsm8k/train.parquet \ data.val_files=./data/gsm8k/test.parquet \ data.train_batch_size=512 \ data.max_prompt_length=512 \ data.max_response_length=1024 \ data.filter_overlong_prompts=True \ data.truncation='error' \ actor_rollout_ref.model.path=./huggingface.co/Qwen/Qwen2-7B-Instruct \ actor_rollout_ref.actor.optim.lr=1e-6 \ actor_rollout_ref.model.use_remove_padding=True \ actor_rollout_ref.actor.ppo_mini_batch_size=256 \ actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \ actor_rollout_ref.actor.use_kl_loss=True \ actor_rollout_ref.actor.kl_loss_coef=0.001 \ actor_rollout_ref.actor.kl_loss_type=low_var_kl \ actor_rollout_ref.actor.entropy_coeff=0 \ actor_rollout_ref.model.enable_gradient_checkpointing=True \ actor_rollout_ref.model.enable_activation_offload=True \ actor_rollout_ref.actor.fsdp_config.param_offload=False \ actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \ actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ actor_rollout_ref.rollout.name=vllm \ 
actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \ actor_rollout_ref.rollout.n=5 \ actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \ actor_rollout_ref.ref.fsdp_config.param_offload=True \ algorithm.use_kl_in_reward=False \ trainer.critic_warmup=0 \ trainer.logger=['console','tensorboard'] \ trainer.project_name='verl_grpo_example_gsm8k' \ trainer.experiment_name='qwen2_7b_function_rm' \ trainer.n_gpus_per_node=8 \ trainer.val_before_train=False \ trainer.nnodes=1 \ trainer.save_freq=-1 \ trainer.test_freq=5 \ trainer.total_epochs=15 ``` ### Test We conducted experiments on the Qwen2 7B model based on the above script. The memory and throughput data are shown in the figures below, where the blue line represents activation offloading. <img width="351" alt="image" src="https://github.com/user-attachments/assets/207576a1-3f47-4b40-bf19-60cf8105d609" /> <img width="361" alt="image" src="https://github.com/user-attachments/assets/d58f0f8b-eb5f-4e19-a892-4d778ff26135" /> ### Additional Info. - **Issue Number**: none - **Training**: This PR will affect FSDP backend - **Inference**: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
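The double-buffered schedule described above (offload of group i overlapping compute of group i+1, at most two groups resident) can be illustrated with a toy event trace. This is a plain-Python simulation of the scheduling idea, not the TransformerEngine-based implementation:

```python
def simulate_offload_schedule(num_groups):
    """Toy double-buffered schedule: group i's CPU offload overlaps
    group i+1's compute, so at most two groups sit in GPU memory."""
    events, on_gpu = [], []
    for i in range(num_groups):
        events.append(("compute", i))
        on_gpu.append(i)
        assert len(on_gpu) <= 2, "never more than two groups resident"
        if len(on_gpu) == 2:                 # older group's copy completes
            events.append(("offload", on_gpu.pop(0)))
    if on_gpu:
        events.append(("offload", on_gpu.pop(0)))  # drain the final group
    return events

print(simulate_offload_schedule(3))
```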
# What does this PR do? This PR adds `loss_agg_mode` to critics. # Before submitting - [x] Did you read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide) and finish the [code format check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)? - [x] Did you make sure to update the documentation with your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs), especially for breaking config etc.? - [x] Did you write any test cases if necessary? Please add CI tests for your new feature. # Additional Info - **Issue Number**: none - **Training**: both - **Inference**: none
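As a rough sketch of what the aggregation modes mean (plain Python, simplified; mode names are assumed to mirror the actor's existing `loss_agg_mode` options rather than quoted from this PR):

```python
def agg_loss(loss_mat, mask, mode):
    """Aggregate a per-token loss matrix under a validity mask.

    token-mean:          average over all valid tokens in the batch.
    seq-mean-token-mean: average per sequence first, then over sequences.
    """
    if mode == "token-mean":
        tot = sum(l * m for row, mrow in zip(loss_mat, mask)
                  for l, m in zip(row, mrow))
        cnt = sum(m for mrow in mask for m in mrow)
        return tot / cnt
    if mode == "seq-mean-token-mean":
        seq_means = [
            sum(l * m for l, m in zip(row, mrow)) / max(sum(mrow), 1)
            for row, mrow in zip(loss_mat, mask)
        ]
        return sum(seq_means) / len(seq_means)
    raise ValueError(f"unknown loss_agg_mode: {mode}")
```

The two modes differ whenever sequence lengths differ: token-mean weights long sequences more heavily, while seq-mean weights every sequence equally.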
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes wrong initialization so that verl only loads reference policy when needed. ### Additional Info. - **Issue Number**: none - **Training**: none - **Inference**: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR adds ignore patterns to CI for SPIN & SPPO. ### Additional Info. - **Issue Number**: none - **Training**: none - **Inference**: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR adds entropy computation and logging to DAPO trainer, aligning with other trainers. ### Additional Info. - **Issue Number**: verl-project#1455 - **Training**: none - **Inference**: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
For developers, you can follow the docs: docs/ascend/ascend.rst This PR adds support for the Ascend NPU backend. Co-authored-by: Chendong98 [chendong136@huawei.com](mailto:chendong136@huawei.com) Co-authored-by: zheliuyu <15750543867@163.com> Co-authored-by: celestialli [celestialli@outlook.com](mailto:celestialli@outlook.com) In this PR, we add the capability to determine the type of NPU device, and we also add a new script for training on NPU. This is the change list: 1. pyproject.toml change the version of vllm 2. requirements-npu.txt requirements for NPU 3. verl/bert_padding.py Adapted from https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py 4. verl/single_controller/ray/base.py 5. verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py 6. verl/trainer/fsdp_sft_trainer.py 7. verl/utils/flops_counter.py 8. verl/utils/fsdp_utils.py 9. verl/workers/actor/dp_actor.py 10. verl/workers/critic/dp_critic.py 11. verl/workers/fsdp_workers.py 12. verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py 13. verl/workers/sharding_manager/fsdp_vllm.py 14. verl/utils/device.py get device type for different devices 15. docs/ascend/ascend.md Here is our roadmap: **RoadMap** - [x] sft - [x] ppo - [x] grpo News [2025.03.31] Add results of SFT and GRPO. Qwen2-7B-Instruct was tested on 2*8 devices, and many params related to batch_size needed to be reduced, so this result is only for reference. We will announce the reward results of the default params as soon as sleep mode is supported. [2025.03.03] Modify the adaptation method of Ray [2025.02.25] The PPO algorithm is supported for training on NPU with the FSDP backend. [2025.02.23] The SFT algorithm is supported for training on NPU with the FSDP backend. [2025.02.21] The GRPO algorithm is supported for training on NPU with the FSDP backend. 
Requirements We tested this PR on Ascend NPU and GPU to ensure the same code can run on different devices. The devices used were 8 Atlas 800T A2 and 8 A100. Other software information is shown in the following table. | Software | Version | |:-------|-------:| | transformers | 4.47.1 | | accelerate | 1.3.0 | | torch_npu | 2.5.1.rc1| |CANN | 8.1.RC1 (Not Released)| About mean error: Due to differences in hardware structure, we cannot guarantee that the loss on Ascend NPU is exactly the same as that on GPU. According to our experience, a loss difference of less than 2% is acceptable. If the loss difference is greater than 2%, we will try to fix it. The calculation formula is as follows (formula image omitted). N represents the number of training steps. For more information, please refer to [Calculation accuracy description](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html) --------- Co-authored-by: Chendong98 <chendong136@huawei.com> Co-authored-by: zheliuyu <15750543867@163.com>
verl-project#1627) …l_len before generation ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR adds a validation step to prevent generation requests that exceed the model’s maximum context length in SGLang. Without this check, multi-turn RL training can fail when the combined length of the prompt and the maximum response exceeds the model limit. The new validation ensures `prompt_len + max_resp_len <= max_model_len` before sending requests to the SGLang engine. ### Test Successfully tested with my multiturn RL dataset with `max_turns==30` which keeps failing with the following error before this change(Qwen2.5-32B-instruct + GRPO): ``` Traceback (most recent call last): File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 64, in main run_ppo(config) File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 76, in run_ppo ray.get(runner.run.remote(config)) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper return fn(*args, **kwargs) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper return func(*args, **kwargs) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout) File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects raise value.as_instanceof_cause() ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=1150536, ip=100.96.248.206, actor_id=85b22be1ed8ef671c739638a01000000, repr=<main_ppo.TaskRunner object at 0x796b0bba7010>) File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 183, in run trainer.fit() File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 872, in fit val_metrics = self._validate() File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 607, in 
_validate test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded) File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 49, in func output = ray.get(output) ray.exceptions.RayTaskError(ValueError): ray::WorkerDict.actor_rollout_generate_sequences() (pid=1169888, ip=100.96.248.206, actor_id=6deb9fd4b4ff01530920ada301000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7e41e90afa90>) File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 625, in func return getattr(self.worker_dict[key], name)(*args, **kwargs) File "/home/jobuser/resources/verl/single_controller/base/decorator.py", line 534, in inner return func(*args, **kwargs) File "/home/jobuser/resources/verl/workers/fsdp_workers.py", line 630, in generate_sequences output = self.rollout.generate_sequences_with_tools(prompts=prompts) File "/home/jobuser/resources/verl/utils/debug/performance.py", line 78, in f return self.log(decorated_function, *args, **kwargs) File "/home/jobuser/resources/verl/utils/debug/performance.py", line 88, in log output = func(*args, **kwargs) File "/home/jobuser/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/async_sglang_rollout.py", line 613, in generate_sequences_with_tools output_req_list = loop.run_until_complete( File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/async_sglang_rollout.py", line 529, in _async_rollout_a_request output = await self._engine.async_generate( File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/entrypoints/engine.py", line 265, in async_generate return await generator.__anext__() File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 403, in generate_request tokenized_obj = 
await self._tokenize_one_request(obj) File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 450, in _tokenize_one_request self._validate_token_len(obj, input_ids) File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 482, in _validate_token_len raise ValueError(error_msg) ValueError: Requested token count exceeds the model's maximum context length of 32768 tokens. You requested a total of 34009 tokens: 23769 tokens from the input messages and 10240 tokens for the completion. Please reduce the number of tokens in the input messages or the completion to fit within the limit. ``` ### Additional Info. - **Inference**: SGLang, ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
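The guard itself is simple; a minimal sketch of `prompt_len + max_resp_len <= max_model_len` (function and argument names are hypothetical, not the PR's exact code):

```python
def validate_request_length(prompt_len, max_resp_len, max_model_len):
    """Reject a request whose prompt plus maximum completion cannot fit
    within the model's context window, before it reaches the engine."""
    total = prompt_len + max_resp_len
    if total > max_model_len:
        raise ValueError(
            f"Requested {total} tokens ({prompt_len} prompt + "
            f"{max_resp_len} completion) exceeds max_model_len "
            f"{max_model_len}"
        )
```

Raising this error on the trainer side, before the request is sent, avoids the mid-rollout SGLang failure shown in the traceback above.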
…t#1638) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This simple PR adds support for [ChainedOptimizer](https://github.com/NVIDIA/Megatron-LM/blob/75b1ca13618bded85c81fb572f58df83ba095dc9/megatron/core/optimizer/optimizer.py#L938) offloading in the Megatron-LM training environment. In Megatron-LM, ChainedOptimizer is used when expert parallelism (expert_parallel > 1, related to verl-project#1467) is enabled, commonly in Mixture-of-Experts (MoE) models. This has been tested and validated with the Qwen3-235B-22A model configuration. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python ... actor_rollout_ref.actor.megatron.optimizer_offload=True \ actor_rollout_ref.actor.megatron.expert_model_parallel_size=16 \ ... ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Megatron] - **Inference**: [none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: charlie.cs <charlie.cs@kakaocorp.com> Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
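The shape of such a change can be sketched as follows: a ChainedOptimizer wraps a list of sub-optimizers (exposed as `chained_optimizers` in Megatron-LM), so offloading must recurse into each one instead of treating the wrapper as a single optimizer. This is an illustrative sketch, not the PR's code; `offload_state` is a hypothetical helper that moves one optimizer's state to CPU:

```python
def offload_megatron_optimizer(optimizer, offload_state):
    """Offload optimizer state, handling the ChainedOptimizer case.

    If the optimizer carries a `chained_optimizers` list (the MoE /
    expert-parallel path), offload each sub-optimizer; otherwise
    offload the optimizer itself.
    """
    subs = getattr(optimizer, "chained_optimizers", None)
    if subs is not None:
        for opt in subs:
            offload_state(opt)
    else:
        offload_state(optimizer)
```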
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Shifts fused_linear_for_ppo into model.forward for FSDP. ### High-Level Design Self-explanatory. ### Specific Changes - Update monkey patch to return log_probs and entropy instead of last_hidden_state. ### API No changes ### Usage Example ```sh actor_rollout_ref.model.use_fused_kernels=True ``` ### Test (results screenshot omitted) ### Additional Info. - This is to fix verl-project#1565 - The original bug arises because we tried to access model.lm_head.weight from outside the FSDP-wrapped context. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
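The gist of the fix: compute log-probs and entropy from the logits inside forward(), where FSDP has gathered lm_head's full weights, instead of returning hidden states and touching `lm_head.weight` outside the wrapped module. A toy, framework-free sketch of what the patched forward now returns (names hypothetical, math only):

```python
import math

def _softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]

def fused_forward(logits_rows, labels):
    """Return per-token (log_prob of label, entropy) from logits,
    the two quantities the patched forward() now emits directly."""
    log_probs, entropy = [], []
    for row, y in zip(logits_rows, labels):
        p = _softmax(row)
        log_probs.append(math.log(p[y]))
        entropy.append(-sum(q * math.log(q) for q in p if q > 0))
    return log_probs, entropy
```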
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR fixes: - DAPO CI triggering path patterns outdated since verl-project#1392 - `response_mask` computation missing but skipping the CI test in verl-project#1652 ### Tests - [x] DAPO CI is correctly triggered and passed, e.g., https://github.com/volcengine/verl/actions/runs/15223958183/job/42823610223?pr=1666 ### Additional Info. - **Issue Number**: verl-project#1392 , verl-project#1652 - **Training**: none - **Inference**: none ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Modify the instructions for using verl on Ascend NPU. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1. Modify table format 2. Modify the installation method of vllm and vllm-ascend ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
…& CI tasks (verl-project#1602) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? > Add one-line overview of what this PR aims to achieve or accomplish. - Fix sglang megatron support - Add sglang_async megatron support - Add CI task to protect the megatron-sglang implementation ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. https://wandb.ai/swordfaith/gsm8k_async_rl/runs/6h7apmbn?nw=nwuserswordfaith ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: SGLang ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: BlueSpace <gaoziyuan19@mails.ucas.ac.cn>
… hyperlink (verl-project#1673) …res and hyperlink ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Modify the installation method of vllm on different architectures and the hyperlink syntax. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1. Modify the installation method of vllm on different architectures 2. Modify the hyperlink syntax ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? In megatron-core, `vocab_parallel_log_probs_from_logits` is an in-place operator that modifies the logits in place to save memory. This makes `vocab_parallel_entropy` produce incorrect results if it is computed after `vocab_parallel_log_probs_from_logits`. We swap the order to make sure the result is correct. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.
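Why the order matters: once an in-place operator has consumed its input, any later read of that input is garbage. A toy, framework-free model of the bug (function names and the zeroing behavior are illustrative stand-ins, not megatron-core's code):

```python
def log_probs_inplace(logits, labels):
    """Stand-in for an in-place log-probs op: returns the selected
    entries but clobbers the logits buffer to save memory."""
    out = [row[y] for row, y in zip(logits, labels)]
    for row in logits:
        row[:] = [0.0] * len(row)   # input buffer is destroyed
    return out

def entropy_like(logits):
    """Stand-in for an entropy computation that reads the logits."""
    return [sum(abs(x) for x in row) for row in logits]

# Correct order: read the entropy BEFORE the in-place log-probs call.
logits = [[1.0, 2.0], [3.0, 4.0]]
ent = entropy_like(logits)              # valid: logits still intact
lp = log_probs_inplace(logits, [0, 1])
# Computing entropy_like(logits) here would silently return zeros.
```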
### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? Currently, the device to run on depends on whether `is_cuda_available` is True on the driver process. However, the driver process may be a CPU process that can't see cuda devices even when cuda devices are available. Thus, it's not appropriate to use `is_cuda_available` to set the device. Instead, we should set the device explicitly. In the future, we may have a ray cluster with both NPU and GPU, and we can use different devices for different workloads. Thus, setting the device explicitly would be the better choice in the long run. Why CI can't trigger this problem: because we directly run `python3 xxx` on the CI machine instead of using a standard ray cluster that has dedicated CPUs for the head node, and CI machines all have GPUs. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. 
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.
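The principle above can be sketched as: prefer an explicit configured device, and otherwise fall back to what the current process (not the driver) can see. This is an illustrative helper, not verl's device utility; the function name and environment-variable fallbacks are assumptions:

```python
import os

def resolve_device(configured=None):
    """Resolve the device for THIS process. An explicit config value
    wins; otherwise inspect the process's own visible hardware rather
    than inheriting the driver's view."""
    if configured:                                   # e.g. "cuda", "npu"
        return configured
    if os.environ.get("CUDA_VISIBLE_DEVICES"):
        return "cuda"
    if os.environ.get("ASCEND_RT_VISIBLE_DEVICES"):  # Ascend NPU path
        return "npu"
    return "cpu"
```

Because the check runs inside each worker, a CPU-only driver no longer forces `cpu` onto GPU workers, and mixed NPU/GPU clusters can assign different devices per workload.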
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Update ascend_quick_start.rst. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes 1. Rename ascend_quick_start.rst 2. Add the accuracy and throughput data of GRPO. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? An argument-passing error in the non-fused-kernels path caused Qwen2_5_VL to fail; this PR fixes it. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
### Checklist Before Starting - [ ] Search for similar PR(s). ### What does this PR do? Refactor some tests and narrow their scope so that unrelated tests are not triggered. ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
…ion (verl-project#1709) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Add a visual explanation of the configuration to the documentation ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
Co-authored-by: Bihan Rana <bihan@Bihans-MacBook-Pro.local> Co-authored-by: peterschmidt85 <andrey.cheptsov@gmail.com>
…in `trainer` and `utils` (verl-project#1397) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? * This PR adds doc strings for the public methods inside the `trainer` and `utils` modules, so that these methods can be reused and referenced better. * Two new doc pages, `PPO Trainer Interface` and `Utilities`, were also added under the API Reference section. * Renamed the function `verl.utils._default_compute_score` to `verl.utils.default_compute_score`, as it was an external function used by other modules, i.e., trainer and recipe. <img width="1093" alt="Screenshot 2025-05-26 at 9 20 31 PM" src="https://github.com/user-attachments/assets/e361e6bd-a33b-426b-85b4-9fe93ab1e398" /> ### TODO This is the second of a series of PRs to improve and stabilize the docs and API. Stacked on top of verl-project#1396. TODO includes adding more useful utility functions to the doc with improved doc strings. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu> Co-authored-by: H <linhaibin.eric@gmail.com>
…ng purpose (verl-project#1712) ### Checklist Before Starting - [X] Search for similar PR(s). ### What does this PR do? - Support logging rollout probs vs. actor probs for debugging purposes - Support both the vLLM and SGLang async backends ### High-Level Design > Demonstrate the high-level design if this PR is complex. ### Specific Changes > List the specific changes. ### API > Demonstrate how the API changes if any. ### Usage Example > Provide usage example(s) for easier usage. ```python # Add code snippet or script demonstrating how to use this ``` ### Test > For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary.
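To make the debugging signal above concrete, here is a minimal, self-contained sketch (not verl's actual implementation; the helper name, inputs, and metric choices are illustrative) of comparing the rollout engine's log-probs against the actor's recomputed log-probs over non-padded response tokens:

```python
import math

def compare_probs(rollout_logprobs, actor_logprobs, response_mask):
    """Hypothetical helper: summarize the gap between rollout-engine and
    actor log-probs over valid (non-padded) response tokens."""
    masked = [
        (r, a)
        for r, a, m in zip(rollout_logprobs, actor_logprobs, response_mask)
        if m
    ]
    diffs = [abs(r - a) for r, a in masked]
    return {
        # Average and worst-case absolute disagreement per token.
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
        # exp of the mean signed gap: <1 means the actor assigns lower prob.
        "mean_ratio": math.exp(sum(a - r for r, a in masked) / len(masked)),
    }
```

A large `max_abs_diff` typically points at numerical divergence between the inference engine and the training forward pass (e.g., different kernels or precisions), which is exactly what this logging is meant to surface.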
… utils test (verl-project#1729) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? Handle comments after verl-project#1397 was merged: 1. Add back the `_default_compute_score` API and mark it as deprecated; 2. Fix a broken CI test, `ray_utils_test`, on `parallel_put`. ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [ ] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add CI test(s) if necessary. --------- Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? This PR updates the README.md for the SPIN recipe to improve accuracy and completeness. Key changes include corrections and additions to the method description, the inclusion of related works, and a more concise introduction. ### High-Level Design N/A - Focuses on documentation improvements for clarity and accuracy. ### Specific Changes - Corrected and supplemented the description of the SPIN methodology. - Included related works along with concise introductions to relevant papers/concepts. - Refined and clarified the introductory sections of the README. ### API N/A - Changes are limited to README.md documentation. ### Usage Example N/A - This PR does not primarily focus on usage examples, but rather on descriptive content. ```python # No new standalone code snippets are part of this PR itself. ```
…l-project#1700) ### What does this PR do? Fix Configuration for Micro Batch Size in Megatron's Ref Policy ### High-Level Design This pull request addresses an issue with the micro batch size configuration in the ref policy of Megatron. The default ppo_megatron_trainer.yaml only includes two configurations: log_prob_micro_batch_size and log_prob_micro_batch_size_per_gpu. https://github.com/volcengine/verl/blob/54c9b7364c2d188b2ba4107404cfa3c2b446df19/verl/trainer/config/ppo_megatron_trainer.yaml#L119-L120 However, in `megatron_workers.py`, the required configuration is ref.log_prob_micro_batch_size_per_gpu https://github.com/volcengine/verl/blob/54c9b7364c2d188b2ba4107404cfa3c2b446df19/verl/workers/megatron_workers.py#L517-L518 or in `megatron_actor.py ` the required configuration is ref.ppo_micro_batch_size_per_gpu, https://github.com/volcengine/verl/blob/54c9b7364c2d188b2ba4107404cfa3c2b446df19/verl/workers/actor/megatron_actor.py#L271-L274 which are not directly related to ppo_micro_batch_size. To resolve this, I have made modifications to the configuration calculations and added raise ValueError statements to ensure that the necessary parameters are correctly defined. This update ensures that the required parameters are properly handled, preventing runtime errors and improving the overall robustness of the training process. ### Changes Made: - Modified the configuration calculations in megatron_workers.py. - Added raise ValueError statements to check for the presence of log_prob_micro_batch_size_per_gpu and ppo_micro_batch_size_per_gpu.
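The validation logic described above can be sketched as a small resolver. This is an illustrative example under assumed semantics (the function name and dict-based config are hypothetical; verl's actual code works on OmegaConf configs), showing the fallback from the global setting to the per-GPU one and the explicit `ValueError` when neither is set:

```python
def resolve_micro_batch_size(config: dict, n_gpus: int) -> int:
    """Hypothetical sketch: derive the per-GPU log-prob micro batch size
    for the ref policy, raising a clear error when neither the per-GPU
    nor the global setting is provided."""
    per_gpu = config.get("log_prob_micro_batch_size_per_gpu")
    global_size = config.get("log_prob_micro_batch_size")
    if per_gpu is not None:
        return per_gpu
    if global_size is not None:
        # Derive the per-GPU value from the global setting.
        if global_size % n_gpus != 0:
            raise ValueError(
                f"log_prob_micro_batch_size={global_size} must be divisible "
                f"by n_gpus={n_gpus}"
            )
        return global_size // n_gpus
    raise ValueError(
        "ref policy requires either log_prob_micro_batch_size_per_gpu "
        "or log_prob_micro_batch_size to be set"
    )
```

Failing fast with a named parameter in the error message is what turns an opaque downstream `KeyError` at training time into an actionable config fix.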
…e workloads (verl-project#1617) ### Checklist Before Starting - [x] Search for similar PR(s). ### What does this PR do? 1. Support dynamic batch size on the Megatron backend to rebalance workloads. 2. Fix missing critic metrics. ### High-Level Design Follows FSDP's dynamic batch size approach. ### Specific Changes Use the `rearrange_micro_batches` API, kept compatible with Megatron VPP constraints. ```py vpp_size = mpu.get_virtual_pipeline_model_parallel_world_size() if vpp_size is not None and vpp_size > 1: microbatch_group_size_per_vp_stage = self.tf_config.microbatch_group_size_per_vp_stage micro_batches, indices = rearrange_micro_batches(batch=mini_batch.batch, num_batches_devided_by=microbatch_group_size_per_vp_stage, max_token_len=max_token_len) assert len(micro_batches) % self.tf_config.microbatch_group_size_per_vp_stage == 0, f"micro_batches {micro_batches} must be divisible by microbatch_group_size_per_vp_stage {microbatch_group_size_per_vp_stage} for megatron backend" else: micro_batches, indices = rearrange_micro_batches(batch=mini_batch.batch, max_token_len=max_token_len) ``` @vermouth1992 please check whether it makes sense. Megatron's constraint when using the interleaving pipeline: ```py # If the final micro-batch group has fewer micro-batches than pipeline-parallel size, # the pipeline will have dependency bubbles. final_microbatch_group_size = num_microbatches % config.microbatch_group_size_per_vp_stage if 0 < final_microbatch_group_size < pipeline_parallel_size: msg = 'The remainder of M (the total micro-batches) divided by N (number of ' msg += 'contiguous micro-batches in a virtual pipeline stage) should be 0, ' msg += 'or larger than or equal to the pipeline-parallel size, but it is ' msg += f'{final_microbatch_group_size}. ' msg += 'Otherwise, it introduces dependency bubbles in the pipeline ' msg += 'and reduces throughput.' 
raise RuntimeError(msg) ``` ### API Megatron's `forward_backward_batch` has a changed input, and its output is now a dict containing the original `output` and the `indices` needed for `compute_old_log_probs`. ### Usage Example ```bash actor_rollout_ref.actor.use_dynamic_bsz=${USE_DYNAMIC_BSZ} \ actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${ppo_max_token_len_per_gpu} \ critic.ppo_max_token_len_per_gpu=${forward_max_token_len_per_gpu} \ ``` Other models will directly copy the config. ### Test > For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### Additional Info. - **Issue Number**: Fixes issue # or discussion # if any. - **Training**: [Note which backend this PR will affect: FSDP, Megatron, both, or none] - **Inference**: [Note which backend this PR will affect: vLLM, SGLang, both, or none] ### Checklist Before Submitting - [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [x] Add `[BREAKING]` to the PR title if it breaks any API. - [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [x] Add CI test(s) if necessary.
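The interaction between token-budget packing and the VPP divisibility constraint can be sketched in miniature. This is an illustrative toy (the function name and splitting strategy are hypothetical, not verl's `rearrange_micro_batches`): it greedily packs sequences into micro-batches under a token budget, then splits batches until the count is divisible by the per-virtual-stage group size, mirroring the Megatron constraint quoted above:

```python
def rearrange_micro_batches_sketch(seq_lens, max_token_len, group_size=None):
    """Toy sketch: pack sequence indices into micro-batches whose total
    token count stays under max_token_len, then enforce that the number
    of micro-batches is divisible by group_size (the interleaved-pipeline
    constraint) by splitting the largest batches."""
    batches, current, current_tokens = [], [], 0
    for idx, n in enumerate(seq_lens):
        # Close the current batch when adding this sequence would overflow
        # the token budget (an over-long single sequence still gets its own batch).
        if current and current_tokens + n > max_token_len:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n
    if current:
        batches.append(current)
    if group_size is not None:
        # Split batches until the count is a multiple of group_size.
        while len(batches) % group_size != 0:
            largest = max(range(len(batches)), key=lambda i: len(batches[i]))
            b = batches.pop(largest)
            if len(b) < 2:
                raise RuntimeError("cannot satisfy micro-batch divisibility constraint")
            mid = len(b) // 2
            batches[largest:largest] = [b[:mid], b[mid:]]
    return batches
```

The real implementation must additionally keep the permutation `indices` so that per-token outputs (e.g., old log-probs) can be scattered back into the original batch order.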
Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>