
Integrated kernel #6

Merged
ETOgaosion merged 42 commits into main from
integrated-kernel
Jun 8, 2025
Conversation

@ETOgaosion
Collaborator

Checklist Before Starting

  • Search for similar PR(s).

What does this PR do?

Add one-line overview of what this PR aims to achieve or accomplish.

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

List the specific changes.

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this 

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

Additional Info.

  • Issue Number: Fixes issue # or discussion # if any.
  • Training: [Note which backend this PR will affect: FSDP, Megatron, both, or none]
  • Inference: [Note which backend this PR will affect: vLLM, SGLang, both, or none]

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title if it breaks any API.
  • Update the documentation about your changes in the docs.
  • Add CI test(s) if necessary.

Geaming2002 and others added 30 commits May 23, 2025 09:43
…ct#1637)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Added the position_ids parameter to the model.generate method call to
provide explicit control over token positions during text generation. I
don't quite understand why the position IDs were computed above but not
passed to generate, so I modified this.😂
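For illustration, position IDs are usually derived from the attention mask and then forwarded to `generate`; a minimal pure-Python sketch of that recipe (`build_position_ids` is a hypothetical helper, not the PR's code):

```python
def build_position_ids(attention_mask):
    """Derive position ids from a 0/1 attention mask: padding tokens
    get position 0, real tokens are numbered 0, 1, 2, ... in order."""
    out = []
    for row in attention_mask:
        pos, ids = 0, []
        for m in row:
            if m:
                ids.append(pos)
                pos += 1
            else:
                ids.append(0)  # pads are masked out, position is a dummy
        out.append(ids)
    return out

# left-padded batch: first row has two pad tokens
mask = [[0, 0, 1, 1, 1],
        [1, 1, 1, 1, 1]]
pos_ids = build_position_ids(mask)
# these would then be passed explicitly, e.g.
# model.generate(input_ids, attention_mask=mask, position_ids=pos_ids)
```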

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
…erl-project#1562)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR enables the Megatron backend checkpoint manager to save the hf
model config into verl checkpoints, and simplifies our CI since
`--hf_model_path` has been deprecated in
verl-project#1468; fixes the comment in
verl-project#1468 (comment).

Note: several changed lines in `verl/utils/megatron_utils.py` are
unrelated to this PR; they were automatically reformatted by pre-commit
hooks.

### Test

The current CI e2e tests should sufficiently cover this PR.

### Additional Info.

- **Training**: Megatron
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR supports activation offloading, and currently it's only for FSDP
backend.

### High-Level Design

Our implementation is based on the
[one](https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/cpu_offload.py)
in TransformerEngine. For efficiency, it groups activations by
TransformerLayer and offloads activation groups asynchronously. This
means that the offloading of the i-th activation group and the
computation of the i+1-th activation group happen at the same time, and
there are at most two activation groups in GPU memory.
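The double-buffering idea above can be sketched in plain Python (a simulation of the schedule only, not TransformerEngine's CUDA-stream implementation; `offload_schedule` is a hypothetical name):

```python
def offload_schedule(num_groups):
    """Simulate double-buffered activation offloading: while group i-1
    is being copied to CPU, group i is computed, so at most two
    activation groups are resident on the GPU at any time."""
    gpu, events, max_resident = [], [], 0
    for i in range(num_groups):
        gpu.append(i)                          # compute group i on GPU
        events.append(("compute", i))
        max_resident = max(max_resident, len(gpu))
        if i > 0:
            gpu.remove(i - 1)                  # async offload of the
            events.append(("offload", i - 1))  # previous group finishes
    if gpu:
        events.append(("offload", gpu.pop()))  # drain the last group
    return events, max_resident

events, max_resident = offload_schedule(4)
```

The point of the sketch is the invariant: `max_resident` stays at 2 no matter how many groups the model has, which is where the memory saving comes from.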

### Specific Changes

1. Add activation offloading support.

### API

### Usage Example

``` 
export VLLM_ATTENTION_BACKEND=XFORMERS

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=./data/gsm8k/train.parquet \
    data.val_files=./data/gsm8k/test.parquet \
    data.train_batch_size=512 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=./huggingface.co/Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.model.enable_activation_offload=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=64 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','tensorboard'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm' \
    trainer.n_gpus_per_node=8 \
    trainer.val_before_train=False \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15

 ```


### Test

We conducted experiments on the Qwen2 7B model based on the above script. The memory and throughput data are shown in the figures below, where the blue line represents activation offloading.
<img width="351" alt="image" src="https://github.com/user-attachments/assets/207576a1-3f47-4b40-bf19-60cf8105d609" /> <img width="361" alt="image" src="https://github.com/user-attachments/assets/d58f0f8b-eb5f-4e19-a892-4d778ff26135" />

### Additional Info.

- **Issue Number**: none
- **Training**: This PR will affect FSDP backend
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
# What does this PR do?

This PR adds `loss_agg_mode` to critics.
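For illustration, the two common aggregation modes can be sketched in pure Python (the mode names follow verl's actor-side `loss_agg_mode` convention; `agg_loss` here is a hypothetical stand-in, not the PR's code):

```python
def agg_loss(loss_mat, mask, mode):
    """Aggregate per-token losses. loss_mat and mask are lists of rows.
    'token-mean' averages over all valid tokens in the batch;
    'seq-mean-token-mean' averages within each sequence first, then
    across sequences, so short and long responses weigh equally."""
    if mode == "token-mean":
        num = sum(l * m for row_l, row_m in zip(loss_mat, mask)
                  for l, m in zip(row_l, row_m))
        den = sum(m for row in mask for m in row)
        return num / den
    if mode == "seq-mean-token-mean":
        seq_means = []
        for row_l, row_m in zip(loss_mat, mask):
            den = sum(row_m)
            seq_means.append(
                sum(l * m for l, m in zip(row_l, row_m)) / den)
        return sum(seq_means) / len(seq_means)
    raise ValueError(f"unknown loss_agg_mode: {mode}")
```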

# Before submitting

- [x] Did you read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide)
and finish the [code format
check](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting)?
- [x] Did you make sure to update the documentations with your changes
in the [docs](https://github.com/volcengine/verl/tree/main/docs)
especially for breaking config etc?
- [x] Did you write any test cases if necessary? Please add CI tests to
your new feature.

# Additional Info

- **Issue Number**: none
- **Training**: both
- **Inference**: none
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes wrong initialization so that verl only loads reference
policy when needed.
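The intended logic can be sketched as follows (a hypothetical helper; the flag names mirror the `actor.use_kl_loss` and `algorithm.use_kl_in_reward` config options, but this is not the PR's code):

```python
def need_reference_policy(use_kl_loss, use_kl_in_reward):
    # The reference policy is only needed when a KL term against it
    # is actually used, either as a loss term or inside the reward;
    # otherwise loading it just wastes GPU memory.
    return use_kl_loss or use_kl_in_reward
```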

### Additional Info.

- **Issue Number**: none
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds ignore patterns to CI for SPIN & SPPO.

### Additional Info.

- **Issue Number**: none
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds entropy computation and logging to DAPO trainer, aligning
with other trainers.

### Additional Info.

- **Issue Number**: verl-project#1455
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
For developers, you can follow the docs: docs/ascend/ascend.rst

This PR adds support for the Ascend NPU backend.
Co-authored-by: Chendong98
[chendong136@huawei.com](mailto:chendong136@huawei.com)
Co-authored-by: zheliuyu <15750543867@163.com>
Co-authored-by: celestialli
[celestialli@outlook.com](mailto:celestialli@outlook.com)
In this PR, we add the capability to determine the type of NPU device,
and we also add a new script for training on NPU.

Here is the change list:

1. pyproject.toml: change the version of vllm
2. requirements-npu.txt requirements for NPU
3. verl/bert_padding.py Adapted from
https://github.com/mlcommons/training_results_v1.1/blob/main/NVIDIA/benchmarks/bert/implementations/pytorch/padding.py
4. verl/single_controller/ray/base.py
5. verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py
6. verl/trainer/fsdp_sft_trainer.py
7. verl/utils/flops_counter.py
8. verl/utils/fsdp_utils.py
9. verl/workers/actor/dp_actor.py
10. verl/workers/critic/dp_critic.py
11. verl/workers/fsdp_workers.py
12. verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py
13. verl/workers/sharding_manager/fsdp_vllm.py
14. verl/utils/device.py get device type for different device
15. docs/ascend/ascend.md 

Here is our roadmap:

**RoadMap**

- [x] sft
- [x] ppo
- [x] grpo

News

[2025.03.31] Added results of SFT and GRPO. Qwen2-7B-Instruct was tested
on 2*8 devices, and many batch_size-related params had to be reduced, so
these results are only for reference. We will announce the reward
results with the default params as soon as sleep mode is supported.

[2025.03.03] Modify the adaptation method of Ray

[2025.02.25] The PPO algorithm is supported for training on NPU with the
FSDP backend.

[2025.02.23] The SFT algorithm is supported for training on NPU with the
FSDP backend.

[2025.02.21] The GRPO algorithm is supported for training on NPU with
the FSDP backend.

Requirements
We tested this PR on Ascend NPU and GPU to ensure the same code can run
on different devices. The devices were 8 Atlas 800T A2 and 8 A100.
Other software versions are shown in the following table.

| Software | Version |
|:---------|--------:|
| transformers | 4.47.1 |
| accelerate | 1.3.0 |
| torch_npu | 2.5.1.rc1 |
| CANN | 8.1.RC1 (Not Released) |

About mean error
Due to differences in hardware structure, we cannot guarantee that the
loss of Ascend NPU is exactly the same as that of the GPU. In our
experience, a loss difference of less than 2% is acceptable. If the
loss difference is greater than 2%, we will try to fix it. The
calculation formula is as follows.

![loss_comparison](https://github.com/user-attachments/assets/4f62f713-9240-4324-bf7d-3ae59fc85b05)


N represents the number of training steps. For more information, please
refer to [Calculation accuracy
description](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/LMaccuracy_0001.html)
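As a hedged illustration of the kind of metric described (the exact formula is in the image above and may differ; `mean_relative_loss_error` is a hypothetical name):

```python
def mean_relative_loss_error(loss_npu, loss_gpu):
    """Mean relative difference between per-step losses on the two
    devices over N training steps; values under 0.02 (2%) would be
    considered acceptable per the criterion above."""
    n = len(loss_gpu)
    return sum(abs(a - b) / abs(b)
               for a, b in zip(loss_npu, loss_gpu)) / n

npu = [1.02, 0.98, 1.00]
gpu = [1.00, 1.00, 1.00]
err = mean_relative_loss_error(npu, gpu)  # about 1.3%, within tolerance
```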

---------

Co-authored-by: Chendong98 <chendong136@huawei.com>
Co-authored-by: zheliuyu <15750543867@163.com>
verl-project#1627)

…l_len before generation

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR adds a validation step to prevent generation requests that
exceed the model’s maximum context length in SGLang. Without this check,
multi-turn RL training can fail when the combined length of the prompt
and the maximum response exceeds the model limit. The new validation
ensures `prompt_len + max_resp_len <= max_model_len` before sending
requests to the SGLang engine.
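The check itself is simple; a minimal sketch (the helper name is hypothetical, and the example numbers come from the failing request in the Test section below):

```python
def validate_request_length(prompt_len, max_resp_len, max_model_len):
    """Reject a generation request up front when the prompt plus the
    maximum response cannot fit in the model's context window."""
    if prompt_len + max_resp_len > max_model_len:
        raise ValueError(
            f"Requested {prompt_len + max_resp_len} tokens exceeds "
            f"max_model_len={max_model_len}")

# the request from the traceback: 23769 prompt + 10240 completion
# tokens against a 32768-token context would be rejected early,
# before it ever reaches the SGLang engine
```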


### Test

Successfully tested with my multiturn RL dataset with `max_turns==30`,
which kept failing with the following error before this change
(Qwen2.5-32B-Instruct + GRPO):
```
Traceback (most recent call last):
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 64, in main
    run_ppo(config)
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 76, in run_ppo
    ray.get(runner.run.remote(config))
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 2822, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/home/jobuser/.local/lib/python3.10/site-packages/ray/_private/worker.py", line 930, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::TaskRunner.run() (pid=1150536, ip=100.96.248.206, actor_id=85b22be1ed8ef671c739638a01000000, repr=<main_ppo.TaskRunner object at 0x796b0bba7010>)
  File "/home/jobuser/resources/verl/trainer/main_ppo.py", line 183, in run
    trainer.fit()
  File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 872, in fit
    val_metrics = self._validate()
  File "/home/jobuser/resources/verl/trainer/ppo/ray_trainer.py", line 607, in _validate
    test_output_gen_batch_padded = self.actor_rollout_wg.generate_sequences(test_gen_batch_padded)
  File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 49, in func
    output = ray.get(output)
ray.exceptions.RayTaskError(ValueError): ray::WorkerDict.actor_rollout_generate_sequences() (pid=1169888, ip=100.96.248.206, actor_id=6deb9fd4b4ff01530920ada301000000, repr=<verl.single_controller.ray.base.WorkerDict object at 0x7e41e90afa90>)
  File "/home/jobuser/resources/verl/single_controller/ray/base.py", line 625, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/home/jobuser/resources/verl/single_controller/base/decorator.py", line 534, in inner
    return func(*args, **kwargs)
  File "/home/jobuser/resources/verl/workers/fsdp_workers.py", line 630, in generate_sequences
    output = self.rollout.generate_sequences_with_tools(prompts=prompts)
  File "/home/jobuser/resources/verl/utils/debug/performance.py", line 78, in f
    return self.log(decorated_function, *args, **kwargs)
  File "/home/jobuser/resources/verl/utils/debug/performance.py", line 88, in log
    output = func(*args, **kwargs)
  File "/home/jobuser/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/async_sglang_rollout.py", line 613, in generate_sequences_with_tools
    output_req_list = loop.run_until_complete(
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/home/jobuser/resources/verl/workers/rollout/sglang_rollout/async_sglang_rollout.py", line 529, in _async_rollout_a_request
    output = await self._engine.async_generate(
  File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/entrypoints/engine.py", line 265, in async_generate
    return await generator.__anext__()
  File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 403, in generate_request
    tokenized_obj = await self._tokenize_one_request(obj)
  File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 450, in _tokenize_one_request
    self._validate_token_len(obj, input_ids)
  File "/home/jobuser/.local/lib/python3.10/site-packages/sglang/srt/managers/tokenizer_manager.py", line 482, in _validate_token_len
    raise ValueError(error_msg)
ValueError: Requested token count exceeds the model's maximum context length of 32768 tokens. You requested a total of 34009 tokens: 23769 tokens from the input messages and 10240 tokens for the completion. Please reduce the number of tokens in the input messages or the completion to fit within the limit.
```

### Additional Info.

- **Inference**: SGLang

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
…t#1638)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This simple PR adds support for
[ChainedOptimizer](https://github.com/NVIDIA/Megatron-LM/blob/75b1ca13618bded85c81fb572f58df83ba095dc9/megatron/core/optimizer/optimizer.py#L938)
offloading in the Megatron-LM training environment.

In Megatron-LM, ChainedOptimizer is used when expert parallelism
(expert_parallel > 1, related to verl-project#1467) is enabled,
commonly in Mixture-of-Experts (MoE) models.

This has been tested and validated with the Qwen3-235B-22A model
configuration.


### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
...
actor_rollout_ref.actor.megatron.optimizer_offload=True \
actor_rollout_ref.actor.megatron.expert_model_parallel_size=16 \
...
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Megatron]
- **Inference**: [none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: charlie.cs <charlie.cs@kakaocorp.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Shifts fused_linear_for_ppo into model.forward for FSDP

### High-Level Design

Self-explanatory.

### Specific Changes

- Update monkey patch to return log_probs and entropy instead of
last_hidden_state.
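For illustration, the two quantities the patched forward now returns can be computed directly from logits; a minimal pure-Python sketch of the math (not the fused kernel itself, which operates on FSDP-sharded tensors):

```python
import math

def log_probs_and_entropy(logits, target):
    """From one row of raw logits, return (log-prob of the target
    token, entropy of the distribution): the two values returned in
    place of last_hidden_state."""
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    log_probs = [(x - m) - math.log(z) for x in logits]
    probs = [e / z for e in exps]
    entropy = -sum(p * lp for p, lp in zip(probs, log_probs))
    return log_probs[target], entropy

lp, ent = log_probs_and_entropy([2.0, 1.0, 0.1], 0)
```

Returning these scalars per token, instead of the full hidden state, also avoids touching `model.lm_head.weight` from outside the FSDP-wrapped context, which was the root cause of the original bug.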

### API

No changes

### Usage Example

```sh
actor_rollout_ref.model.use_fused_kernels=True
```

### Test


![image](https://github.com/user-attachments/assets/c6af68fb-0200-4aee-9596-0b445afdc562)


### Additional Info.

- This is to fix verl-project#1565 
- The original bug arises because we tried to access
model.lm_head.weight from outside of the FSDP wrapped context.

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR fixes:

- DAPO CI triggering path patterns outdated since verl-project#1392
- `response_mask` computation missing but skipping the CI test in verl-project#1652 

### Tests

- [x] DAPO CI is correctly triggered and passed, e.g.,
https://github.com/volcengine/verl/actions/runs/15223958183/job/42823610223?pr=1666

### Additional Info.

- **Issue Number**: verl-project#1392 , verl-project#1652 
- **Training**: none
- **Inference**: none

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Modify the instructions for using verl on Ascend NPU.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. Modify the table format
2. Modify the installation method of vllm and vllm-ascend

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
…& CI tasks (verl-project#1602)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

> Add one-line overview of what this PR aims to achieve or accomplish. 

- Fix sglang megatron support
- Add sglang_async megatron support
- Add CI task to protect megatron-sglang impl

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.


https://wandb.ai/swordfaith/gsm8k_async_rl/runs/6h7apmbn?nw=nwuserswordfaith

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: SGLang

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: BlueSpace <gaoziyuan19@mails.ucas.ac.cn>
… hyperlink (verl-project#1673)

…res and hyperlink

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Modify the installation method of vllm on different architectures and
fix the hyperlink syntax.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. Modify the installation method of vllm on different architectures
2. Modify the syntax of the hyperlink

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

In megatron-core, `vocab_parallel_log_probs_from_logits` is an in-place
operator that modifies the logits in place to save memory. This makes
`vocab_parallel_entropy` produce incorrect results if it is computed
after `vocab_parallel_log_probs_from_logits`. We swap the order to make
sure the result is correct.
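A minimal sketch of why the order matters (pure Python; the function names mimic, but are not, the megatron-core ops, and the exact in-place intermediates there differ):

```python
import math

def log_probs_from_logits_inplace(logits, target):
    """Mimic the memory-saving in-place style: the logits buffer is
    overwritten with intermediate softmax values, and only the target
    log-prob is returned. What matters is that the buffer is clobbered."""
    m = max(logits)
    z = sum(math.exp(x - m) for x in logits)
    log_prob = logits[target] - m - math.log(z)
    for i, x in enumerate(logits):
        logits[i] = math.exp(x - m) / z  # buffer now holds probs, not logits
    return log_prob

def entropy_from_logits(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return math.log(z) + m - sum(x * e for x, e in zip(logits, exps)) / z

logits = [1.0, 2.0, 3.0]
entropy = entropy_from_logits(logits)          # correct: computed first
lp = log_probs_from_logits_inplace(logits, 2)  # clobbers `logits`
entropy_wrong = entropy_from_logits(logits)    # garbage: buffer holds probs
```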

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

Currently, the device to run on depends on whether `is_cuda_available`
is True on the driver process. However, the driver process may be a CPU
process that can't see cuda devices even when cuda devices are
available. Thus, it's not appropriate to use `is_cuda_available` to set
the device. Instead, we should set the device explicitly.

In the future, we may have a ray cluster with both NPU and GPU, and we
can use different devices for different workloads. Thus, setting device
explicitly would be a better choice in the long run.
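A sketch of the "set the device explicitly" idea (the helper and its signature are hypothetical, not verl's API):

```python
def resolve_device(requested, visible):
    """Pick the device from explicit configuration rather than probing
    availability on the driver: `requested` comes from the trainer
    config ('cuda', 'npu', ...), `visible` is what the *worker*
    process actually sees. Fail fast on a mismatch instead of
    silently falling back to CPU."""
    if requested not in visible:
        raise RuntimeError(
            f"device '{requested}' not visible to worker: {sorted(visible)}")
    return requested
```

This way a CPU-only driver (e.g. a dedicated Ray head node) can still schedule GPU or NPU workers correctly, since the decision no longer depends on what the driver itself can see.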

Why CI can't trigger this problem: we directly run `python3 xxx` on the
CI machine instead of using a standard Ray cluster with dedicated CPUs
for the head node, and CI machines all have GPUs.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Update ascend_quick_start.rst.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

1. Rename ascend_quick_start.rst.
2. Add the accuracy and throughput data of GRPO.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Fix an argument-passing error in the non-fused kernels that caused Qwen2_5_VL to fail.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
### Checklist Before Starting

- [ ] Search for similar PR(s).

### What does this PR do?

Refactor and narrow the scope of some tests to avoid running unrelated tests.

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
…ion (verl-project#1709)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Add a visual explanation of the configuration to the documentation

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
Co-authored-by: Bihan Rana <bihan@Bihans-MacBook-Pro.local>
Co-authored-by: peterschmidt85 <andrey.cheptsov@gmail.com>
…in `trainer` and `utils` (verl-project#1397)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

* This PR adds docstrings for the public methods inside the `trainer` and
`utils` modules, so that these methods can be reused and referenced
more easily.
* Two new doc pages, `PPO Trainer Interface` and `Utilities`, were also
added under the API Reference section.
* Renamed the function `verl.utils._default_compute_score` to
`verl.utils.default_compute_score`, as it is an external function used
by other modules, i.e., trainer and recipe.

<img width="1093" alt="Screenshot 2025-05-26 at 9 20 31 PM"
src="https://github.com/user-attachments/assets/e361e6bd-a33b-426b-85b4-9fe93ab1e398"
/>


### TODO
This is the second of a series of PRs to improve and stabilize the docs
and API. Stacked on top of verl-project#1396
TODO includes adding more useful utility functions to the doc with
improved doc strings.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [x] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
Co-authored-by: H <linhaibin.eric@gmail.com>
…ng purpose (verl-project#1712)

### Checklist Before Starting

- [X] Search for similar PR(s).

### What does this PR do?

- Support logging rollout probs vs. actor probs for debugging purposes
- Support both vLLM and SGLang async
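As a rough illustration of what such debug logging can surface, a minimal helper (hypothetical, not verl's actual metric code) might compare the per-token log-probs returned by the rollout engine with those recomputed by the actor:

```python
def logprob_mismatch(rollout_logprobs, actor_logprobs):
    """Summarize per-token disagreement between rollout-engine log-probs
    and actor-recomputed log-probs; large gaps usually point at precision
    or kernel mismatches between the two backends."""
    assert len(rollout_logprobs) == len(actor_logprobs)
    diffs = [abs(r - a) for r, a in zip(rollout_logprobs, actor_logprobs)]
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": sum(diffs) / len(diffs),
    }
```

Logging these summaries per rollout batch makes it easy to spot when the inference engine and training engine drift apart.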

### High-Level Design

> Demonstrate the high-level design if this PR is complex.

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this 
```

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.
… utils test (verl-project#1729)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

Address review comments after verl-project#1397 was merged:

1. Add back `_default_compute_score` API and mark it as deprecated;
2. Fix a broken ci test `ray_utils_test` on `parallel_put`;

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add CI test(s) if necessary.

---------

Signed-off-by: Hongpeng Guo <hg5@illinois.edu>
czx6858 and others added 12 commits May 28, 2025 10:39
### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

This PR updates the README.md for the SPIN recipe to improve accuracy
and completeness. Key changes include corrections and additions to the
method description, the inclusion of related works, and a more concise
introduction.

### High-Level Design

N/A - Focuses on documentation improvements for clarity and accuracy.

### Specific Changes

- Corrected and supplemented the description of the SPIN methodology.
- Included related works along with concise introductions to relevant
papers/concepts.
- Refined and clarified the introductory sections of the README.

### API

N/A - Changes are limited to README.md documentation.

### Usage Example

N/A - This PR does not primarily focus on usage examples, but rather on
descriptive content.

```python
# No new standalone code snippets are part of this PR itself.
```
…l-project#1700)

### What does this PR do?

Fix the configuration for the micro batch size in Megatron's ref policy.

### High-Level Design
This pull request addresses an issue with the micro batch size
configuration in the ref policy of Megatron. The default
ppo_megatron_trainer.yaml only includes two configurations:
log_prob_micro_batch_size and log_prob_micro_batch_size_per_gpu.

https://github.com/volcengine/verl/blob/54c9b7364c2d188b2ba4107404cfa3c2b446df19/verl/trainer/config/ppo_megatron_trainer.yaml#L119-L120
However, in `megatron_workers.py`, the required configuration is
ref.log_prob_micro_batch_size_per_gpu

https://github.com/volcengine/verl/blob/54c9b7364c2d188b2ba4107404cfa3c2b446df19/verl/workers/megatron_workers.py#L517-L518
or in `megatron_actor.py ` the required configuration is
ref.ppo_micro_batch_size_per_gpu,

https://github.com/volcengine/verl/blob/54c9b7364c2d188b2ba4107404cfa3c2b446df19/verl/workers/actor/megatron_actor.py#L271-L274

which are not directly related to ppo_micro_batch_size.

To resolve this, I modified the configuration calculations and added
`raise ValueError` statements to ensure that the necessary parameters
are correctly defined. This prevents runtime errors and improves the
overall robustness of the training process.

### Changes Made:

- Modified the configuration calculations in megatron_workers.py.

- Added raise ValueError statements to check for the presence of
log_prob_micro_batch_size_per_gpu and ppo_micro_batch_size_per_gpu.
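The fallback-and-validate behavior described above can be sketched as follows. The config key names follow the discussion; the helper function itself is hypothetical and not the code in `megatron_workers.py`:

```python
def resolve_log_prob_micro_bsz_per_gpu(ref_config, num_gpus):
    """Resolve the per-GPU micro batch size for ref-policy log-prob
    computation, raising early when neither setting is provided."""
    per_gpu = ref_config.get("log_prob_micro_batch_size_per_gpu")
    if per_gpu is not None:
        return per_gpu
    total = ref_config.get("log_prob_micro_batch_size")
    if total is None:
        raise ValueError(
            "Please set ref.log_prob_micro_batch_size_per_gpu or "
            "ref.log_prob_micro_batch_size"
        )
    # Derive the per-GPU value by splitting the global size across GPUs.
    return total // num_gpus
```

Failing with a clear `ValueError` at startup is far cheaper than hitting a missing-attribute error mid-training.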
…e workloads (verl-project#1617)

### Checklist Before Starting

- [x] Search for similar PR(s).

### What does this PR do?

1. Add dynamic batch size support to Megatron, to rebalance the workloads.
2. Fix missing critic metrics.

### High-Level Design

Follow the FSDP's dynamic batch size.

### Specific Changes

Use the `rearrange_micro_batches` API, kept compatible with Megatron's
VPP constraints.

```py
vpp_size = mpu.get_virtual_pipeline_model_parallel_world_size()
if vpp_size is not None and vpp_size > 1:
    microbatch_group_size_per_vp_stage = self.tf_config.microbatch_group_size_per_vp_stage
    micro_batches, indices = rearrange_micro_batches(
        batch=mini_batch.batch,
        num_batches_devided_by=microbatch_group_size_per_vp_stage,
        max_token_len=max_token_len,
    )
    assert len(micro_batches) % self.tf_config.microbatch_group_size_per_vp_stage == 0, (
        f"micro_batches {micro_batches} must be divisible by "
        f"microbatch_group_size_per_vp_stage {microbatch_group_size_per_vp_stage} "
        f"for megatron backend"
    )
else:
    micro_batches, indices = rearrange_micro_batches(
        batch=mini_batch.batch, max_token_len=max_token_len
    )
```

@vermouth1992 please check whether it makes sense.

Megatron's constraint when using the interleaved pipeline:

```py
# If the final micro-batch group has fewer micro-batches than pipeline-parallel size,
# the pipeline will have dependency bubbles.
final_microbatch_group_size = num_microbatches % config.microbatch_group_size_per_vp_stage
if 0 < final_microbatch_group_size < pipeline_parallel_size:
    msg = 'The remainder of M (the total micro-batches) divided by N (number of '
    msg += 'contiguous micro-batches in a virtual pipeline stage) should be 0, '
    msg += 'or larger than or equal to the pipeline-parallel size, but it is '
    msg += f'{final_microbatch_group_size}. '
    msg += 'Otherwise, it introduces dependency bubbles in the pipeline '
    msg += 'and reduces throughput.'
    raise RuntimeError(msg)
```
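The Megatron constraint above boils down to a simple predicate on the micro-batch count. As a sketch (not verl or Megatron code), a count M is schedulable when:

```python
def vpp_schedule_ok(num_microbatches, group_size, pp_size):
    """Megatron's interleaved-pipeline constraint: the final group of
    micro-batches (M % N) must be empty, or at least as large as the
    pipeline-parallel size, to avoid dependency bubbles."""
    final_group = num_microbatches % group_size
    return final_group == 0 or final_group >= pp_size
```

Rebalancing must therefore produce a micro-batch count satisfying this predicate, which is why the dynamic batching path above passes the VPP group size into `rearrange_micro_batches`.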

### API

The input of Megatron's `forward_backward_batch` has changed, and its
output is now a dict containing the original `output` and the `indices`
needed by `compute_old_log_probs`.

### Usage Example

```bash
    actor_rollout_ref.actor.use_dynamic_bsz=${USE_DYNAMIC_BSZ} \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=${ppo_max_token_len_per_gpu} \
    critic.ppo_max_token_len_per_gpu=${forward_max_token_len_per_gpu} \
```

Other models will directly copy the config.

### Test

> For changes that cannot be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### Additional Info.

- **Issue Number**: Fixes issue # or discussion # if any.
- **Training**: [Note which backend this PR will affect: FSDP, Megatron,
both, or none]
- **Inference**: [Note which backend this PR will affect: vLLM, SGLang,
both, or none]

### Checklist Before Submitting

- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [x] Add `[BREAKING]` to the PR title if it breaks any API.
- [ ] Update the documentation about your changes in the
[docs](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add CI test(s) if necessary.
Signed-off-by: Jianbing Dong <jianbingd@nvidia.com>
@ETOgaosion ETOgaosion merged commit 4d2b99f into main Jun 8, 2025
5 of 28 checks passed
@ETOgaosion ETOgaosion deleted the integrated-kernel branch June 8, 2025 17:45
@ETOgaosion ETOgaosion restored the integrated-kernel branch June 8, 2025 17:45
@ETOgaosion ETOgaosion deleted the integrated-kernel branch June 9, 2025 02:10
@ETOgaosion ETOgaosion restored the integrated-kernel branch June 9, 2025 02:10
@ETOgaosion ETOgaosion deleted the integrated-kernel branch June 9, 2025 02:42