[rollout, sglang] feat: support blockwise fp8 rollout #4415
wuxibin89 merged 20 commits into verl-project:main
Code Review
This pull request introduces FP8 rollout support for the sglang backend, which is a significant feature enhancement. The changes include adding a new utility file for FP8 quantization, updating the sglang server and rollout worker to handle FP8 configurations, and updating the documentation accordingly. The implementation appears solid. My review focuses on improving maintainability by addressing code duplication and removing unreachable code.
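As background for the quantization utility mentioned above, here is a minimal pure-Python sketch of how per-block scales are commonly derived in blockwise FP8 quantization. It is not the PR's `scaled_fp8_blockwise`, whose implementation is not shown on this page. The function names `blockwise_scales` and `scale_into_fp8_range` are hypothetical, the per-tile rule (absolute maximum divided by 448, the largest finite e4m3 value) is an assumption about the usual scheme, and the actual cast to an FP8 dtype is omitted.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in e4m3 (fn variant)


def blockwise_scales(weight, block_size=(128, 128)):
    """Return one scale per tile of a 2-D weight given as a list of float rows.

    Assumption: scale = amax(tile) / FP8_E4M3_MAX, so that dividing the tile
    by its scale maps it into the representable FP8 range.
    """
    rows, cols = len(weight), len(weight[0])
    block_rows, block_cols = block_size
    scales = []
    for r0 in range(0, rows, block_rows):
        row_scales = []
        for c0 in range(0, cols, block_cols):
            amax = max(
                abs(weight[r][c])
                for r in range(r0, min(r0 + block_rows, rows))
                for c in range(c0, min(c0 + block_cols, cols))
            )
            # Guard against all-zero tiles so the later division stays finite.
            row_scales.append(max(amax, 1e-12) / FP8_E4M3_MAX)
        scales.append(row_scales)
    return scales


def scale_into_fp8_range(weight, scales, block_size=(128, 128)):
    """Divide every element by the scale of the tile it belongs to (no FP8 cast)."""
    block_rows, block_cols = block_size
    return [
        [value / scales[r // block_rows][c // block_cols] for c, value in enumerate(row)]
        for r, row in enumerate(weight)
    ]
```

With a `[128, 128]` block size, a 4096 x 4096 weight would get a 32 x 32 grid of scales under this scheme. The quoted diffs below store the scale next to each weight under the `_scale_inv` suffix.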

    if weight_block_size is not None:
        if torch.distributed.get_rank() == 0:
            logger.debug(f" Quantizing to FP8 blockwise: {k}")
        param_lp, param_scale = scaled_fp8_blockwise(
            v.to(dtype),
            weight_block_size=weight_block_size,
        )
        param_scale = param_scale.squeeze(-1)
        weights_quantized.append([k, param_lp])
        weights_quantized.append([k + "_scale_inv", param_scale])
    else:
        raise ValueError(
            "Only blockwise quantization is supported. Please set weight_block_size in quant_config"
        )

This else block is unreachable. weight_block_size is checked for None on line 152 before the loop begins, and an exception is raised if it is None. Consequently, the condition weight_block_size is not None on line 163 will always evaluate to true inside the loop, rendering the else branch dead code. Removing the conditional wrapper and the unreachable else block will improve code clarity and maintainability.

    if torch.distributed.get_rank() == 0:
        logger.debug(f" Quantizing to FP8 blockwise: {k}")
    param_lp, param_scale = scaled_fp8_blockwise(
        v.to(dtype),
        weight_block_size=weight_block_size,
    )
    param_scale = param_scale.squeeze(-1)
    weights_quantized.append([k, param_lp])
    weights_quantized.append([k + "_scale_inv", param_scale])

    assert sglang.__version__ >= "0.5.5", "sglang>=0.5.5 is required for FP8 quantization"
    FP8_BLOCK_QUANT_KWARGS = {
        "activation_scheme": "dynamic",
        "fmt": "e4m3",
        "quant_method": "fp8",
        "weight_block_size": [128, 128],
    }
    fp8_block_quant_kwargs = dict(FP8_BLOCK_QUANT_KWARGS)

The FP8 quantization configuration logic, including the version check and FP8_BLOCK_QUANT_KWARGS dictionary, is duplicated in verl/workers/rollout/sglang_rollout/sglang_rollout.py. To improve maintainability and prevent future inconsistencies, this logic should be centralized. Consider moving FP8_BLOCK_QUANT_KWARGS to verl/utils/sglang/sglang_fp8_utils.py as a constant and creating a helper function there to encapsulate the version check and config creation.

    assert sglang.__version__ >= "0.5.5", "sglang>=0.5.5 is required for FP8 quantization"
    FP8_BLOCK_QUANT_KWARGS = {
        "activation_scheme": "dynamic",
        "fmt": "e4m3",
        "quant_method": "fp8",
        "weight_block_size": [128, 128],
    }
    fp8_block_quant_kwargs = dict(FP8_BLOCK_QUANT_KWARGS)

The FP8 quantization configuration logic, including the version check and FP8_BLOCK_QUANT_KWARGS dictionary, is duplicated in verl/workers/rollout/sglang_rollout/async_sglang_server.py. To improve maintainability and prevent future inconsistencies, this logic should be centralized. Consider moving FP8_BLOCK_QUANT_KWARGS to verl/utils/sglang/sglang_fp8_utils.py as a constant and creating a helper function there to encapsulate the version check and config creation.
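The centralization suggested in the two comments above could look roughly like the following. This is only a sketch: the names `build_fp8_block_quant_kwargs`, `MIN_SGLANG_VERSION`, and `_release_tuple` are hypothetical and do not come from the PR, and the helper takes the version string as an argument so that it runs without sglang installed. It also compares versions numerically, because a plain string comparison such as `"0.10.0" >= "0.5.5"` evaluates to `False`.

```python
import copy
import re

MIN_SGLANG_VERSION = (0, 5, 5)  # hypothetical constant mirroring the quoted assert

FP8_BLOCK_QUANT_KWARGS = {
    "activation_scheme": "dynamic",
    "fmt": "e4m3",
    "quant_method": "fp8",
    "weight_block_size": [128, 128],
}


def _release_tuple(version: str) -> tuple:
    """Parse the leading numeric release, e.g. '0.5.5.post1' -> (0, 5, 5).

    Pre-release suffixes such as 'rc1' are ignored, which is a simplification.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def build_fp8_block_quant_kwargs(sglang_version: str) -> dict:
    """Check the version once and return a fresh copy of the shared config."""
    if _release_tuple(sglang_version) < MIN_SGLANG_VERSION:
        raise RuntimeError("sglang>=0.5.5 is required for FP8 quantization")
    # Deep copy so callers cannot mutate the shared weight_block_size list.
    return copy.deepcopy(FP8_BLOCK_QUANT_KWARGS)
```

Both call sites would then reduce to something like `build_fp8_block_quant_kwargs(sglang.__version__)`, assuming they need nothing beyond the shared dictionary.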
### What does this PR do?
This PR introduces FP8 rollout with the SGLang inference backend in verl.
#### Experiments and Outcomes
Qwen3-8B-Base Dense Model
**Configuration**
- DAPO recipe. AIME24 online validation.
- SGLang + FSDP
- Note that SPMD rollout has been deprecated, so we removed the FP8 SPMD rollout.
- Prompt batch size 32, n=16.
- Rollout batch size: 32\*3\*16
- Train_batch_size & ppo_mini_batch_size 32
- Max response length 20K
- Token-level TIS, C=2
- 8*H100
- verlai/verl:sgl055.latest
**Accuracy**
With TIS, FP8 rollout aligns with BF16
<img width="1460" height="782" alt="image" src="https://github.com/user-attachments/assets/c8b04c8c-2961-4ad3-9c0a-0d0bee80fd74" />
**Performance**
<img width="661" height="661" alt="image" src="https://github.com/user-attachments/assets/967b6889-08b6-407b-8586-86b42a58d0b7" />
<img width="661" height="668" alt="image" src="https://github.com/user-attachments/assets/0b3f4ad1-87e2-428e-ab96-d241944a2b41" />
*purple: BF16, red: FP8 rollout*
Results and observations:
- FP8 rollout yields a rollout speedup of roughly 12%.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s) if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: Xue Huang <xueh@nvidia.com>
* [doc] chore: Update ascend quickstart and docker build guidance doc (#4420)
### What does this PR do?
As title.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [sglang] feat: retires sglang spmd mode in the codebase (#4422)
### What does this PR do?
Retires the legacy SGLang SPMD rollout path and makes async/server mode
the only supported backend for SGLang. The PR removes the old
`SGLangRollout` class, its helpers, tests, and recipes, and updates all
docs, scripts, and CI references so they speak only to the async HTTP
adapter (`ServerAdapter`).
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: _N/A
(follow-up to the vLLM SPMD removal)._
- [ ] Format the PR title as `[sglang, rollout, trainer, recipe, ci,
doc] refactor: remove SGLang SPMD rollout`
### Test
### API and Usage Example
Same as vLLM: configs/scripts must set
`actor_rollout_ref.rollout.mode=async` and rely on the HTTP server.
Example:
```bash
python -m verl.trainer.main_ppo \
... \
actor_rollout_ref.rollout.name=sglang \
actor_rollout_ref.rollout.mode=async \
...
```
### Design & Code Changes
- Deleted the `SGLangRollout` class and associated helpers from
`verl/workers/rollout/sglang_rollout/sglang_rollout.py`, keeping only
the async `ServerAdapter`. Cleared its registry entries, configs, and
guards the same way as the vLLM PR.
- Removed SGLang SPMD-specific tests
(`tests/workers/rollout/test_sglang_*`) and CI steps in
`.github/workflows/sgl.yml`, plus any lint exclusions that referenced
those files.
- Updated recipes/examples/e2e scripts that referenced SGLang rollout to
hardcode `rollout.mode=async`, drop sync branches, and set
`return_raw_chat` (mirroring the vLLM cleanup).
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting).
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: _Removed the
obsolete SGLang SPMD jobs; async workflows remain covered._
- [ ] Once your PR is ready for CI, notify the `ci-request` channel (or
Feishu group).
* [fsdp] feat: update NPU fused kernels for Qwen3 moe block (#4406)
### What does this PR do?
This PR optimizes Qwen3-MoE model training performance on Ascend NPU
devices. It optimizes the implementation of **GMM (Grouped Matmul)** and
integrates fused **permute/unpermute** kernels, achieving a 20%+
training speedup.
Key changes:
1. Added NPU GMM kernel for backward `dw`.
2. Added `npu_moe_token_permute` and `npu_moe_token_unpermute` fused
kernels.
3. Unified GMM function for Qwen3-VL and Qwen3-MoE.
4. Reduced transpose operators in expert weight stacking.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here:
https://github.com/volcengine/verl/pull/3221
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
Tested with Qwen3-30B-A3B (FSDP, sp=8) on 64 Ascend A2 NPUs.
Baseline:

With optimized fusion kernels:

**Performance results (step 1):**
Experiment | gen(s) | old_log_prob(s) | update_actor(s) | step(s)
-- | -- | -- | -- | --
Baseline | 1180.1 | 71.3 | 152.5 | 1406.9
Fused (This PR) | 1167.5 | 58.6 | 111.1 | 1340.9
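To relate the step-1 timings to the speedup claim, the per-phase time reduction can be computed directly from the table. The snippet below is a small worked calculation using only the numbers reported above; the helper name `time_reduction_pct` is illustrative.

```python
# Timings in seconds, copied from the step-1 performance table.
baseline = {"gen": 1180.1, "old_log_prob": 71.3, "update_actor": 152.5, "step": 1406.9}
fused = {"gen": 1167.5, "old_log_prob": 58.6, "update_actor": 111.1, "step": 1340.9}


def time_reduction_pct(before: float, after: float) -> float:
    """Percentage of wall-clock time saved relative to the baseline."""
    return 100.0 * (before - after) / before


reductions = {
    phase: round(time_reduction_pct(baseline[phase], fused[phase]), 1)
    for phase in baseline
}
for phase, pct in reductions.items():
    print(f"{phase}: {pct}% less time")
```

This gives about 1.1% for `gen`, 17.8% for `old_log_prob`, 27.1% for `update_actor`, and 4.7% for the full step. The source does not say which phase the "20%+" figure refers to; the `update_actor` row is the one that exceeds it.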
**Precision comparison:**
<img width="664" height="360" alt="image"
src="https://github.com/user-attachments/assets/5e6629c4-31b5-49ec-97cf-d5e4e6beb69c"
/>
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [misc] refactor: clean up unused sharding manager (#4439)
### What does this PR do?
As per title.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [hardware] chore: clean npu_patch (#4436)
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
With an increasing number of models on verl being adapted to Ascend
NPUs, along with the upgrade of the transformers version (currently
`v4.57.3`), the content of `npu_patch.py` has grown significantly.
Its organization has gradually become cluttered and lacks a unified
naming format, and a small portion of the patches are already
obsolete (the original functions no longer exist).
This PR aims to address the above issues, ensuring that `npu_patch.py`
remains clean and well-organized.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
Not related.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
Not related.
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
Not related.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [misc] fix: fix memory leakage when initializing multiple tools (#4430)
### What does this PR do?
Prevents background event-loop leaks in the MCP tool loader.
`initialize_tools_from_config` now lazily spawns the auxiliary asyncio
loop only when an MCP tool is present, and always stops/closes the loop
on exit so the worker process doesn’t hold onto threads or loop
resources.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: _N/A
(targeted fix for the MCP tool registry)._
- [ ] Format the PR title as `[tool, misc] fix: clean up MCP tool event
loop`
### Test
Not applicable (behavioral fix in tool initialization; existing
tool-based tests still cover the code path).
### API and Usage Example
No API surface changes; the existing YAML config flow stays the same.
Example usage:
```python
from verl.tools.utils.tool_registry import initialize_tools_from_config
tool_instances = initialize_tools_from_config("configs/tools.yaml")
```
### Design & Code Changes
- Replaced the eager `asyncio.new_event_loop()` creation with a lazy
`get_mcp_event_loop()` helper so purely native tool configs no longer
spawn threads.
- Simplified coroutine execution to always go through the
lazily-initialized loop.
- Added robust cleanup in the `finally` block: stop the loop, join the
thread, and call `loop.close()` to release resources (fixing the leak).
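The pattern described in these bullets can be sketched as follows. This is not the PR's code: `LazyBackgroundLoop`, `load_tool`, `initialize_tools`, and the `mcp:` name prefix are hypothetical stand-ins for `get_mcp_event_loop()` and the real tool loader, used only to show lazy loop creation and the stop/join/close cleanup order.

```python
import asyncio
import threading


class LazyBackgroundLoop:
    """Start an asyncio loop in a helper thread only when first needed."""

    def __init__(self):
        self._loop = None
        self._thread = None

    def get(self):
        if self._loop is None:
            self._loop = asyncio.new_event_loop()
            self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
            self._thread.start()
        return self._loop

    def run(self, coro):
        # Submit the coroutine to the background loop and block for its result.
        return asyncio.run_coroutine_threadsafe(coro, self.get()).result()

    def close(self):
        if self._loop is None:
            return  # nothing was ever started, so nothing to clean up
        self._loop.call_soon_threadsafe(self._loop.stop)  # stop run_forever()
        self._thread.join()                               # wait for the thread to exit
        self._loop.close()                                # release loop resources
        self._loop = None
        self._thread = None


async def load_tool(name):
    await asyncio.sleep(0)  # placeholder for real async initialization
    return f"{name}-ready"


def initialize_tools(names):
    background = LazyBackgroundLoop()
    try:
        # Only names with the (made-up) "mcp:" prefix touch the event loop.
        return [
            background.run(load_tool(n)) if n.startswith("mcp:") else n
            for n in names
        ]
    finally:
        background.close()
```

Calling `initialize_tools(["native_tool"])` never creates a thread in this sketch, while a list containing an `mcp:` entry starts one loop and tears it down before returning.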
### Checklist Before Submitting
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting).
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows).
If not feasible, explain why: _Tool loading already covered by existing
smoke tests._
- [ ] Once your PR is ready for CI, notify the `ci-request` channel (or
Feishu group).
* [trainer, vllm, megatron, recipe] feat: one/two step off async on-policy distillation recipe (#3975)
### What does this PR do?
This PR provides a simple implementation of one-step-off and two-step-off
asynchronous on-policy knowledge distillation with the Megatron and vLLM backends.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.
### API and Usage Example
> Demonstrate how the API changes if any, and provide usage example(s)
if possible.
```python
# Add code snippet or script demonstrating how to use this
```
### Design & Code Changes
> Demonstrate the high-level design if this PR is complex, and list the
specific changes.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: funrunding <furunding@163.com>
* [misc] feat: optimize performance of index_select_tensor_dict (#4444)
### What does this PR do?
- Optimize the performance of `index_select_tensor_dict` by calling unbind
first and then indexing.
Results on chat count task
<img width="360" height="253" alt="image"
src="https://github.com/user-attachments/assets/cdc611e4-6006-4bc2-85a1-a49ec5f63b37"
/>
<img width="363" height="254" alt="image"
src="https://github.com/user-attachments/assets/0a89cbc0-325f-4f00-b986-1fac75a74aa5"
/>
There is still a gap, and we need to keep investigating.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [ci] test: Disable ReMax training test in vllm workflow (#4445)
* [rollout] fix: RolloutConfig should support repetition_penalty config… (#4398)
### What does this PR do?
During rollout, `repetition_penalty` is read from `RolloutConfig`, falling
back to a default of 1.0 when it is absent. However, setting this parameter
in the YAML file raises an error because `RolloutConfig` has no such member.
This PR adds the field so that it can be configured.
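The failure mode can be illustrated with plain dataclasses. This is only a sketch: `RolloutConfigBefore` and `RolloutConfigAfter` are hypothetical stand-ins, and every field other than `repetition_penalty` is illustrative rather than copied from verl's `RolloutConfig`.

```python
from dataclasses import dataclass


@dataclass
class RolloutConfigBefore:
    # Hypothetical subset of fields; repetition_penalty is missing.
    temperature: float = 1.0
    top_p: float = 1.0


@dataclass
class RolloutConfigAfter:
    temperature: float = 1.0
    top_p: float = 1.0
    repetition_penalty: float = 1.0  # 1.0 means no penalty


# Values as they might arrive from a YAML file.
yaml_values = {"temperature": 0.7, "repetition_penalty": 1.1}

try:
    RolloutConfigBefore(**yaml_values)
    error = None
except TypeError as exc:
    # The unknown key is rejected by the generated __init__.
    error = str(exc)

config = RolloutConfigAfter(**yaml_values)
```

Once the field is declared, the YAML value is accepted, and an omitted value still resolves to 1.0.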
<img width="1040" height="749" alt="image"
src="https://github.com/user-attachments/assets/921eb8ae-35b6-49db-a22c-d40f31e6f59b"
/>
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: Li Zuming <lizuming@huawei.com>
* [recipe] feat: add fully async comm between rollout and sim node in disagg mode (#4433)
### What does this PR do?
This PR adds a new feature to the VLA recipe: fully asynchronous communication
between rollout and simulation nodes in disaggregated mode. It enables two
overlap optimizations:
1. Overlapping the simulation tool **reset** with the rollout weight
**update**. The simulation state can now be reset while the actor_rollout
weights are being updated, without a synchronous wait.
*This saves close to the execution time of 8 steps.*
2. Overlapping **communication** between simulation steps and rollout
steps, which eliminates the transfer overhead between nodes. In each
step, the different pipeline stages (see PR #3918) now execute
independently, so there are no synchronous waits between stages and
the relatively longer steps (currently the simulation steps) can run
back to back.
*This saves about 12% of the cost of each VLA RL step, depending on the
actual execution time of the simulation steps.*
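The first optimization above can be sketched with `asyncio`. This is an illustration only: the recipe itself runs on Ray, and `reset_simulation`, `update_rollout_weights`, and `prepare_next_iteration` are hypothetical names, with `asyncio.sleep` standing in for the real remote calls.

```python
import asyncio


async def reset_simulation(log):
    log.append("reset:start")
    await asyncio.sleep(0.05)  # stands in for the remote simulator reset
    log.append("reset:end")


async def update_rollout_weights(log):
    log.append("update:start")
    await asyncio.sleep(0.05)  # stands in for the rollout weight update
    log.append("update:end")


async def prepare_next_iteration():
    log = []
    # Launch both operations together and wait once,
    # instead of awaiting them one after the other.
    await asyncio.gather(reset_simulation(log), update_rollout_weights(log))
    return log


log = asyncio.run(prepare_next_iteration())
```

Both operations start before either finishes, so the wall-clock cost is roughly that of the slower one rather than the sum of the two.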
**Details about communication overlapping**
- Initially, the workflow of our disaggregated mode for VLA looked as follows:

- In disaggregated mode, rollout steps are executed on local nodes and
simulation steps are executed on remote nodes.
- GPU utilization is low, and many resources are wasted.
- So we implemented pipelined execution in PR #3918; the
workflow (with 2 stages, for instance) then looks as follows:

- The rollout steps (R) and simulation steps (S) are
partially overlapped (pipeline stages 0 and 1), and simulation
steps take much longer than rollout steps.
- Some time is still wasted (unnecessary data transfer
delays) because of the sequential workflow, for the following
reason:

- The synchronous waits between pipeline stages cannot be eliminated in a
sequential workflow: although steps run on different nodes, they
must be synchronized after each step, which incurs an unnecessary
communication cost.
- We then implemented asynchronous communication pipelines that overlap
communication with computation by exploiting the different execution times
of rollout and simulation steps. The workflows and execution results are
shown as follows:

---
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Test
```shell
# you can use `ray.timeline` tool for profiling in recipe/vla/main_ppo.py
bash recipe/vla/run_simpleVLA_isaac_disagg.sh
```
Profile results of reset overlapping
<img width="1420" height="232" alt="image"
src="https://github.com/user-attachments/assets/dd98ea70-fa3f-4938-a02e-a44f64a0d1e9"
/>
Profile results of communication overlapping
<img width="1058" height="228" alt="image"
src="https://github.com/user-attachments/assets/a9274864-ccd4-42f9-b5c4-1277bc0c1317"
/>
* [misc] feat: optimize nested tensor index (#4447)
### What does this PR do?
- As the title says. This is the same optimization as in PR: https://github.com/volcengine/verl/pull/4444
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [model] feat: add qwen3-4b grpo script on ASCEND NPU A3 (#4432)
### What does this PR do?
Adds `examples/grpo_trainer/run_qwen3-4b_npu.sh`, a GRPO training script for Qwen3-4B on Ascend NPU A3.
### Test
The figure below shows the comparison curve of the critic_reward_mean
metric.
<img width="1790" height="948" alt="image"
src="https://github.com/user-attachments/assets/01df9bed-f888-470d-936c-eb335acd57e9"
/>
### API and Usage Example
```sh
# install jemalloc
sudo apt update
sudo apt install libjemalloc2
# run bash
bash examples/grpo_trainer/run_qwen3-4b_npu.sh
```
* [megatron] fix: Remove Deprecated Megatron Optimizer Args (#4396)
### What does this PR do?
This PR removes the deprecated arguments during Megatron optimizer
building for compatibility with the latest Megatron, see
[https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/__init__.py#L442](https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/__init__.py#L442).
These arguments are never used by verl so they can be safely removed.
This solves the following exception with the latest Megatron:
```
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
    ^^^^^^^^^^^^^^^^^^^^^
  File "/root/verl/verl/single_controller/ray/base.py", line 825, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/verl/verl/single_controller/base/decorator.py", line 451, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/verl/verl/utils/transferqueue_utils.py", line 187, in dummy_inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/verl/verl/workers/megatron_workers.py", line 573, in init_model
    ) = self._build_model_optimizer(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/verl/verl/workers/megatron_workers.py", line 464, in _build_model_optimizer
    actor_optimizer = get_megatron_optimizer(model=actor_module, config=optim_config_megatron)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/verl/verl/utils/megatron/optimizer.py", line 71, in get_megatron_optimizer
    return get_megatron_optimizer_native(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: get_megatron_optimizer() got an unexpected keyword argument 'no_weight_decay_cond'
```
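The same class of error can be reproduced without Megatron. In the sketch below, `get_optimizer_new_api` is a hypothetical stand-in and does not mirror Megatron's real signature. This PR simply removes the deprecated arguments; the signature filter shown here is a different, defensive technique, included only to illustrate why the call fails.

```python
import inspect


def get_optimizer_new_api(config, model_chunks):
    # Stand-in for a newer builder that no longer accepts the deprecated argument.
    return {"config": config, "num_chunks": len(model_chunks)}


def call_with_supported_kwargs(func, **kwargs):
    # Keep only keyword arguments that the target function still declares.
    accepted = inspect.signature(func).parameters
    return func(**{k: v for k, v in kwargs.items() if k in accepted})


legacy_kwargs = {
    "config": {"lr": 1e-6},
    "model_chunks": ["chunk0"],
    "no_weight_decay_cond": None,  # deprecated argument still passed by the caller
}

try:
    get_optimizer_new_api(**legacy_kwargs)
    error = None
except TypeError as exc:
    error = str(exc)

optimizer = call_with_supported_kwargs(get_optimizer_new_api, **legacy_kwargs)
```

Filtering silently drops arguments, which can hide real mistakes, so removing unused arguments at the call site (as this PR does) is the simpler fix when they are known to be dead.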
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [megatron] fix: respect `use_distributed_optimizer` in config (#4392)
### What does this PR do?
Currently `use_distributed_optimizer` is hardcoded in `optim_args`,
which is unexpected because `use_distributed_optimizer` is a config option
for `megatron`. This PR makes the optimizer respect the configured value.
### Checklist Before Starting
- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* [recipe, ci] fix: remove batch mode for remote generative reward model (#4448)
### What does this PR do?
The reward loop has deprecated the batch reward manager. This PR fixes the
`genrm_remote` recipe, which still used it.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [misc] feat: optimize rearrange_micro_batches (#4451)
* [rollout, sglang] feat: support blockwise fp8 rollout (#4415)
### What does this PR do?
This PR introduces FP8 rollout with the SGLang inference backend in verl.
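For readers unfamiliar with blockwise scaling, here is a pure-Python sketch of the idea, not the PR's implementation. It assumes the E4M3 FP8 format (largest finite value 448) and a tiny 2x2 block for readability; `blockwise_scales` and `scale_weight` are hypothetical names, and the final cast to FP8 is omitted.

```python
FP8_E4M3_MAX = 448.0  # largest finite value in float8 E4M3 (assumed format)


def blockwise_scales(weight, block_size):
    """Return one scale per (block_rows x block_cols) tile of a 2-D weight."""
    block_rows, block_cols = block_size
    n_rows, n_cols = len(weight), len(weight[0])
    scales = []
    for r0 in range(0, n_rows, block_rows):
        row_scales = []
        for c0 in range(0, n_cols, block_cols):
            tile = [
                abs(weight[r][c])
                for r in range(r0, min(r0 + block_rows, n_rows))
                for c in range(c0, min(c0 + block_cols, n_cols))
            ]
            amax = max(tile)
            # Map the tile's largest magnitude onto the FP8 range; guard all-zero tiles.
            row_scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
        scales.append(row_scales)
    return scales


def scale_weight(weight, scales, block_size):
    """Divide each element by its tile's scale; a real kernel would then cast to FP8."""
    block_rows, block_cols = block_size
    return [
        [w / scales[r // block_rows][c // block_cols] for c, w in enumerate(row)]
        for r, row in enumerate(weight)
    ]


weight = [
    [1.0, -2.0, 100.0, 50.0],
    [0.5, 4.0, -25.0, 200.0],
]
scales = blockwise_scales(weight, block_size=(2, 2))
scaled = scale_weight(weight, scales, block_size=(2, 2))
```

Because each tile gets its own scale, the small-magnitude left tile is not crushed by the large values in the right tile, which is the motivation for blockwise rather than per-tensor scaling.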
#### Experiments and Outcomes
Qwen3-8B-Base Dense Model
**Configuration**
- DAPO recipe. AIME24 online validation.
- SGLang + FSDP
- Note that SPMD rollout has been deprecated, so we removed the FP8 SPMD
rollout.
- Prompt batch size 32, n=16.
- Rollout batch size: 32\*3*16
- Train_batch_size & ppo_mini_batch_size 32
- Max response length 20K
- Token-level TIS, C=2
- 8*H100
- verlai/verl:sgl055.latest
**Accuracy**
With TIS, FP8 rollout accuracy aligns with BF16.
<img width="1460" height="782" alt="image"
src="https://github.com/user-attachments/assets/c8b04c8c-2961-4ad3-9c0a-0d0bee80fd74"
/>
**Performance**
<img width="661" height="661" alt="image"
src="https://github.com/user-attachments/assets/967b6889-08b6-407b-8586-86b42a58d0b7"
/>
<img width="661" height="668" alt="image"
src="https://github.com/user-attachments/assets/0b3f4ad1-87e2-428e-ab96-d241944a2b41"
/>
*purple: BF16, red: FP8 rollout*
Results and observations:
- FP8 rollout yields a rollout speedup of roughly 12%.
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
---------
Co-authored-by: Xue Huang <xueh@nvidia.com>
* [trainer] feat: model engine sft trainer support vlm model (#4403)
### What does this PR do?
The model-engine SFT trainer now supports VLM models:
- [x] fsdp engine
- [x] megatron engine
Qwen3-VL-2B-Instruct SFT trainer comparison on
[llamafactory/pokemon-gpt4o-captions](https://huggingface.co/datasets/llamafactory/pokemon-gpt4o-captions)
<img width="1550" height="620" alt="image"
src="https://github.com/user-attachments/assets/40f79711-a542-4816-89e2-24184c8cb495"
/>
* [trainer] feat: add reward loop config to default config (#4452)
### What does this PR do?
Future PRs will gradually migrate from the legacy RM implementation to the
reward loop (for rule-based, GenRM, DisRM, and other rewards). This PR adds
reward loop configs to the defaults; they inherit from the legacy reward
model config, so no current API is broken.
Specifically, future PRs will:
- align results between reward loop disrm and legacy fsdp/megatron disrm
- deprecate fsdp/megatron disrm, use reward loop disrm as default
- use reward loop rule-based, disrm-based, genrm-based as default
- deprecate legacy reward model config
### Checklist Before Starting
- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
- `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [ ] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
* [vllm] feat: support abort generating requests in vllm server (#4453)
### What does this PR do?
This PR adds abort functionality to the vLLM async rollout server,
enabling users to cancel ongoing generation requests efficiently. This
is particularly useful for scenarios where generation needs to be
stopped early (e.g., policy updates, timeout handling, or user
cancellation).
**Key additions:**
- `abort_all_requests()` method to cancel all active generation requests
- `abort_request(request_id)` method to cancel a specific request
- `stop_reason` field in `TokenOutput` to distinguish between completed,
aborted, and other finish states
### Test
The abort functionality has been validated through a standalone test
script that:
1. Starts 8 concurrent generation requests with long prompts
2. Waits 0.5s and calls `abort_all_requests()`
3. Verifies that requests are properly aborted with partial outputs
**Test Results:**
- ✅ Successfully aborted 7 out of 8 requests (1 completed before abort
was triggered)
- ✅ Abort operation completed in **2.27ms** (very fast)
- ✅ All aborted requests returned partial outputs with
`stop_reason="aborted"`
- ✅ Completed request had `stop_reason="completed"`
- ✅ All requests finished without timeout
See full test log in the PR description above.
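The test flow above can be imitated without a GPU. The following is a minimal sketch using plain `asyncio`: `MockServer`, its token loop, and the timings are hypothetical stand-ins, not the verl or vLLM implementation. Only the result field names (`aborted_count`, `request_ids`, `stop_reason`) follow the description in this PR.

```python
import asyncio


class MockServer:
    """Hypothetical stand-in for a rollout server (not the verl/vLLM API)."""

    def __init__(self):
        # request_id -> event that signals the request should stop early
        self._abort_events = {}

    async def generate(self, request_id, num_tokens, delay=0.01):
        abort_event = asyncio.Event()
        self._abort_events[request_id] = abort_event
        token_ids = []
        stop_reason = "completed"
        for i in range(num_tokens):
            if abort_event.is_set():
                stop_reason = "aborted"
                break
            token_ids.append(i)  # pretend one token is produced per step
            await asyncio.sleep(delay)
        self._abort_events.pop(request_id, None)
        return {"request_id": request_id, "token_ids": token_ids, "stop_reason": stop_reason}

    async def abort_all_requests(self):
        # Snapshot the active ids first, because finished requests remove themselves.
        request_ids = list(self._abort_events)
        for request_id in request_ids:
            self._abort_events[request_id].set()
        return {"aborted_count": len(request_ids), "request_ids": request_ids}


async def run_abort_demo():
    server = MockServer()
    # Seven long requests plus one short request that finishes before the abort.
    lengths = [1000] * 7 + [2]
    tasks = [
        asyncio.create_task(server.generate(f"req_{i}", n))
        for i, n in enumerate(lengths)
    ]
    await asyncio.sleep(0.2)
    result = await server.abort_all_requests()
    outputs = await asyncio.gather(*tasks)
    return result, outputs
```

Running `asyncio.run(run_abort_demo())` returns the abort summary and one output per request; aborted requests keep the tokens they produced before the abort, which mirrors the partial-output behavior reported in the test results.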
### API and Usage Example
```python
import ray
from verl.workers.rollout.replica import get_rollout_replica_class
# Initialize vLLM rollout server
rollout_server_class = get_rollout_replica_class("vllm")
server = rollout_server_class(replica_rank=0, config=rollout_config, ...)
await server.init_standalone()
# Start generation
ref = server._server_handle.generate.remote(
request_id="req_123",
prompt_ids=[1, 2, 3],
sampling_params={"temperature": 1.0},
)
# Abort all requests
result = await server.abort_all_requests()
# Returns: {"aborted_count": 1, "request_ids": ["req_123"], ...}
# Or abort a specific request
result = await server.abort_request("req_123")
# Returns: {"aborted": True, "request_id": "req_123"}
# Check stop reason in output
output = ray.get(ref)
print(output.stop_reason) # "aborted" or "completed"
```
### Design & Code Changes
**1. Added `stop_reason` field to `TokenOutput` protocol**
(`verl/workers/rollout/replica.py`):
- New optional field to track why generation stopped
- Values: `"completed"`, `"aborted"`, or other finish reasons
**2. Implemented abort methods in `vLLMHttpServerBase`**
(`verl/workers/rollout/vllm_rollout/vllm_async_server.py`):
- `abort_all_requests()`: Aborts all active requests by:
- Fetching all request states from the output processor
- Creating abort outputs and putting them into request queues
- Calling abort on both output processor and engine core
- `abort_request(request_id)`: Aborts a specific request using similar
logic
**3. Implemented abort methods in `vLLMReplica`**:
- Distributes abort calls across all server instances
- Aggregates results from multiple servers
**4. Added stop reason mapping**:
- Maps vLLM's `finish_reason` to verl's `stop_reason`:
- `"abort"` → `"aborted"`
- `"stop"` or `"length"` → `"completed"`
- Other reasons pass through as-is
**5. Added comprehensive test**
(`tests/workers/rollout/rollout_vllm/test_vllm_abort.py`):
- Standalone script to validate abort functionality
- Tests concurrent request abortion and partial output handling
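Items 3 and 4 above can be sketched in a few lines. This is an illustration only: the function names `map_stop_reason` and `aggregate_abort_results` are hypothetical, and the aggregation rule (summing counts and concatenating request ids) is an assumption based on the result shape shown in the usage example, not a copy of the PR's code.

```python
def map_stop_reason(finish_reason):
    """Map a vLLM-style finish_reason to the stop_reason values listed above."""
    if finish_reason == "abort":
        return "aborted"
    if finish_reason in ("stop", "length"):
        return "completed"
    # Any other reason passes through unchanged.
    return finish_reason


def aggregate_abort_results(per_server_results):
    """Combine per-server abort results into one summary (assumed rule)."""
    aborted_count = 0
    request_ids = []
    for result in per_server_results:
        aborted_count += result.get("aborted_count", 0)
        request_ids.extend(result.get("request_ids", []))
    return {"aborted_count": aborted_count, "request_ids": request_ids}
```

Keeping the mapping in one small function means callers only ever compare against verl's `stop_reason` values and never against backend-specific strings.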
### Checklist Before Submitting
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [ ] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: Added standalone
test script in `tests/workers/rollout/rollout_vllm/test_vllm_abort.py`
- [ ] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
* [ci] chore: cleanup some ci workflow (#4459)
### What does this PR do?
As title
* [trainer] feat: allow override for reward_manager_worker in agent loop (#4423)
### What does this PR do?
Allow subclasses to set `reward_manager_worker`.
Co-authored-by: Ryan Li <rynli@amazon.com>
Co-authored-by: Yuyang Ding <61647442+yyDing1@users.noreply.github.com>
* [model] feat: enhances TrainingWorker (#4461)
### What does this PR do?
- Support nested tensors when making an iterator from a tensordict
- Improve TrainingWorker by setting default engineering args in the
worker init
- Add a unit test for TrainingWorker
- fix #4004
* [recipe] feat: Modify the way of obtaining default_runtime_env (#4468)
* [rollout] fix: mlflow consecutive slashes (#4446)
### What does this PR do?
MLflow does not accept metric names that contain consecutive slashes
(`//`). Logging such a metric raises an error like this:
```
File "/usr/local/lib/python3.12/dist-packages/mlflow/tracking/client.py", line 2511, in log_batch
return self._tracking_client.log_batch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mlflow/telemetry/track.py", line 30, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mlflow/tracking/_tracking_service/client.py", line 581, in log_batch
self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
File "/usr/local/lib/python3.12/dist-packages/mlflow/store/tracking/rest_store.py", line 906, in log_batch
self._call_endpoint(LogBatch, req_body)
File "/usr/local/lib/python3.12/dist-packages/mlflow/store/tracking/rest_store.py", line 208, in _call_endpoint
return call_endpoint(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mlflow/utils/rest_utils.py", line 596, in call_endpoint
response = verify_rest_response(response, endpoint)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mlflow/utils/rest_utils.py", line 315, in verify_rest_response
raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Invalid value "val-aux//reward/mean_at_1" for parameter 'metrics[0].name' supplied: Names may be treated as files in certain cases, and must not resolve to other names when treated as such. This name would resolve to 'val-aux/reward/mean_at_1'
```
### Test
Added a test for this behavior to `TestMlflowLoggingAdapter`.
### Design & Code Changes
Used a regular expression to match runs of multiple slashes in metric
names and substitute them.
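A minimal sketch of this kind of substitution is shown below. The function name `sanitize_metric_name` is hypothetical, and collapsing each run to a single slash is an assumption, chosen because the error message above says the name would resolve to `val-aux/reward/mean_at_1`.

```python
import re


def sanitize_metric_name(name: str) -> str:
    """Collapse each run of two or more slashes into a single slash."""
    return re.sub(r"/{2,}", "/", name)
```

For example, `sanitize_metric_name("val-aux//reward/mean_at_1")` yields `"val-aux/reward/mean_at_1"`, while names without repeated slashes are returned unchanged.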
* [fsdp] fix: reward model also reads override config attn_implementation (#4458)
### What does this PR do?
> Add **concise** overview of what this PR aims to achieve or
accomplish. Reference related GitHub issues and PRs that help with the
review.
### Checklist Before Starting
- [x] Search for similar PRs. Paste at least one query link here: ...
- https://github.com/volcengine/verl/pull/3978 did not cover the reward model
### Test
This only needs to be tested in CI.
### Checklist Before Submitting
> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.
- [x] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [x] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_…
What does this PR do?
This PR introduces FP8 rollout with sglang inference backend in verl.
Experiments and Outcomes
Qwen3-8B-Base Dense Model
Configuration
Accuracy
With TIS, FP8 rollout aligns with BF16

Performance


purple: BF16, red: FP8 rollout
Results and observations: