[rollout, sglang] feat: support blockwise fp8 rollout by Agoniii · Pull Request #4415 · verl-project/verl

Agoniii · 2025-12-04T10:56:35Z

What does this PR do?

This PR adds blockwise FP8 rollout support for the SGLang inference backend in verl.

Experiments and Outcomes

Qwen3-8B-Base Dense Model

Configuration

DAPO recipe. AIME24 online validation.
SGLang + FSDP
- Note that SPMD rollout has been deprecated, so we removed the FP8 SPMD rollout.
Prompt batch size 32, n=16.
Rollout batch size: 32*3*16
Train_batch_size & ppo_mini_batch_size 32
Max response length 20K
Token-level TIS, C=2
8*H100
verlai/verl:sgl055.latest

Accuracy

With TIS, FP8 rollout aligns with BF16
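
For readers unfamiliar with the "Token-level TIS, C=2" setting: assuming TIS here means truncated importance sampling applied per token, each token's correction weight is the ratio of its training-policy probability to its rollout-policy probability, capped at C. The sketch below illustrates that reading in plain Python. The function name `token_level_tis_weights` is hypothetical and this is not verl's implementation.

```python
import math


def token_level_tis_weights(train_logprobs, rollout_logprobs, clip_c=2.0):
    """Per-token truncated importance weights: min(pi_train / pi_rollout, C).

    Hypothetical helper for illustration only. Inputs are per-token log
    probabilities of the sampled tokens under the training policy (e.g. BF16)
    and the rollout policy (e.g. FP8).
    """
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    weights = []
    for lp_train, lp_rollout in zip(train_logprobs, rollout_logprobs):
        ratio = math.exp(lp_train - lp_rollout)  # pi_train / pi_rollout
        weights.append(min(ratio, clip_c))       # truncate at C
    return weights


# Identical policies give weight 1; a large mismatch is capped at C.
weights = token_level_tis_weights([-1.0, 0.0, -2.0], [-1.0, -5.0, -1.0], clip_c=2.0)
```

Capping the ratio bounds the variance that a few badly mismatched tokens could otherwise inject into the policy gradient, which is presumably why it helps FP8 rollout track the BF16 baseline.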

Performance

purple: BF16, red: FP8 rollout

Results and observations:

FP8 rollout yields a rollout speedup of about 12% over BF16.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)

gemini-code-assist

Code Review

This pull request introduces FP8 rollout support for the sglang backend, which is a significant feature enhancement. The changes include adding a new utility file for FP8 quantization, updating the sglang server and rollout worker to handle FP8 configurations, and updating the documentation accordingly. The implementation appears solid. My review focuses on improving maintainability by addressing code duplication and removing unreachable code.

gemini-code-assist · 2025-12-04T10:59:26Z

verl/utils/sglang/sglang_fp8_utils.py

+            if weight_block_size is not None:
+                if torch.distributed.get_rank() == 0:
+                    logger.debug(f"  Quantizing to FP8 blockwise: {k}")
+                param_lp, param_scale = scaled_fp8_blockwise(
+                    v.to(dtype),
+                    weight_block_size=weight_block_size,
+                )
+                param_scale = param_scale.squeeze(-1)
+                weights_quantized.append([k, param_lp])
+                weights_quantized.append([k + "_scale_inv", param_scale])
+            else:
+                raise ValueError(
+                    "Only blockwise quantization is supported. Please set weight_block_size in quant_config"
+                )


This else block is unreachable. weight_block_size is checked for None on line 152 before the loop begins, and an exception is raised if it is None. Consequently, the condition weight_block_size is not None on line 163 will always evaluate to true inside the loop, rendering the else branch dead code. Removing the conditional wrapper and the unreachable else block will improve code clarity and maintainability.

            if torch.distributed.get_rank() == 0:
                logger.debug(f"  Quantizing to FP8 blockwise: {k}")
            param_lp, param_scale = scaled_fp8_blockwise(
                v.to(dtype),
                weight_block_size=weight_block_size,
            )
            param_scale = param_scale.squeeze(-1)
            weights_quantized.append([k, param_lp])
            weights_quantized.append([k + "_scale_inv", param_scale])
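
As background for the quantization step discussed above, here is a minimal pure-Python sketch of how per-block scales are commonly derived for blockwise FP8. It assumes the usual convention that each block's scale is its absolute maximum divided by the largest finite e4m3 value (448), and that the tensor stored under the `_scale_inv` suffix is the factor multiplied back in at dequantization. The helpers `blockwise_scales` and `scale_into_fp8_range` are hypothetical; they are not the PR's `scaled_fp8_blockwise`, and the actual cast to an FP8 dtype is omitted.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the e4m3 format


def blockwise_scales(weight, block_size=(128, 128)):
    """Return one dequantization scale per (block_rows x block_cols) tile.

    `weight` is a 2D list of floats. Hypothetical helper for illustration.
    """
    rows, cols = len(weight), len(weight[0])
    bm, bn = block_size
    n_row_blocks = math.ceil(rows / bm)
    n_col_blocks = math.ceil(cols / bn)
    scales = [[0.0] * n_col_blocks for _ in range(n_row_blocks)]
    for bi in range(n_row_blocks):
        for bj in range(n_col_blocks):
            amax = 0.0
            for r in range(bi * bm, min((bi + 1) * bm, rows)):
                for c in range(bj * bn, min((bj + 1) * bn, cols)):
                    amax = max(amax, abs(weight[r][c]))
            # Floor amax so all-zero blocks do not produce a zero scale.
            scales[bi][bj] = max(amax, 1e-12) / FP8_E4M3_MAX
    return scales


def scale_into_fp8_range(weight, scales, block_size=(128, 128)):
    """Divide each element by its block scale so values fit in [-448, 448]."""
    bm, bn = block_size
    return [
        [value / scales[r // bm][c // bn] for c, value in enumerate(row)]
        for r, row in enumerate(weight)
    ]


# Tiny 4x4 example with 2x2 blocks instead of the PR's 128x128.
w = [
    [1.0, -2.0, 0.5, 0.0],
    [4.0, 0.0, 0.0, 8.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, -16.0, 1.0],
]
s = blockwise_scales(w, block_size=(2, 2))
q = scale_into_fp8_range(w, s, block_size=(2, 2))
```

Multiplying an element of `q` by its block's entry in `s` recovers the original value (up to the rounding a real FP8 cast would add), which is why only the quantized weight and the scale tensor need to be sent to the rollout engine.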

gemini-code-assist · 2025-12-04T10:59:26Z

verl/workers/rollout/sglang_rollout/async_sglang_server.py

+                assert sglang.__version__ >= "0.5.5", "sglang>=0.5.5 is required for FP8 quantization"
+                FP8_BLOCK_QUANT_KWARGS = {
+                    "activation_scheme": "dynamic",
+                    "fmt": "e4m3",
+                    "quant_method": "fp8",
+                    "weight_block_size": [128, 128],
+                }
+                fp8_block_quant_kwargs = dict(FP8_BLOCK_QUANT_KWARGS)


The FP8 quantization configuration logic, including the version check and FP8_BLOCK_QUANT_KWARGS dictionary, is duplicated in verl/workers/rollout/sglang_rollout/sglang_rollout.py. To improve maintainability and prevent future inconsistencies, this logic should be centralized. Consider moving FP8_BLOCK_QUANT_KWARGS to verl/utils/sglang/sglang_fp8_utils.py as a constant and creating a helper function there to encapsulate the version check and config creation.

gemini-code-assist · 2025-12-04T10:59:26Z

verl/workers/rollout/sglang_rollout/sglang_rollout.py

+            assert sglang.__version__ >= "0.5.5", "sglang>=0.5.5 is required for FP8 quantization"
+            FP8_BLOCK_QUANT_KWARGS = {
+                "activation_scheme": "dynamic",
+                "fmt": "e4m3",
+                "quant_method": "fp8",
+                "weight_block_size": [128, 128],
+            }
+            fp8_block_quant_kwargs = dict(FP8_BLOCK_QUANT_KWARGS)


The FP8 quantization configuration logic, including the version check and FP8_BLOCK_QUANT_KWARGS dictionary, is duplicated in verl/workers/rollout/sglang_rollout/async_sglang_server.py. To improve maintainability and prevent future inconsistencies, this logic should be centralized. Consider moving FP8_BLOCK_QUANT_KWARGS to verl/utils/sglang/sglang_fp8_utils.py as a constant and creating a helper function there to encapsulate the version check and config creation.

Co-authored-by: Xue Huang <xueh@nvidia.com>

* [doc] chore: Update ascend quickstart and docker build guidance doc (#4420) ### What does this PR do? As title. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). 
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) * [sglang] feat: retires sglang spmd mode in the codebase (#4422) ### What does this PR do? Retires the legacy SGLang SPMD rollout path and makes async/server mode the only supported backend for SGLang. The PR removes the old `SGLangRollout` class, its helpers, tests, and recipes, and updates all docs, scripts, and CI references so they speak only to the async HTTP adapter (`ServerAdapter`). ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: _N/A (follow-up to the vLLM SPMD removal)._ - [ ] Format the PR title as `[sglang, rollout, trainer, recipe, ci, doc] refactor: remove SGLang SPMD rollout` ### Test ### API and Usage Example Same as vLLM: configs/scripts must set `actor_rollout_ref.rollout.mode=async` and rely on the HTTP server. Example: ```bash python -m verl.trainer.main_ppo \ ... \ actor_rollout_ref.rollout.name=sglang \ actor_rollout_ref.rollout.mode=async \ ... ``` ### Design & Code Changes - Deleted the `SGLangRollout` class and associated helpers from `verl/workers/rollout/sglang_rollout/sglang_rollout.py`, keeping only the async `ServerAdapter`. Cleared its registry entries, configs, and guards the same way as the vLLM PR. - Removed SGLang SPMD-specific tests (`tests/workers/rollout/test_sglang_*`) and CI steps in `.github/workflows/sgl.yml`, plus any lint exclusions that referenced those files. 
- Updated recipes/examples/e2e scripts that referenced SGLang rollout to hardcode `rollout.mode=async`, drop sync branches, and set `return_raw_chat` (mirroring the vLLM cleanup). ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting). - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: _Removed the obsolete SGLang SPMD jobs; async workflows remain covered._ - [ ] Once your PR is ready for CI, notify the `ci-request` channel (or Feishu group). * [fsdp] feat: update NPU fused kernels for Qwen3 moe block (#4406) ### What does this PR do? This PR optimizes Qwen3-MoE model training performance on Ascend NPU devices. It optimizes the implementation of **GMM (Grouped Matmul)** and integrates fused **permute/unpermute** kernels, achieving a 20%+ training speedup. Key changes: 1. Added NPU GMM kernel for backward `dw`. 2. Added `npu_moe_token_permute` and `npu_moe_token_unpermute` fused kernels. 3. Unified GMM function for Qwen3-VL and Qwen3-MoE. 4. Reduced transpose operators in expert weight stacking. ### Checklist Before Starting - [ ] Search for similar PRs. 
Paste at least one query link here: https://github.com/volcengine/verl/pull/3221 - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. Tested with Qwen3-30B-A3B (FSDP, sp=8) on 64 Ascend A2 NPUs. 
Baseline: ![img_v3_02sj_639d375c-ef1e-4e3f-9c3e-afa1c6696feg](https://github.com/user-attachments/assets/87615508-13c1-4cdf-96c0-e1ee09fa99d1) With optimized fusion kernels: ![img_v3_02sj_401d607a-5bb0-42d8-bf07-af47a06cfd4g](https://github.com/user-attachments/assets/6a2f2ca7-e741-416b-b892-234b75b0f809) <!DOCTYPE html><p cid="n349" mdtype="paragraph" class="md-end-block md-p" style="box-sizing: border-box; line-height: inherit; orphans: 4; margin: 0.8em 0px; white-space: pre-wrap; position: relative; caret-color: rgb(51, 51, 51); color: rgb(51, 51, 51); font-family: "Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration: none;">Performance results (step 1):<figure class="md-table-fig" cid="n350" mdtype="table" style="box-sizing: border-box; margin: 1.2em 0px; overflow-x: auto; max-width: calc(100% + 16px); padding: 0px; cursor: default; caret-color: rgb(51, 51, 51); color: rgb(51, 51, 51); font-family: "Open Sans", "Clear Sans", "Helvetica Neue", Helvetica, Arial, "Segoe UI Emoji", sans-serif; font-size: 16px; font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration: none;"> Experiment | gen(s) | old_log_prob(s) | update_actor(s) | step(s) -- | -- | -- | -- | -- Baseline | 1180.1 | 71.3 | 152.5 | 1406.9 Fused (This PR) | 1167.5 | 58.6 | 111.1 | 1340.9 </figure> **Precision comparison:** <img width="664" height="360" alt="image" src="https://github.com/user-attachments/assets/5e6629c4-31b5-49ec-97cf-d5e4e6beb69c" /> ### 
API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) * [misc] refactor: clean up unused sharding manager (#4439) ### What does this PR do? As per title. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... 
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... 
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) * [hardware] chore: clean npu_patch (#4436) ### What does this PR do? > Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review. With an increasing number of models on verl being adapted to Ascend NPUs, along with the upgrade of the transformers version (currently `v4.57.3`), the content in the `npu_patch.py` has grown significantly. Its organization has gradually become cluttered, lacking a unified naming format, and a small portion of the patches have already become obsolete (the original functions no longer exist). This PR aims to address the above issues, ensuring that `npu_patch.py` remains clean and well-organized. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. 
- Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. Not related. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. Not related. ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. Not related. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) * [misc] fix: fix memory leakage when initializing multiple tools (#4430) ### What does this PR do? Prevents background event-loop leaks in the MCP tool loader. 
`initialize_tools_from_config` now lazily spawns the auxiliary asyncio loop only when an MCP tool is present, and always stops/closes the loop on exit so the worker process doesn’t hold onto threads or loop resources. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: _N/A (targeted fix for the MCP tool registry)._ - [ ] Format the PR title as `[tool, misc] fix: clean up MCP tool event loop` ### Test Not applicable (behavioral fix in tool initialization; existing tool-based tests still cover the code path). ### API and Usage Example No API surface changes; the existing YAML config flow stays the same. Example usage: ```python from verl.tools.utils.tool_registry import initialize_tools_from_config tool_instances = initialize_tools_from_config("configs/tools.yaml") ``` ### Design & Code Changes - Replaced the eager `asyncio.new_event_loop()` creation with a lazy `get_mcp_event_loop()` helper so purely native tool configs no longer spawn threads. - Simplified coroutine execution to always go through the lazily-initialized loop. - Added robust cleanup in the `finally` block: stop the loop, join the thread, and call `loop.close()` to release resources (fixing the leak). ### Checklist Before Submitting - [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting). - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows). If not feasible, explain why: _Tool loading already covered by existing smoke tests._ - [ ] Once your PR is ready for CI, notify the `ci-request` channel (or Feishu group). * [trainer, vllm, megatron, recipe] feat: one/two step off async on-policy distillation recipe (#3975) ### What does this PR do? 
This PR provides a simple implementation of one and two step off async knowledge distillation with megatron and vllm backend. ### Checklist Before Starting - [x] Search for similar PRs. Paste at least one query link here: ... - [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). 
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) --------- Co-authored-by: funrunding <furunding@163.com> * [misc] feat: optimize performance of index_select_tensor_dict (#4444) ### What does this PR do? - Optimize the performance of index_select_tensor_dict by unbind first then index. Results on chat count task <img width="360" height="253" alt="image" src="https://github.com/user-attachments/assets/cdc611e4-6006-4bc2-85a1-a49ec5f63b37" /> <img width="363" height="254" alt="image" src="https://github.com/user-attachments/assets/0a89cbc0-325f-4f00-b986-1fac75a74aa5" /> There is still a gap and we need to keep investigation ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. 
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

* [ci] test: Disable ReMax training test in vllm workflow (#4445)

* [rollout] fix: RolloutConfig should support repetition_penalty config… (#4398)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

During rollout, `repetition_penalty` is read from `RolloutConfig`; if it cannot be read, the default value of 1.0 is used. However, setting this parameter in the YAML file raises an error, because `RolloutConfig` has no such member.

<img width="1040" height="749" alt="image" src="https://github.com/user-attachments/assets/921eb8ae-35b6-49db-a22c-d40f31e6f59b" />

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Li Zuming <lizuming@huawei.com>

* [recipe] feat: add fully async comm between rollout and sim node in disagg mode (#4433)

### What does this PR do?

This PR adds a new feature to the VLA recipe: fully asynchronous communication between rollout and simulation nodes in disaggregated mode.

It enables two overlap optimizations:

1. Overlapping the simulation tool **reset** with the rollout weights **update**. The simulation state can now be reset while the actor_rollout weights are being updated, with no synchronous wait. *This saves close to the execution time of 8 steps.*
2. Overlapping **communication** between simulation steps and rollout steps, which eliminates the transfer overhead among nodes. In each step, the different pipeline stages (see PR #3918) now execute independently, so there are no synchronous waits among stages and the relatively longer steps (currently the simulation steps) can run continuously. *This saves about 12% of the cost of each VLA RL step, depending on the actual execution time of the simulation steps.*

**Details about communication overlapping**

- Initially, the workflow of our disaggregated mode on VLA looked as follows:
  ![img_v3_02sm_7be54524-8d37-4380-a4d2-fb871837d64g](https://github.com/user-attachments/assets/f21244b1-1d4e-4bac-82d3-4a40008c6ca2)
  - In disaggregated mode, rollout steps are executed on local nodes and simulation steps are executed on remote nodes.
  - GPU utilization is low, and many resources are wasted.
- So we implemented the pipeline execution mode in PR #3918. The workflow (with 2 stages, for instance) then looks as follows:
  ![img_v3_02sm_d398e0bb-ed08-4eaf-ae4d-53d6b9f1343g](https://github.com/user-attachments/assets/084e21d1-6224-4cce-8e0c-026e4f8e733c)
  - The rollout steps (R) and simulation steps (S) partially overlap (pipeline stages 0 and 1), and simulation steps take much longer than rollout steps.
  - Some time is still wasted (unnecessary data transfer delays) because of the sequential workflow. The reason is shown below:
  ![img_v3_02sm_abd3887d-1220-43b6-af16-fcf357d20c3g](https://github.com/user-attachments/assets/c9493870-0934-4478-a8cf-7b9928f580f1)
  - The synchronous waits among pipeline stages cannot be eliminated in a sequential workflow: although the steps are executed on different nodes, they must be synchronized after each step, which adds an unnecessary communication cost.
- We then implemented asynchronous communication pipelines, which use the difference in execution time between rollout and simulation steps to hide the communication time. The workflow and execution results are shown below:
  ![img_v3_02sm_60b64952-ed9c-40c5-8665-2187e21112fg](https://github.com/user-attachments/assets/074caaeb-8cfe-4b75-8de3-9e73f8c7d02e)

---

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

```shell
# You can use the `ray.timeline` tool for profiling in recipe/vla/main_ppo.py
bash recipe/vla/run_simpleVLA_isaac_disagg.sh
```

Profile results of reset overlapping

<img width="1420" height="232" alt="image" src="https://github.com/user-attachments/assets/dd98ea70-fa3f-4938-a02e-a44f64a0d1e9" />

Profile results of communication overlapping

<img width="1058" height="228" alt="image" src="https://github.com/user-attachments/assets/a9274864-ccd4-42f9-b5c4-1277bc0c1317" />

* [misc] feat: optimize nested tensor index (#4447)

### What does this PR do?

- As title. The same as PR: https://github.com/volcengine/verl/pull/4444

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

* [model] feat: add qwen3-4b grpo script on ASCEND NPU A3 (#4432)

### What does this PR do?

Add `examples/grpo_trainer/run_qwen3-4b_npu.sh`.

### Test

The figure below shows the comparison curve of the critic_reward_mean metric.

<img width="1790" height="948" alt="image" src="https://github.com/user-attachments/assets/01df9bed-f888-470d-936c-eb335acd57e9" />

### API and Usage Example

```sh
# install jemalloc
sudo apt update
sudo apt install libjemalloc2
# run bash
bash examples/grpo_trainer/run_qwen3-4b_npu.sh
```

* [megatron] fix: Remove Deprecated Megatron Optimizer Args (#4396)

### What does this PR do?

This PR removes the deprecated arguments passed when building the Megatron optimizer, for compatibility with the latest Megatron; see [https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/__init__.py#L442](https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/__init__.py#L442). These arguments are never used by verl, so they can be safely removed.
This solves the following exception with the latest Megatron:

```
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/root/verl/verl/single_controller/ray/base.py", line 825, in func
    return getattr(self.worker_dict[key], name)(*args, **kwargs)
  File "/root/verl/verl/single_controller/base/decorator.py", line 451, in inner
    return func(*args, **kwargs)
  File "/root/verl/verl/utils/transferqueue_utils.py", line 187, in dummy_inner
    return func(*args, **kwargs)
  File "/root/verl/verl/workers/megatron_workers.py", line 573, in init_model
    ) = self._build_model_optimizer(
  File "/root/verl/verl/workers/megatron_workers.py", line 464, in _build_model_optimizer
    actor_optimizer = get_megatron_optimizer(model=actor_module, config=optim_config_megatron)
  File "/root/verl/verl/utils/megatron/optimizer.py", line 71, in get_megatron_optimizer
    return get_megatron_optimizer_native(
TypeError: get_megatron_optimizer() got an unexpected keyword argument 'no_weight_decay_cond'
```

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... 
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) * [megatron] fix: respect `use_distributed_optimizer` in config (#4392) ### What does this PR do? Currently `use_distributed_optimizer` is hardcoded as `optim_args`, which is unexpected since `use_distributed_optimizer` is a config for `megatron`. ### Checklist Before Starting - [X] Search for similar PRs. Paste at least one query link here: ... - [X] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. 
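
As a rough illustration of the idea described under "What does this PR do?" above, the sketch below reads the flag from the Megatron section of the config rather than fixing it when the optimizer arguments are built. `MegatronConfig` and `build_optim_args` are hypothetical names invented for this example and are not verl's actual classes or functions; the default value of `True` is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class MegatronConfig:
    # Hypothetical stand-in for the `megatron` section of the config.
    # The default of True is an assumption made for this sketch.
    use_distributed_optimizer: bool = True


def build_optim_args(lr: float, megatron_config: MegatronConfig) -> dict:
    """Build optimizer kwargs, taking the flag from the config instead of a constant."""
    return {
        "lr": lr,
        # Respect whatever the user set under `megatron` rather than a fixed value.
        "use_distributed_optimizer": megatron_config.use_distributed_optimizer,
    }


optim_args = build_optim_args(1e-6, MegatronConfig(use_distributed_optimizer=False))
print(optim_args["use_distributed_optimizer"])  # False
```

With this shape, turning the distributed optimizer off in the config is enough; nothing in the optimizer-building code needs to change.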
### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [X] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [X] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [X] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... - [X] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).) Signed-off-by: Hollow Man <hollowman@opensuse.org> * [recipe, ci] fix: remove batch mode for remote generative reward model (#4448) ### What does this PR do? Reward loop deprecated batch reward manager. Fix `genrm_remote` recipe as it used batch reward manager. ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... 
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

* [misc] feat: optimize rearrange_micro_batches (#4451)

* [rollout, sglang] feat: support blockwise fp8 rollout (#4415)

### What does this PR do?

This PR introduces FP8 rollout with the SGLang inference backend in verl.

#### Experiments and Outcomes

Qwen3-8B-Base Dense Model

**Configuration**

- DAPO recipe. AIME24 online validation.
- SGLang + FSDP
  - Note that SPMD rollout has been deprecated, so we removed the FP8 SPMD rollout.
- Prompt batch size 32, n=16.
- Rollout batch size: 32\*3\*16
- `train_batch_size` and `ppo_mini_batch_size`: 32
- Max response length 20K
- Token-level TIS, C=2
- 8\*H100
- verlai/verl:sgl055.latest

**Accuracy**

With TIS, FP8 rollout accuracy aligns with BF16.

<img width="1460" height="782" alt="image" src="https://github.com/user-attachments/assets/c8b04c8c-2961-4ad3-9c0a-0d0bee80fd74" />

**Performance**

<img width="661" height="661" alt="image" src="https://github.com/user-attachments/assets/967b6889-08b6-407b-8586-86b42a58d0b7" />
<img width="661" height="668" alt="image" src="https://github.com/user-attachments/assets/0b3f4ad1-87e2-428e-ab96-d241944a2b41" />

*purple: BF16, red: FP8 rollout*

Results and observations:

- FP8 rollout yields a rollout speedup of roughly 12%.

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching` ### Test > For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc. ### API and Usage Example > Demonstrate how the API changes if any, and provide usage example(s) if possible. ```python # Add code snippet or script demonstrating how to use this ``` ### Design & Code Changes > Demonstrate the high-level design if this PR is complex, and list the specific changes. ### Checklist Before Submitting > [!IMPORTANT] > Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review. - [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md). - [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ... 
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

---------

Co-authored-by: Xue Huang <xueh@nvidia.com>

* [trainer] feat: model engine sft trainer support vlm model (#4403)

### What does this PR do?

The SFT trainer now supports VLM models:

- [x] fsdp engine
- [x] megatron engine

Qwen3-VL-2B-Instruct SFT trainer comparison on [llamafactory/pokemon-gpt4o-captions](https://huggingface.co/datasets/llamafactory/pokemon-gpt4o-captions):

<img width="1550" height="620" alt="image" src="https://github.com/user-attachments/assets/40f79711-a542-4816-89e2-24184c8cb495" />

* [trainer] feat: add reward loop config to default config (#4452)

### What does this PR do?

Future PRs will gradually migrate from the legacy reward model (RM) implementation to the reward loop (for rule-based, genrm, disrm, ...). This PR adds the reward loop configs to the defaults. They inherit the legacy reward model config, so no current API is broken.

Specifically, future PRs will:

- align results between reward loop disrm and legacy fsdp/megatron disrm
- deprecate fsdp/megatron disrm and use reward loop disrm as the default
- use reward loop rule-based, disrm-based, and genrm-based rewards as the default
- deprecate the legacy reward model config

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

* [vllm] feat: support abort generating requests in vllm server (#4453)

### What does this PR do?

This PR adds abort functionality to the vLLM async rollout server, enabling users to cancel ongoing generation requests efficiently. This is particularly useful when generation needs to be stopped early (e.g., policy updates, timeout handling, or user cancellation).

**Key additions:**

- `abort_all_requests()` method to cancel all active generation requests
- `abort_request(request_id)` method to cancel a specific request
- `stop_reason` field in `TokenOutput` to distinguish between completed, aborted, and other finish states

### Test

The abort functionality has been validated through a standalone test script that:

1. Starts 8 concurrent generation requests with long prompts
2. Waits 0.5s and calls `abort_all_requests()`
3. Verifies that requests are properly aborted with partial outputs

**Test Results:**

- ✅ Successfully aborted 7 out of 8 requests (1 completed before abort was triggered)
- ✅ Abort operation completed in **2.27ms**
- ✅ All aborted requests returned partial outputs with `stop_reason="aborted"`
- ✅ Completed request had `stop_reason="completed"`
- ✅ All requests finished without timeout

See full test log in the PR description above.

### API and Usage Example

```python
import ray
from verl.workers.rollout.replica import get_rollout_replica_class

# Note: the `await` calls below must run inside an async function,
# and `rollout_config` is expected to be defined by the caller.

# Initialize vLLM rollout server
rollout_server_class = get_rollout_replica_class("vllm")
server = rollout_server_class(replica_rank=0, config=rollout_config, ...)
await server.init_standalone()

# Start generation
ref = server._server_handle.generate.remote(
    request_id="req_123",
    prompt_ids=[1, 2, 3],
    sampling_params={"temperature": 1.0},
)

# Abort all requests
result = await server.abort_all_requests()
# Returns: {"aborted_count": 1, "request_ids": ["req_123"], ...}

# Or abort a specific request
result = await server.abort_request("req_123")
# Returns: {"aborted": True, "request_id": "req_123"}

# Check stop reason in output
output = ray.get(ref)
print(output.stop_reason)  # "aborted" or "completed"
```

### Design & Code Changes

**1. Added `stop_reason` field to `TokenOutput` protocol** (`verl/workers/rollout/replica.py`):

- New optional field to track why generation stopped
- Values: `"completed"`, `"aborted"`, or other finish reasons

**2. Implemented abort methods in `vLLMHttpServerBase`** (`verl/workers/rollout/vllm_rollout/vllm_async_server.py`):

- `abort_all_requests()`: Aborts all active requests by:
  - Fetching all request states from the output processor
  - Creating abort outputs and putting them into request queues
  - Calling abort on both output processor and engine core
- `abort_request(request_id)`: Aborts a specific request using similar logic

**3. Implemented abort methods in `vLLMReplica`**:

- Distributes abort calls across all server instances
- Aggregates results from multiple servers

**4. Added stop reason mapping**:

- Maps vLLM's `finish_reason` to verl's `stop_reason`:
  - `"abort"` → `"aborted"`
  - `"stop"` or `"length"` → `"completed"`
  - Other reasons pass through as-is

**5. Added comprehensive test** (`tests/workers/rollout/rollout_vllm/test_vllm_abort.py`):

- Standalone script to validate abort functionality
- Tests concurrent request abortion and partial output handling

### Checklist Before Submitting

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always` - [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs). - [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: Added standalone test script in `tests/workers/rollout/rollout_vllm/test_vllm_abort.py` - [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). * [ci] chore: cleanup some ci workflow (#4459) ### What does this PR do? As title * [trainer] feat: allow override for reward_manager_worker in agent loop (#4423) ### What does this PR do? Allow subclass to set reward_manager_worker ### Checklist Before Starting - [ ] Search for similar PRs. Paste at least one query link here: ... - [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI) - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data` - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]` - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test` - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title. 
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

Co-authored-by: Ryan Li <rynli@amazon.com>
Co-authored-by: Yuyang Ding <61647442+yyDing1@users.noreply.github.com>

* [model] feat: enhances TrainingWorker (#4461)

### What does this PR do?
- Support tensordict make iterator with nested tensor
- Improve TrainingWorker by setting default engineering args in the worker init
- Add a unit test of TrainingWorker
- Fix #4004

### Checklist Before Starting

- [ ] Search for similar PRs. Paste at least one query link here: ...
- [ ] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [ ] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [ ] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [ ] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

* [recipe] feat: Modify the way of obtaining default_runtime_env (#4468)

* [rollout] fix: mlflow consecutive slashes (#4446)

### What does this PR do?
MLflow does not accept metric names that contain `//`. Logging such a metric raises an error like this:

```
  File "/usr/local/lib/python3.12/dist-packages/mlflow/tracking/client.py", line 2511, in log_batch
    return self._tracking_client.log_batch(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mlflow/telemetry/track.py", line 30, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mlflow/tracking/_tracking_service/client.py", line 581, in log_batch
    self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
  File "/usr/local/lib/python3.12/dist-packages/mlflow/store/tracking/rest_store.py", line 906, in log_batch
    self._call_endpoint(LogBatch, req_body)
  File "/usr/local/lib/python3.12/dist-packages/mlflow/store/tracking/rest_store.py", line 208, in _call_endpoint
    return call_endpoint(
           ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mlflow/utils/rest_utils.py", line 596, in call_endpoint
    response = verify_rest_response(response, endpoint)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mlflow/utils/rest_utils.py", line 315, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Invalid value "val-aux//reward/mean_at_1" for parameter 'metrics[0].name' supplied: Names may be treated as files in certain cases, and must not resolve to other names when treated as such. This name would resolve to 'val-aux/reward/mean_at_1'
```

### Test

Added a test for this behavior to `TestMlflowLoggingAdapter`.

### Design & Code Changes

Used a regular expression to find and substitute runs of multiple slashes in metric names.

* [fsdp] fix: reward model also reads override config attn_implementation (#4458)

### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.
### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
  - https://github.com/volcengine/verl/pull/3978 missing the reward one

### Test

only need to test in CI

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_…
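
The stop reason mapping listed in the vLLM abort entry of the merged commit log above (`"abort"` → `"aborted"`, `"stop"` or `"length"` → `"completed"`, everything else passed through) can be sketched as a small pure function. This is only an illustration: `map_stop_reason` is a hypothetical name, not the helper used in `vllm_async_server.py`, and treating a missing `finish_reason` as pass-through is an assumption.

```python
from typing import Optional


def map_stop_reason(finish_reason: Optional[str]) -> Optional[str]:
    """Hypothetical helper: translate a vLLM finish_reason into a verl stop_reason."""
    if finish_reason == "abort":
        # The request was cancelled, e.g. via abort_request / abort_all_requests.
        return "aborted"
    if finish_reason in ("stop", "length"):
        # Normal termination: a stop condition or the length limit was hit.
        return "completed"
    # Any other reason (including None, by assumption) passes through unchanged.
    return finish_reason
```

Keeping the mapping in one place means callers only need to compare `output.stop_reason` against `"aborted"` to detect partial outputs.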

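
The `[rollout] fix: mlflow consecutive slashes (#4446)` entry in the merged commit log above says only that a regular expression substitutes runs of multiple slashes. A minimal sketch of that idea follows, assuming each run is collapsed to a single `/` (which matches the name MLflow says the metric "would resolve to" in the error message). `collapse_slashes` is a hypothetical name, not the function added in that PR.

```python
import re

# Two or more consecutive forward slashes.
_MULTI_SLASH = re.compile(r"/{2,}")


def collapse_slashes(metric_name: str) -> str:
    """Hypothetical helper: replace each run of '/' characters with a single '/'."""
    return _MULTI_SLASH.sub("/", metric_name)


def sanitize_metrics(metrics: dict) -> dict:
    """Apply collapse_slashes to every key of a metrics dict before logging."""
    return {collapse_slashes(name): value for name, value in metrics.items()}
```

For example, `collapse_slashes("val-aux//reward/mean_at_1")` yields `"val-aux/reward/mean_at_1"`, the form MLflow accepts. If two keys collapse to the same name, the later one wins in this sketch.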
xueh-nv added 19 commits November 19, 2025 01:59

update for async vllm

1d2dca2

blockwise fp8 rollout

e02c310

add doc

9c843bf

some fix

972f287

update vllm_rollout_spmd for async server

360b0f5

update vllm quant

00b6596

update doc

bece9e5

update comments

f563493

update dtype

4d4f7b6

update scripts for fp8 rollout

96a2832

add ci for fp8 rollout

c4757ef

update format

00bbce4

modify flag for fp8 rollout

aa8a4f3

sglang fp8 rollout

311c082

update

d772aef

small update

32d0c66

update doc

bf2ed72

add ci

65d4f87

merge main

a9b9d90

Agoniii requested review from SwordFaith, chenhaiq, eric-haibin-lin and zhaochenyang20 as code owners December 4, 2025 10:56

gemini-code-assist bot reviewed Dec 4, 2025

ISEEKYAN approved these changes Dec 5, 2025

wuxibin89 mentioned this pull request Dec 8, 2025

[rollout, vllm] feat: support KV cache FP8 #4435

Open

7 tasks

merge main and fix conflict

037986e

wuxibin89 merged commit 95a94e3 into verl-project:main Dec 8, 2025
79 of 81 checks passed

wuxibin89 mentioned this pull request Dec 9, 2025

[recipe,sglang] feat: add Truncated importance sampling + sglang recipe #4462

Open

7 tasks

wuxibin89 merged 20 commits into verl-project:main from Agoniii:xueh/fp8_rollout_sglang_pr

4 participants
