[testing][rollout] feat: support integration of vllm>=0.7.0 (spmd-version)#209

Merged
PeterSH6 merged 41 commits intoverl-project:mainfrom
ZSL98:latest
Feb 14, 2025
Conversation


@ZSL98 ZSL98 commented Feb 5, 2025

This PR aims to integrate vllm>=0.7.0 while preserving:
Backward compatibility: vllm 0.3.1, 0.4.2, 0.5.4, and 0.6.3 are still supported.
Forward compatibility: future versions of vllm (>=0.7.0) will be supported without manual maintenance for each new release.

The readme for this beta version is located at docs/README_vllm0.7.md, where users can find the installation instructions and supported features. The readme is reproduced below.


Readme for verl(vllm>=0.7) version

Installation

Note: This version of veRL supports FSDP for training and vLLM for rollout. (Megatron-LM is not supported yet.)

# Create the conda environment
conda create -n verl python==3.10
conda activate verl

# Install verl
git clone https://github.com/volcengine/verl.git
cd verl
pip3 install -e .
# Install vLLM>=0.7
pip3 install vllm==0.7.0
# Install flash-attn
pip3 install flash-attn --no-build-isolation

For existing stable vllm versions (<=0.7.2), you also need to apply a few small manual patches to the installed vllm package (/path/to/site-packages/vllm) after the steps above:

  • vllm/distributed/parallel_state.py: Remove the assertion below:
if (world_size
        != tensor_model_parallel_size * pipeline_model_parallel_size):
    raise RuntimeError(
        f"world_size ({world_size}) is not equal to "
        f"tensor_model_parallel_size ({tensor_model_parallel_size}) x "
        f"pipeline_model_parallel_size ({pipeline_model_parallel_size})")

  • vllm/executor/uniproc_executor.py: change local_rank = rank to local_rank = int(os.environ["LOCAL_RANK"])
  • vllm/model_executor/model_loader/weight_utils.py: remove the torch.cuda.empty_cache() in pt_weights_iterator

These modifications have already been merged into the main branch of vLLM. To avoid modifying these files manually, you can directly build vLLM from source.
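The patch locations above live under the installed package directory; if you are unsure where that is, a small helper like the following can print it. This is a sketch: the `package_dir` helper is illustrative and is demonstrated on a stdlib module so it runs anywhere; pass `"vllm"` to get the directory containing the files to patch.

```python
import importlib.util
import os

def package_dir(name: str) -> str:
    """Return the on-disk directory of an installed package."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(name)
    return os.path.dirname(spec.origin)

# Shown with a stdlib package so the snippet runs anywhere;
# substitute "vllm" to locate /path/to/site-packages/vllm.
print(package_dir("json"))
```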

Features

Use cuda graph

After installation, the examples that use FSDP as the training backend can be run. By default, enforce_eager is set to True, which disables CUDA graphs. To enable CUDA graphs and the sleep mode of vLLM>=0.7, add the following lines to the bash script:

actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \
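For context, a sketch of where these overrides go. The `main_ppo` entry point and the elided flags are assumptions for illustration; in practice you simply append the two lines to the existing example script's launch command.

```shell
# Illustrative sketch only: append the overrides to the example's launch command.
python3 -m verl.trainer.main_ppo \
    data.train_files=... \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False
```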

For a typical job like examples/ppo_trainer/run_qwen2-7b_seq_balance.sh, rollout generation takes 115 seconds with vLLM 0.6.3 versus 85 seconds with vLLM 0.7.0. Enabling CUDA graphs further reduces generation time to 62 seconds.

Note: Currently, if n is greater than 1 in SamplingParams in vLLM>=0.7, there is a potential stability issue in rollout generation time (some iterations see bursts in generation time). We are working with the vLLM team to investigate this issue.

Other features in vLLM

  1. num_scheduler_step>1: not supported yet (weight loading has not been aligned with MultiStepModelRunner)
  2. Prefix caching: not supported yet (vLLM sleep mode does not support prefix caching)
  3. Chunked prefill: supported

@ZSL98 ZSL98 marked this pull request as ready for review February 6, 2025 08:54

ZSL98 commented Feb 10, 2025

If you are loading .pt weight files and using vLLM's sleep mode, please comment out https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/model_loader/weight_utils.py#L465 because of a potential PyTorch issue: pytorch/pytorch#145168

@YangWang92
Contributor

BTW, I got an error with the current SPMD vLLMRollout on multi-node training; I am still trying to figure out what happened.

(main_task pid=3370254) RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1280 but got size 256 for tensor number 1 in the list.


ZSL98 commented Feb 13, 2025

BTW, I got an error with current SPMD vLLMRollout on multi-node training, I am still trying to figure what happened.

(main_task pid=3370254) RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1280 but got size 256 for tensor number 1 in the list.

@YangWang92 You may try the latest commit in this PR, which handles the SamplingParams n>1 case in vllm_rollout_spmd.py:

# Flatten all samples: with n>1, each prompt's output holds n completions,
# so `response` ends up with len(outputs) * n entries.
response = []
for output in outputs:
    for sample_id in range(len(output.outputs)):
        response.append(output.outputs[sample_id].token_ids)
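To see the shape this flattening produces, here is a self-contained sketch with stand-in output objects. The nested `outputs` structure mimics vLLM's RequestOutput/CompletionOutput nesting; the mock token values are illustrative.

```python
from types import SimpleNamespace

# Stand-ins for vLLM outputs: 2 prompts, n=3 samples per prompt.
outputs = [
    SimpleNamespace(outputs=[SimpleNamespace(token_ids=[p, s]) for s in range(3)])
    for p in range(2)
]

# One flat entry per (prompt, sample) pair, prompt-major order.
response = []
for output in outputs:
    for sample_id in range(len(output.outputs)):
        response.append(output.outputs[sample_id].token_ids)

print(len(response))  # 2 prompts * n=3 samples -> 6
```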

@YangWang92
Contributor

Thanks for your help, and let me try.

@YangWang92
Contributor

I confirmed that current code works well. Thanks!

@ZSL98 ZSL98 changed the title [WIP] Integrating vllm>=0.7.0 Integration of vllm>=0.7.0 Feb 14, 2025
@PeterSH6 PeterSH6 changed the title Integration of vllm>=0.7.0 [testing][rollout]feat: support integration of vllm>=0.7.0 (spmd-version) Feb 14, 2025
@PeterSH6 PeterSH6 changed the title [testing][rollout]feat: support integration of vllm>=0.7.0 (spmd-version) [testing][rollout] feat: support integration of vllm>=0.7.0 (spmd-version) Feb 14, 2025
@PeterSH6 PeterSH6 merged commit f8b4d08 into verl-project:main Feb 14, 2025
12 checks passed
sunyi0505 pushed a commit to sunyi0505/verl that referenced this pull request Feb 20, 2025
@PeterSH6 PeterSH6 mentioned this pull request Mar 12, 2025
33 tasks
yuchenwang3 pushed a commit to yuchenwang3/verl that referenced this pull request Apr 25, 2025
histmeisah pushed a commit to SJTU-IAAR/verl that referenced this pull request Apr 27, 2025
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
dreamyang-liu pushed a commit to dreamyang-liu/verl-sagemaker that referenced this pull request Feb 21, 2026
* Change: Have micro-extra config setup & friends

* fix tests