[DP] Fix dp padding logic in dummy_run #4705
Conversation
Code Review
This pull request correctly applies padding logic to dummy_run to align with changes in CudagraphDispatcher. The use of num_tokens_padded for tensor slicing and subsequent function calls is consistent. However, I've found a few issues: a critical bug where num_reqs_padded is used instead of num_tokens_padded when updating num_tokens_across_dp, a potential issue with MoE communication method selection using unpadded token counts, and a leftover debug print statement. Addressing these will ensure correctness and clean up the code.
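For illustration only, here is a minimal sketch of the `num_tokens_across_dp` fix described above. The names `num_tokens_padded`, `num_reqs_padded`, and `num_tokens_across_dp` come from the review itself; the function signature and the per-rank indexing are assumptions, not the actual vLLM Ascend code:

```python
import torch

def update_num_tokens_across_dp(num_tokens_across_dp: torch.Tensor,
                                dp_rank: int,
                                num_tokens_padded: int) -> None:
    """Record this rank's padded token count for data-parallel coordination.

    Hypothetical sketch of the fix; the real dummy_run carries much more
    context. The buggy version flagged by the review wrote the padded
    *request* count here instead:
        num_tokens_across_dp[dp_rank] = num_reqs_padded  # wrong
    """
    num_tokens_across_dp[dp_rank] = num_tokens_padded

# Toy usage: 4 DP ranks, this rank padded up to the 96-token capture size.
counts = torch.zeros(4, dtype=torch.int32)
update_num_tokens_across_dp(counts, dp_rank=0, num_tokens_padded=96)
assert counts[0].item() == 96
```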
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
- If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This bugfix is really imperative for DP scenarios. Can you fix the UT and merge this ASAP? @MengqingCao
Signed-off-by: MengqingCao <cmq0113@163.com>
@GDzhu01 I've tested this PR on the deepseek-r1-w8a8 model with the latest code, could you help test again? Thx!
Yes, I think an approval from @GDzhu01 is needed after his test.
After offline discussion with @GDzhu01, I confirmed that there is no problem with this PR, let's merge this!
### What this PR does / why we need it?

Fix DP padding logic in `dummy_run`. After vllm-project/vllm#28579, `num_tokens` will be padded in `CudagraphDispatcher`, thus we also need to do the pad in `dummy_run`.

### How was this patch tested?

Tested locally with the following scripts:

```bash
VLLM_USE_MODELSCOPE=true python3 -m vllm.entrypoints.openai.api_server \
    --model wemaster/deepseek_mtp_main_random_bf16 \
    --trust-remote-code \
    --data-parallel-size 4 \
    --tensor-parallel-size 1 \
    --compilation-config '{"cudagraph_capture_sizes":[96],"cudagraph_mode":"FULL_DECODE_ONLY"}' \
    --enable-expert-parallel
```

```bash
vllm bench serve --model wemaster/deepseek_mtp_main_random_bf16 \
    --endpoint /v1/completions \
    --dataset-name random \
    --random-input 512 \
    --random-output 100 \
    --num-prompts 48 \
    --request-rate 1 \
    --ready-check-timeout-sec 0
```

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>
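As a rough, hypothetical illustration of the padding the description refers to (the real logic lives in vLLM's `CudagraphDispatcher` and in this PR's `dummy_run`), a token count is rounded up to the smallest configured capture size that can hold it. The helper name and the eager fallback for oversized batches are assumptions:

```python
import bisect

def pad_to_capture_size(num_tokens: int, capture_sizes: list[int]) -> int:
    """Round num_tokens up to the smallest cudagraph capture size that fits.

    Hypothetical sketch, not the actual dispatcher code. When num_tokens
    exceeds every captured size, the raw count is returned (eager path).
    """
    sizes = sorted(capture_sizes)
    idx = bisect.bisect_left(sizes, num_tokens)
    return sizes[idx] if idx < len(sizes) else num_tokens

# With the capture sizes from the test command above:
assert pad_to_capture_size(48, [96]) == 96   # padded up to the captured size
assert pad_to_capture_size(96, [96]) == 96   # exact fit
assert pad_to_capture_size(128, [96]) == 128 # larger than any captured size
```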