
Conversation

@ZJY0516 (Contributor) commented Sep 14, 2025

Purpose

FIX #24461
Avoid unnecessary coordination for non-MoE data parallel

Test Plan

vllm serve /data/datasets/models-hf/Qwen3-4B-Instruct-2507-FP8/ --served-model-name qwen -dp 2 --max-model-len 10240

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: zjy0516 <[email protected]>
@ZJY0516 (Contributor, Author) commented Sep 14, 2025

CC @njhill

@mergify (bot) added the v1 label Sep 14, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces an optimization to avoid unnecessary dummy batch execution for non-MoE models in a data-parallel setup, which should reduce overhead. The changes are well-structured, adding a skip_dummy_batch path that is conditionally used based on whether expert parallelism is enabled. The implementation correctly preserves the necessary synchronization for data parallelism while skipping the actual model execution. I've added one comment regarding device placement to prevent potential future bugs. The changes look good overall.
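
For reference, a minimal sketch of the shape described above, assuming illustrative names (`dummy_run`, `get_dp_padding`, `skip_dummy_batch`, `execute_dummy_forward`) rather than the PR's literal diff:

```python
# Illustrative sketch only, not the PR's actual code; the helper names are
# placeholders mirroring the description above.
def dummy_run(runner, num_tokens: int, skip_dummy_batch: bool = False) -> None:
    # DP padding agreement still runs, so any collectives issued here stay
    # matched across data-parallel ranks.
    num_pad, num_tokens_across_dp = runner.get_dp_padding(num_tokens)

    if skip_dummy_batch:
        # Non-MoE model: there is no expert-parallel all-to-all to keep in
        # lockstep, so the actual dummy forward pass is skipped.
        return

    # MoE / expert-parallel model: execute the dummy forward as before.
    runner.execute_dummy_forward(num_tokens + num_pad)
```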

Signed-off-by: zjy0516 <[email protected]>
@ZJY0516 (Contributor, Author) commented Sep 14, 2025

@njhill Should I also consider the offline inference scenario?

@njhill (Member) commented Sep 15, 2025

Thanks for this @ZJY0516! I don't think it looks quite like what we had in mind though. We want to avoid doing the additional collectives altogether and ideally avoid initializing the associated torch distributed process groups.

We should also be able to avoid the messaging done for coordination of request waves, etc.
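
Roughly this direction (a hedged sketch; `enable_expert_parallel` and the helper name are illustrative, not vLLM's actual initialization code):

```python
import torch.distributed as dist

def maybe_init_dp_group(parallel_config, dp_ranks):
    # Only expert-parallel (MoE) models need cross-rank coordination. For
    # plain data parallelism each rank can behave like a standalone non-DP
    # deployment: no DP process group is created, so the per-step all_reduce
    # and request-wave coordination never happen.
    if not parallel_config.enable_expert_parallel:
        return None
    return dist.new_group(ranks=dp_ranks, backend="gloo")
```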

@ZJY0516 (Contributor, Author) commented Sep 16, 2025

@njhill But if self.get_dp_padding is not executed, a non-MoE model with dp > 1 will block forever. Do you have any suggestion?

@njhill (Member) commented Sep 17, 2025

> @njhill But if self.get_dp_padding is not executed, a non-MoE model with dp > 1 will block forever. Do you have any suggestion?

I'm not sure that I follow; the different ranks should be completely independent in the non-MoE case...

@ZJY0516 (Contributor, Author) commented Sep 22, 2025

> @njhill But if self.get_dp_padding is not executed, a non-MoE model with dp > 1 will block forever. Do you have any suggestion?
>
> I'm not sure that I follow; the different ranks should be completely independent in the non-MoE case...

I think it's because there is an all_reduce operation in DPMetadata.num_tokens_across_dp, which is called in self.get_dp_padding.
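
To spell out why it blocks: all_reduce is a collective, so every rank in the group has to reach the call before any of them can return; if one rank skips get_dp_padding, the rest wait forever. A stripped-down illustration of the pattern (not the actual DPMetadata code):

```python
import torch
import torch.distributed as dist

def num_tokens_across_dp(num_tokens: int, dp_size: int, dp_rank: int) -> torch.Tensor:
    # Each rank writes its own token count into its slot; the all_reduce fills
    # in the counts from every other rank. Because this is a collective, it
    # only completes once every rank in the group has called it -- a rank that
    # skips the call leaves the others blocked here.
    counts = torch.zeros(dp_size, dtype=torch.int32)
    counts[dp_rank] = num_tokens
    dist.all_reduce(counts)
    return counts
```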

@ZJY0516 (Contributor, Author) commented Sep 22, 2025

PR #24105 changed the all_reduce operation from CPU to GPU, which introduced additional cudaMemcpy operations.

Now (GPU utilization while idle rises to 100%):
[screenshots]

Before that PR (GPU utilization while idle stays at 0%):
[screenshot]
@njhill How should we address this? Should we avoid the memory copy to reduce GPU utilization?
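
For context, my reading of where the extra copies come from (illustrative, not taken from PR #24105 itself): reducing the counts with a GPU backend means the CPU-side count has to be copied to the device and the result copied back, while a CPU-backend reduction stays on the host:

```python
import torch
import torch.distributed as dist

def reduce_counts(counts_cpu: torch.Tensor, use_gpu_backend: bool) -> torch.Tensor:
    if use_gpu_backend:
        counts = counts_cpu.cuda()   # host -> device copy (cudaMemcpy)
        dist.all_reduce(counts)      # e.g. NCCL reduction on the GPU
        return counts.cpu()          # device -> host copy (cudaMemcpy)
    dist.all_reduce(counts_cpu)      # e.g. gloo reduction, stays on the CPU
    return counts_cpu
```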

@njhill (Member) commented Sep 24, 2025

@ZJY0516 sorry, I was thinking of something different for this. Basically most aspects should work like a non-DP deployment - we don't need the DP process group, etc. The DPCoordinator doesn't need to synchronize request waves, etc.

@ZJY0516 (Contributor, Author) commented Sep 25, 2025

> @ZJY0516 sorry, I was thinking of something different for this. Basically most aspects should work like a non-DP deployment - we don't need the DP process group, etc. The DPCoordinator doesn't need to synchronize request waves, etc.

You mean that for non-MoE models we don't need the DP process group or request-wave synchronization?
