Signed-off-by: Angazenn <supperccell@163.com>
Code Review
This pull request optimizes data-parallel communication in `_get_forward_metadata_across_dp` by replacing a CPU-based `all_reduce` with an NPU-based `all_gather`. This is a good optimization that moves the collective operation onto the accelerator. The new implementation also appears to correct the logic for determining `enable_dbo` across DP ranks, using an `any` reduction, which seems more appropriate. I have one suggestion to further improve performance by minimizing data transfer between the NPU and the CPU.
```python
                             dtype=torch.int32)
global_forward_metadata = get_dp_group().all_gather(
    local_forward_metadata, dim=0)
maybe_padded_num_tokens = global_forward_metadata[:, 0].cpu().max()
```
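To make the data flow concrete, here is a minimal sketch of what the `all_gather` along `dim=0` produces. The column layout (`num_tokens`, `with_prefill`, `enable_dbo`), the four-rank values, and the use of `torch.cat` as a stand-in for `get_dp_group().all_gather(..., dim=0)` are all hypothetical, chosen so the example runs without a real process group:

```python
import torch

# Hypothetical per-rank metadata rows: [num_tokens, with_prefill, enable_dbo].
# In the real code each DP rank contributes its own local_forward_metadata;
# torch.cat along dim=0 simulates the result of the NPU all_gather.
per_rank = [
    torch.tensor([[96, 1, 0]], dtype=torch.int32),
    torch.tensor([[128, 0, 1]], dtype=torch.int32),
    torch.tensor([[64, 0, 0]], dtype=torch.int32),
    torch.tensor([[112, 1, 0]], dtype=torch.int32),
]
global_forward_metadata = torch.cat(per_rank, dim=0)  # shape: (dp_size, 3)

# Pad target is the max token count across ranks; the flags are OR-reduced.
maybe_padded_num_tokens = global_forward_metadata[:, 0].max().item()
with_prefill = bool(global_forward_metadata[:, 1].any())
enable_dbo = bool(global_forward_metadata[:, 2].any())
```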
For better performance, it's generally recommended to perform reduction operations on the device (NPU) and transfer only the scalar result to the CPU. This avoids synchronizing and copying a larger tensor. You can change this line to perform the `max()` on the NPU before moving the result to the CPU with `.item()`.
```diff
- maybe_padded_num_tokens = global_forward_metadata[:, 0].cpu().max()
+ maybe_padded_num_tokens = global_forward_metadata[:, 0].max().item()
```
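As a quick illustration of why the order matters: reducing first leaves a single scalar to cross the device-to-host boundary, while `.cpu().max()` copies the whole column to the host before reducing. The tensor below is a hypothetical stand-in kept on CPU so the snippet runs anywhere; on an NPU the same `.max().item()` chain applies:

```python
import torch

# Hypothetical gathered metadata; on the NPU this would live in device memory.
global_forward_metadata = torch.tensor([[96, 1], [128, 0], [64, 0]],
                                       dtype=torch.int32)

# Suggested pattern: reduce on-device, then copy one scalar to the host.
maybe_padded_num_tokens = global_forward_metadata[:, 0].max().item()

# Original pattern: copies the entire column to the CPU before reducing.
same_result = global_forward_metadata[:, 0].cpu().max().item()
```

Both expressions yield the same value; only the amount of data transferred differs.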
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #2492      +/-   ##
==========================================
+ Coverage   77.70%   78.49%   +0.78%
==========================================
  Files         132      132
  Lines       17521    17806     +285
==========================================
+ Hits        13615    13976     +361
+ Misses       3906     3830      -76
==========================================
```
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?