
[Feature] Optimize forward metadata collection across dp ranks#1593

Closed
jianzs wants to merge 10 commits into main from feat/dp-comm-opt

Conversation

@jianzs
Collaborator

@jianzs jianzs commented Jul 2, 2025

What this PR does / why we need it?

This PR introduces two optimizations for cases where data parallel size > 1:

  1. Eliminates DP communication in set_forward_context
  2. Implements HCCL for DP metadata communication, resulting in significant performance improvements for large DP configurations
    • Achieves ~20ms latency reduction with DP size of 64
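
The second optimization can be pictured with a small simulation. This is only a sketch, not the code from this PR: it runs in plain Python with no real collective, `simulated_all_gather` is a hypothetical stand-in for an all-gather over the DP group, and `forward_metadata_across_dp` is an illustrative name. It shows how every rank, after one gather of per-rank token counts, can derive the maximum locally instead of issuing a separate reduction.

```python
from typing import List, Tuple


def simulated_all_gather(local_values: List[int]) -> List[int]:
    """Hypothetical stand-in for a collective all-gather.

    In a real DP group every rank contributes one value and receives the
    full list; here the list of contributions is simply copied.
    """
    return list(local_values)


def forward_metadata_across_dp(num_tokens_per_rank: List[int]) -> Tuple[int, List[int]]:
    """Return (max token count, per-rank token counts) as seen by any rank."""
    gathered = simulated_all_gather(num_tokens_per_rank)
    # The maximum is derived locally from the gathered values.
    return max(gathered), gathered


max_tokens, per_rank = forward_metadata_across_dp([7, 0, 12, 3])
print(max_tokens, per_rank)  # prints: 12 [7, 0, 12, 3]
```

Keeping the per-rank counts, rather than only a reduced value, is what lets callers pass them on (for example as `num_tokens_across_dp`) without a second round of communication.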

Does this PR introduce any user-facing change?

no

How was this patch tested?

CI passed.

@jianzs
Collaborator Author

jianzs commented Jul 2, 2025

@NeverRaR PTAL

Comment thread vllm_ascend/worker/model_runner_v1.py
@NeverRaR
Contributor

NeverRaR commented Jul 2, 2025

lgtm

@jianzs jianzs added ready read for review and removed ready read for review labels Jul 2, 2025
@jianzs jianzs requested a review from Copilot July 2, 2025 14:51

This comment was marked as outdated.

@jianzs jianzs requested a review from Copilot July 2, 2025 14:57

Copilot AI left a comment


Pull Request Overview

This PR optimizes how forward-pass metadata is collected and communicated across data-parallel ranks by removing the previous all-reduce and introducing an HCCL-based all-gather approach.

  • Enforce that dummy batch execution only runs under data parallelism and refactor execute_dummy_batch to use per-rank metadata.
  • Replace dist.all_reduce with HCCL all_gather in _get_forward_metadata_across_dp and update callers to handle the Tensor of per-rank token counts.
  • Propagate num_tokens_across_dp through dummy runs and forward contexts, masking sentinel values before the pass.
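
The sentinel masking mentioned in the last bullet can be illustrated as follows. This is a hedged sketch in plain Python, not the PR's code: the sentinel value `-1`, the fill value, and the helper name `mask_sentinels` are all assumptions made for illustration. On a real tensor the same replacement would typically be done in place with `Tensor.masked_fill_(mask, value)`.

```python
from typing import List

SENTINEL = -1  # hypothetical marker; the value actually used in the PR is not shown here


def mask_sentinels(counts: List[int], fill_value: int) -> List[int]:
    """Replace every sentinel entry with fill_value, leaving real counts untouched."""
    return [fill_value if c == SENTINEL else c for c in counts]


gathered = [7, SENTINEL, 12, SENTINEL]
print(mask_sentinels(gathered, 0))  # prints: [7, 0, 12, 0]
```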

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
vllm_ascend/worker/worker_v1.py | Added assertion for dp_size > 1, refactored dummy-run logic to use HCCL per-rank metadata.
vllm_ascend/worker/model_runner_v1.py | Swapped all_reduce for all_gather under get_dp_group(), changed method signature and updated callers to handle a Tensor of metadata.
Comments suppressed due to low confidence (1)

vllm_ascend/worker/model_runner_v1.py:622

  • Add unit or integration tests covering the dp_size > 1 aggregation path to verify that all_gather produces the correct combined metadata and that the masked_fill_ logic correctly replaces sentinel values.
            local_forward_metadata)

Comment thread vllm_ascend/worker/worker_v1.py Outdated
Comment thread vllm_ascend/worker/worker_v1.py
@jianzs jianzs requested a review from NeverRaR July 3, 2025 02:37
@jianzs jianzs force-pushed the feat/dp-comm-opt branch from 5d90031 to f1ddce2 Compare July 3, 2025 11:54
@codecov

codecov Bot commented Jul 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.93%. Comparing base (c30ddb8) to head (d35cbf3).
⚠️ Report is 676 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1593       +/-   ##
===========================================
+ Coverage   27.39%   54.93%   +27.53%     
===========================================
  Files          56       80       +24     
  Lines        6191     9712     +3521     
===========================================
+ Hits         1696     5335     +3639     
+ Misses       4495     4377      -118     
Flag Coverage Δ
unittests 54.93% <ø> (+27.53%) ⬆️


@jianzs jianzs added performance-test enable performance test for PR ready-for-test start test by label for PR labels Jul 3, 2025
@jianzs
Collaborator Author

jianzs commented Jul 4, 2025

@Yikun @wangxiyuan @ApsarasX @ganyi1996ppo ready to merge.

Comment thread vllm_ascend/worker/model_runner_v1.py Outdated
@jianzs jianzs added the ready read for review label Jul 4, 2025
max_num_tokens)
runner._dummy_run(max_num_tokens,
else:
num_tokens = 1
Collaborator


why is it 1?

Collaborator Author


If graph mode is off, the dummy run only needs to execute; its computational cost is not a factor, so a single token is enough.
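
The choice discussed in this thread can be sketched as below. This is an illustration under assumptions, not the PR's implementation: `dummy_run_num_tokens` and its parameters are hypothetical names, and the graph-mode branch assumes the dummy run is sized to the maximum token count across DP ranks, as the quoted diff fragment suggests.

```python
def dummy_run_num_tokens(graph_mode_enabled: bool, max_num_tokens_across_dp: int) -> int:
    """Pick the token count for a dummy run (illustrative helper, not from the PR)."""
    if graph_mode_enabled:
        # Assumption: with graph mode on, the dummy run is sized to the
        # largest token count seen across DP ranks.
        return max_num_tokens_across_dp
    # Without graph mode the dummy run only has to execute, so one token suffices.
    return 1


print(dummy_run_num_tokens(True, 12))   # prints: 12
print(dummy_run_num_tokens(False, 12))  # prints: 1
```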

@jianzs jianzs force-pushed the feat/dp-comm-opt branch 4 times, most recently from dbaad95 to 7726146 Compare July 8, 2025 06:51
@jianzs
Collaborator Author

jianzs commented Jul 8, 2025

@Angazenn PTAL

@jianzs
Collaborator Author

jianzs commented Jul 8, 2025

@wangxiyuan @ganyi1996ppo @Yikun ready to merge.

@wangxiyuan
Collaborator

wangxiyuan commented Jul 9, 2025

torchair has made the code more and more complex and hard to maintain. I have a PR (#1661) to add a torchair module; all torchair-related code can be updated and changed there. I'll make the PR available for review soon. Before that, I really don't want to merge any torchair-related changes, because they are very hard to review (I'm not sure whether the change broke anything else). Sorry.

@github-actions github-actions Bot added merge-conflicts and removed ready read for review labels Jul 9, 2025
@github-actions
Contributor

github-actions Bot commented Jul 9, 2025

This pull request has conflicts, please resolve those before we can evaluate the pull request.

jianzs added 10 commits July 15, 2025 13:10
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
…rker

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
Co-authored-by: Angazenn <92204292+Angazenn@users.noreply.github.com>

Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
@jianzs jianzs force-pushed the feat/dp-comm-opt branch from 7726146 to d35cbf3 Compare July 15, 2025 05:12
@Yikun Yikun closed this Jul 17, 2025
@Yikun Yikun deleted the feat/dp-comm-opt branch July 17, 2025 11:18
@Yikun
Member

Yikun commented Jul 17, 2025

Wrong to submit the branch, pls feel free to open new PR.


Labels

performance-test enable performance test for PR ready-for-test start test by label for PR


7 participants