Adjusting the communication method in graph mode #1194
ganyi1996ppo merged 1 commit into vllm-project:main
Conversation
Force-pushed from b1227b4 to 9294612
I have cancelled the e2e test for a quick fix of the DeepSeek problem. Please recheck the CI later. Sorry about this.
Could you add performance test logs and benchmark result details to the PR description?
Force-pushed from 983b820 to 6f640c7
Force-pushed from 8e79e3d to 7c32c3f
Force-pushed from 235d1dc to 981a6df
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from d635d22 to 16d9dce
Force-pushed from d6d17d2 to f3affb8
Force-pushed from 3d09b8e to 0b315f8
Force-pushed from 5e67447 to cc59238
Codecov Report ❌ Patch coverage is [value not captured].
Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #1194     +/-  ##
==========================================
- Coverage   27.39%   27.04%   -0.36%
==========================================
  Files          56       56
  Lines        6191     6275      +84
==========================================
+ Hits         1696     1697       +1
- Misses       4495     4578      +83
```
45d2385 to
0169aed
Compare
Signed-off-by: sharonyunyun <zhangying134@huawei.com>
### What this PR does / why we need it?
Communication performance optimization: replace allreduce with reduce_scatter + all_gather in the MLA layer's TP group, which removes the strided slice and all_gather in the MoE layer.
When tp > 1, the optimization is enabled during the decode phase of graph mode, provided enable_multistream_moe, MLA, use_v1, and MC2 are all in use.
According to end-to-end RL inference test results, this PR brings about a 3% gain in the decode stage.
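The core idea above is that an allreduce over the TP group can be decomposed into a reduce_scatter followed by an all_gather, which lets intermediate work operate on 1/tp-sized shards and makes the later MoE all_gather redundant. The following is a minimal single-process sketch (NumPy, simulating tp ranks as list entries, not the PR's actual HCCL code) showing that the two collective patterns produce identical results:

```python
import numpy as np

def allreduce(shards):
    """Baseline pattern: every rank ends up with the full elementwise sum."""
    total = np.sum(shards, axis=0)
    return [total.copy() for _ in shards]

def reduce_scatter_then_all_gather(shards):
    """Decomposed pattern: reduce_scatter, then all_gather of the shards."""
    tp = len(shards)
    # reduce_scatter: each rank r reduces (sums) only slice r of the tensor,
    # so between the two collectives every rank holds a 1/tp-sized result.
    full = np.sum(shards, axis=0)
    slices = np.split(full, tp)  # slices[r] is what rank r would hold
    # all_gather: ranks exchange their reduced slices and concatenate them,
    # reconstructing the same full tensor allreduce would have produced.
    gathered = np.concatenate(slices)
    return [gathered.copy() for _ in range(tp)]
```

With real collectives (e.g. torch.distributed.reduce_scatter_tensor then all_gather_into_tensor) the benefit is that per-rank compute between the two steps runs on the smaller shard; the sketch only demonstrates the numerical equivalence.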
**Before Improvement**
- Profiling kernel_details: (screenshots not recoverable)
- Evaluation: (screenshots not recoverable)
**After Improvement**
- Profiling kernel_details: (screenshots not recoverable)
- Evaluation: (screenshots not recoverable)
### Does this PR introduce any user-facing change?
Users need to set enable_multistream_moe=True to enable this optimization.
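As a hypothetical illustration of how that flag might be passed (the exact nesting of additional_config keys is an assumption and may differ between vllm-ascend versions; check the project docs for the authoritative schema):

```python
import json

# Assumed config shape: enable_multistream_moe nested under the
# torchair graph-mode config inside vllm-ascend's additional_config.
additional_config = {
    "torchair_graph_config": {
        "enabled": True,                 # graph mode (the optimization applies in decode)
        "enable_multistream_moe": True,  # required to enable this PR's path
    },
}

# e.g. passed on the CLI as: vllm serve ... --additional-config '<json>'
print(json.dumps(additional_config))
```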
### How was this patch tested?
Added e2e test cases to cover the code logic.