
adjusting the communication method in graph mode#1194

Merged
ganyi1996ppo merged 1 commit into vllm-project:main from sharonyunyun:main on Jun 25, 2025

Conversation

@sharonyunyun
Contributor

@sharonyunyun sharonyunyun commented Jun 12, 2025

What this PR does / why we need it?

Communication performance optimization: replace allreduce with reduce_scatter + all_gather in the MLA layer's TP group, which removes the strided slice and all_gather in the MoE layer.
When tp > 1, the optimization is enabled during the decode phase of graph mode, provided enable_multistream_moe, MLA, use_v1, and MC2 are all in use.
According to end-to-end RL inference test results, this PR brings a 3% gain in the decode stage.
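The numerical equivalence behind the swap can be sketched in a toy single-process form (an illustration, not the PR's actual `torch.distributed` code): summing each tensor slice across ranks (reduce-scatter) and then concatenating the reduced slices on every rank (all-gather) yields the same result as a full allreduce, while letting each rank work on only its 1/tp slice in between.

```python
# Toy sketch of why reduce_scatter + all_gather matches allreduce.
# Each inner list simulates one TP rank's local tensor.

def allreduce(shards):
    """Baseline: every rank receives the full elementwise sum."""
    total = [sum(vals) for vals in zip(*shards)]
    return [list(total) for _ in shards]

def reduce_scatter_then_all_gather(shards):
    """Rank i first owns only the reduced slice i, then all slices are gathered."""
    tp = len(shards)
    n = len(shards[0]) // tp  # tensor length assumed divisible by tp
    # reduce-scatter: slice i is summed across all ranks, owned by rank i
    reduced = [
        [sum(vals) for vals in zip(*(s[i * n:(i + 1) * n] for s in shards))]
        for i in range(tp)
    ]
    # all-gather: every rank concatenates the reduced slices back together
    full = [x for seg in reduced for x in seg]
    return [list(full) for _ in range(tp)]

tp = 4
shards = [[float(r * 10 + c) for c in range(8)] for r in range(tp)]
assert allreduce(shards) == reduce_scatter_then_all_gather(shards)
```

The win in the real kernel comes from the intermediate state: between the two collectives each rank holds only its slice, so the per-rank MoE work that previously needed a strided slice after a full allreduce can consume the scattered slice directly.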

Before Improvement
Profiling kernel_details
image
Evaluation
image
image

After Improvement
Profiling kernel_details
image
Evaluation
image
image

Does this PR introduce any user-facing change?

Users need to configure enable_multistream_moe=True
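The PR description does not include a launch snippet; the sketch below shows one plausible way the flag could be passed, assuming vllm-ascend's nested additional_config layout for graph-mode options. The surrounding key names are assumptions, not confirmed by this PR.

```python
# Hypothetical config sketch -- key names other than enable_multistream_moe
# are assumptions about vllm-ascend's config layout, not taken from this PR.
additional_config = {
    "torchair_graph_config": {           # graph-mode options (assumed path)
        "enabled": True,                 # run the model in graph mode
        "enable_multistream_moe": True,  # the flag this PR requires
    },
}

# Such a dict would then be handed to the engine at startup, e.g.
# LLM(model=..., tensor_parallel_size=..., additional_config=additional_config)
```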

How was this patch tested?

Added e2e test cases to cover the code logic.

@sharonyunyun sharonyunyun force-pushed the main branch 3 times, most recently from b1227b4 to 9294612 Compare June 12, 2025 13:28
@wangxiyuan
Collaborator

I have cancelled the e2e test for a quick fix of a DeepSeek problem. Please recheck the CI later. Sorry about this.

@depeng1994
Contributor

depeng1994 commented Jun 13, 2025

Could you add performance test logs and benefit result details to the PR description?

@sharonyunyun sharonyunyun force-pushed the main branch 5 times, most recently from 983b820 to 6f640c7 Compare June 13, 2025 09:08
@sharonyunyun sharonyunyun changed the title from "Improving performance in graph mode by adjusting the communication me…" to "adjusting the communication method in graph mode" on Jun 13, 2025
@sharonyunyun sharonyunyun force-pushed the main branch 2 times, most recently from 8e79e3d to 7c32c3f Compare June 17, 2025 03:59
@sharonyunyun sharonyunyun force-pushed the main branch 6 times, most recently from 235d1dc to 981a6df Compare June 17, 2025 09:17
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@sharonyunyun sharonyunyun force-pushed the main branch 2 times, most recently from d635d22 to 16d9dce Compare June 17, 2025 12:35
@sharonyunyun sharonyunyun force-pushed the main branch 3 times, most recently from d6d17d2 to f3affb8 Compare June 18, 2025 13:18
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@sharonyunyun sharonyunyun force-pushed the main branch 2 times, most recently from 3d09b8e to 0b315f8 Compare June 20, 2025 01:35
@Yikun Yikun added the long-term-test (enable long term test for PR) and ready-for-test (start test by label for PR) labels on Jun 20, 2025
@sharonyunyun sharonyunyun force-pushed the main branch 4 times, most recently from 5e67447 to cc59238 Compare June 23, 2025 08:16
@codecov

codecov bot commented Jun 23, 2025

Codecov Report

❌ Patch coverage is 10.81081% with 66 lines in your changes missing coverage. Please review.
✅ Project coverage is 27.04%. Comparing base (c30ddb8) to head (b289fbf).
⚠️ Report is 549 commits behind head on main.

Files with missing lines            Patch %   Missing
vllm_ascend/models/deepseek_v2.py     8.95%        61 ⚠️
vllm_ascend/attention/mla_v1.py      20.00%         4 ⚠️
vllm_ascend/models/deepseek_dbo.py   50.00%         1 ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1194      +/-   ##
==========================================
- Coverage   27.39%   27.04%   -0.36%     
==========================================
  Files          56       56              
  Lines        6191     6275      +84     
==========================================
+ Hits         1696     1697       +1     
- Misses       4495     4578      +83     
Flag Coverage Δ
unittests 27.04% <10.81%> (-0.36%) ⬇️


@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: sharonyunyun <zhangying134@huawei.com>
@sdmyzlp sdmyzlp mentioned this pull request Jun 25, 2025
@ganyi1996ppo ganyi1996ppo merged commit 941269a into vllm-project:main Jun 25, 2025
24 checks passed
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Jun 30, 2025
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025

Labels

long-term-test (enable long term test for PR), module:ops, module:tests, ready-for-test (start test by label for PR)


5 participants