【main】SP For Qwen3 MoE #2209
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
vllm_ascend/ops/sequence_parallel.py (outdated)
```python
    get_tp_group, tensor_model_parallel_all_gather,
    tensor_model_parallel_reduce_scatter)
from vllm.forward_context import get_forward_context
from vllm.platforms import current_platform
```
Do not use current_platform in vllm-ascend; import vllm_ascend.platform directly.
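For illustration, the substitution being requested might look like the sketch below; the exact symbol exported by vllm_ascend.platform is an assumption here, not verified against the repo.
```python
# Discouraged in vllm-ascend (per the review above):
# from vllm.platforms import current_platform

# Preferred: import the Ascend platform module directly.
# NPUPlatform is an assumed symbol name, for illustration only.
from vllm_ascend.platform import NPUPlatform
```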
```python
        )
        self.mc2_mask[:lengths_sum_unpadding] = True

    def padding_aligned_reduce_scatter(self,
```
These functions duplicate the pad and unpad functions in flashcommv1; can we consolidate them?
Thank you for your suggestion. We chose not to adopt the flashcomm1 implementation from the 091 branch for two reasons:
- The existing flashcomm1 implementations for Qwen2 and Qwen3 in the repository are inconsistent. We've created a model-level interface here to minimize the SP migration effort for sparse models.
- flashcomm1's graph-mode support for sparse models like Qwen2 and Qwen3 isn't currently available in the main branch. Merging it would impact graph-mode performance, so we're keeping it separate for now. Note that merging Qwen3 MoE's SP implementation won't affect the current status.
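For readers following this thread, here is a minimal sketch of the padding-aligned reduce-scatter/unpad pattern under discussion, assuming a flat (num_tokens, hidden) activation layout; the function names and signatures are illustrative, not the PR's actual helpers.
```python
import torch
import torch.distributed as dist


def pad_to_tp_multiple(x: torch.Tensor, tp_size: int):
    """Pad the token dim so it divides evenly across TP ranks."""
    pad_len = (-x.shape[0]) % tp_size
    if pad_len:
        # Pad rows at the end: (last-dim pads, then token-dim pads).
        x = torch.nn.functional.pad(x, (0, 0, 0, pad_len))
    return x, pad_len


def padded_reduce_scatter(x: torch.Tensor, tp_group):
    """Reduce-scatter a padded activation; return this rank's shard plus
    the pad length needed to unpad after the matching all-gather."""
    tp_size = dist.get_world_size(group=tp_group)
    x, pad_len = pad_to_tp_multiple(x, tp_size)
    shard = torch.empty((x.shape[0] // tp_size, *x.shape[1:]),
                        dtype=x.dtype, device=x.device)
    dist.reduce_scatter_tensor(shard, x, group=tp_group)
    return shard, pad_len


def gather_and_unpad(shard: torch.Tensor, pad_len: int, tp_group):
    """Inverse step: all-gather the shards, then drop the padding rows."""
    tp_size = dist.get_world_size(group=tp_group)
    full = torch.empty((shard.shape[0] * tp_size, *shard.shape[1:]),
                       dtype=shard.dtype, device=shard.device)
    dist.all_gather_into_tensor(full, shard, group=tp_group)
    return full[:full.shape[0] - pad_len] if pad_len else full
```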
This pull request has conflicts, please resolve those before we can evaluate the pull request.
@lbk-sys can you list out your testing CLI (e.g., vllm serve) and your testing datasets? Thanks.
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main    #2209      +/-   ##
==========================================
- Coverage   76.65%   76.09%   -0.56%
==========================================
  Files         113      114       +1
  Lines       12763    13103     +340
==========================================
+ Hits         9783     9971     +188
- Misses       2980     3132     +152
```
Thank you for your attention. For accuracy testing, we ran the AIME dataset. For performance testing, we ran benchmarks using vLLM and offline scripts. The model's input follows the t*h format, so as long as the number of input tokens in the P-phase meets this requirement, the gains hold regardless of the dataset.
We also tested with the DeepScaler dataset.
```python
with VllmRunner(
        snapshot_download("Qwen/Qwen3-30B-A3B"),
        dtype="auto",
        tensor_parallel_size=4,
```
There are just 2 cards on the CI machine; let's reduce the TP size to 2.
Done, thanks.
### What this PR does / why we need it?
This PR adds sequence parallelism (SP) support for Qwen3 MoE. In scenarios like AlltoAll, AlltoAllv, and MC2, replacing AllReduce with Reduce-Scatter and AllGather yields computational savings in the norm operations while eliminating one AllGather communication. The feature is enabled during the P-phase (prefill) and delivers notable gains in long-sequence scenarios (e.g., 16k–25k tokens), with performance improvements of 5%–10%.
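To make the claim concrete: an AllReduce is mathematically a Reduce-Scatter followed by an AllGather, and RMSNorm is row-wise (per token), so it can run on the scattered shards; each rank then normalizes only 1/TP of the tokens. Below is a minimal single-process sketch, with illustrative shapes and a hand-rolled RMSNorm (not the PR's code).
```python
import torch

tp, tokens, hidden = 4, 8, 16
# Per-rank partial results that an AllReduce would normally sum.
partials = [torch.randn(tokens, hidden) for _ in range(tp)]


def rms_norm(x, eps=1e-6):  # row-wise (per token), so it shards cleanly
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)


# Baseline: AllReduce, then every rank norms all `tokens` rows.
baseline = rms_norm(sum(partials))

# SP: Reduce-Scatter leaves each rank tokens/tp reduced rows; the norm
# runs on that shard only, and an AllGather reassembles the sequence.
rows = tokens // tp
shards = [sum(p[r * rows:(r + 1) * rows] for p in partials)
          for r in range(tp)]
sp_result = torch.cat([rms_norm(s) for s in shards])

assert torch.allclose(baseline, sp_result, atol=1e-5)
```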
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
```
compilation_config={
"pass_config":{
"enable_sequence_parallelism": True
}
},
enable_expert_parallel=True,
```
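For reference, a minimal offline sketch of how these settings can be passed to vLLM; the model name and TP size are taken from the PR's test above, while the prompt and sampling parameters are illustrative only.
```python
from vllm import LLM, SamplingParams

# Assumes a 4-NPU setup matching the PR's test configuration.
llm = LLM(
    model="Qwen/Qwen3-30B-A3B",
    tensor_parallel_size=4,
    enable_expert_parallel=True,
    compilation_config={
        "pass_config": {
            "enable_sequence_parallelism": True,
        },
    },
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```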
- vLLM version: v0.10.0
- vLLM main:
vllm-project/vllm@9edd1db
---------
Signed-off-by: libaokui <[email protected]>
Co-authored-by: libaokui <[email protected]>