[2/N][Feat] Add MC2 communication method for MoE layers #2469
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces a new MC2 communication method for MoE layers, designed to optimize performance for smaller token counts. The changes include a new AscendFusedMoE layer, dynamic selection of the communication method in the model runner, and sharding of the MoE communication mask. My review identified a critical issue in the AscendFusedMoE layer where the all_gather operation uses the input tensor instead of the computed output, effectively discarding the results of the MoE computation. This needs to be addressed for the feature to function correctly.
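For illustration only, a minimal sketch of the failure mode the review describes; the tensor and group names (`hidden_states`, `local_out`, `tp_group`) are hypothetical and not taken from the actual layer code:

```python
import torch
import torch.distributed as dist

def moe_forward_sketch(hidden_states: torch.Tensor, experts, tp_group) -> torch.Tensor:
    # Each rank runs the experts on its local slice of tokens.
    local_out = experts(hidden_states)

    world_size = dist.get_world_size(group=tp_group)
    gathered = [torch.empty_like(local_out) for _ in range(world_size)]

    # Bug pattern flagged above: gathering the *input* would throw away the
    # expert computation entirely.
    #   dist.all_gather(gathered, hidden_states, group=tp_group)

    # Intended behavior: gather the computed output.
    dist.all_gather(gathered, local_out, group=tp_group)
    return torch.cat(gathered, dim=0)
```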
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Look into the memory problem with …
Fixed, please merge it at your earliest convenience, @wangxiyuan |
Force-pushed from fb26435 to f8bf600.
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2469      +/-   ##
==========================================
- Coverage   77.99%   77.81%   -0.18%
==========================================
  Files         134      134
  Lines       18498    18489       -9
==========================================
- Hits        14427    14387      -40
- Misses       4071     4102      +31

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
This method replaces the previous all-gather approach for small numbers of tokens. The key changes include:

- A new `AscendFusedMoE` layer that handles token splitting, local computation, and final aggregation via all-gather.
- Logic in the model runner to dynamically select between the new MC2 method and the existing all-gather method based on the number of input tokens.
- Sharding the MoE communication mask across tensor-parallel ranks.

Signed-off-by: Yizhou Liu <[email protected]>
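A rough sketch of the dynamic selection and mask sharding described in this commit, assuming a hypothetical token threshold and helper names (the real dispatch lives in the model runner):

```python
import torch

# Hypothetical cut-over point; the actual value is decided by the model runner.
MC2_TOKEN_THRESHOLD = 256

def select_moe_comm_method(num_input_tokens: int) -> str:
    """Use MC2 for small token counts, the existing all-gather path otherwise."""
    return "mc2" if num_input_tokens <= MC2_TOKEN_THRESHOLD else "allgather"

def shard_moe_comm_mask(mask: torch.Tensor, tp_size: int, tp_rank: int) -> torch.Tensor:
    """Shard the MoE communication mask along the token dimension so each
    tensor-parallel rank only handles its own slice (assumes an even split)."""
    return mask.chunk(tp_size, dim=0)[tp_rank]
```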
This commit refactors the MoE communication method framework to improve modularity, clarity, and extensibility. Key changes include:

- **Revised `MoECommMethod` Interface:**
  - Renamed `_pre_process` to `permute` and `_post_process` to `unpermute` for better clarity.
  - Introduced `prepare` and `finalize` methods to encapsulate logic that happens before and after the core MoE computation, such as tensor padding/splitting for MC2 and the final AllReduce.
- **Simplified `AscendFusedMoE`:**
  - The `forward_impl` is significantly simplified by delegating pre- and post-processing logic (padding, splitting, reduction) to the specific `MoECommMethod` implementation.
  - `AscendFusedMoE` now instantiates all communication method objects at initialization and selects the appropriate one at runtime based on a string identifier.
- **Centralized Expert Logic:**
  - Removed `unified_fused_experts` and introduced a new `fused_experts` function in `common_fused_moe.py`.
  - This new function utilizes the `permute`/`unpermute` methods from the `MoECommMethod` abstraction, decoupling the core expert logic from specific communication implementations.
- **Configuration and Invocation:**
  - The communication method is now selected and passed around as a string (e.g., "mc2", "allgather") instead of a class type, simplifying the invocation in the model runner.

These changes result in a cleaner separation of concerns, making the MoE implementation easier to understand, maintain, and extend with new communication strategies.

Signed-off-by: Yizhou Liu <[email protected]>
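A condensed, hypothetical outline of the interface shape described above; the class names, method bodies, and the registry dict are illustrative rather than the actual definitions:

```python
from abc import ABC, abstractmethod

import torch

class MoECommMethod(ABC):
    """prepare/finalize wrap the whole MoE call; permute/unpermute wrap the
    core expert computation."""

    @abstractmethod
    def prepare(self, hidden_states: torch.Tensor) -> torch.Tensor:
        """Pre-MoE work, e.g. padding or splitting tokens for MC2."""

    @abstractmethod
    def permute(self, hidden_states: torch.Tensor, topk_ids: torch.Tensor) -> torch.Tensor:
        """Reorder tokens per expert (previously `_pre_process`)."""

    @abstractmethod
    def unpermute(self, expert_output: torch.Tensor) -> torch.Tensor:
        """Restore the original token order (previously `_post_process`)."""

    @abstractmethod
    def finalize(self, hidden_states: torch.Tensor) -> torch.Tensor:
        """Post-MoE work, e.g. the final AllReduce."""


class AscendFusedMoESketch:
    def __init__(self, comm_methods: dict):
        # All communication method objects are built at initialization...
        self._comm_methods = comm_methods

    def forward_impl(self, hidden_states, topk_ids, comm_name: str):
        # ...and one is picked at runtime by its string identifier
        # (e.g. "mc2" or "allgather").
        comm = self._comm_methods[comm_name]
        x = comm.prepare(hidden_states)
        x = comm.permute(x, topk_ids)
        x = self._run_experts(x)   # core expert computation, omitted here
        x = comm.unpermute(x)
        return comm.finalize(x)

    def _run_experts(self, x):
        return x  # placeholder for the fused expert kernel
```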
…zations and add MC2 group integration

I am proud to say MC2 is fully supported with ACL Graph now!

Signed-off-by: Yizhou Liu <[email protected]>
The test now uses the `FusedMoEConfig` for configuration instead of a generic `PretrainedConfig`. It also calls the `permute` and `unpermute` methods on the communication implementation instance, rather than calling the `torch.ops` functions directly.

Signed-off-by: Yizhou Liu <[email protected]>
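A toy pytest-style round-trip along the lines of that call pattern, using a trivial stand-in implementation rather than the real test, `FusedMoEConfig`, or communication classes:

```python
import torch

class IdentityComm:
    """Stand-in comm implementation whose permute/unpermute are no-ops."""

    def permute(self, hidden_states, topk_ids):
        return hidden_states

    def unpermute(self, expert_output):
        return expert_output

def test_permute_unpermute_roundtrip():
    comm = IdentityComm()
    x = torch.randn(8, 16)
    topk_ids = torch.zeros(8, 2, dtype=torch.long)
    assert torch.equal(comm.unpermute(comm.permute(x, topk_ids)), x)
```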
Removes the `moe_comm_pre_process` and `moe_comm_post_process` custom operators and their associated registration logic. This simplifies the MoE communication implementation by integrating the pre-processing logic directly into the communication methods. Additionally, this change removes the unused `NaiveAll2AllManager` from the NPU communicator and refactors helper function usage for getting the MC2 communication name.

Signed-off-by: Yizhou Liu <[email protected]>
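Roughly the call-site change this implies, in sketch form; the function name and the operator namespace shown in the comments are assumptions, not the exact vllm-ascend symbols:

```python
import torch

def fused_moe_forward_sketch(hidden_states: torch.Tensor,
                             topk_ids: torch.Tensor,
                             comm_method) -> torch.Tensor:
    # Before: pre/post-processing was routed through registered custom ops,
    # roughly torch.ops.<namespace>.moe_comm_pre_process(...) and
    # torch.ops.<namespace>.moe_comm_post_process(...).
    # After: the same logic is a plain method call on the comm instance.
    permuted = comm_method.permute(hidden_states, topk_ids)
    expert_out = permuted  # core expert computation omitted in this sketch
    return comm_method.unpermute(expert_out)
```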
…#2469)

### What this PR does / why we need it?
This method replaces the previous all-gather approach for small numbers of tokens. The key changes include:
- A new `AscendFusedMoE` layer that handles token splitting, local computation, and final aggregation via all-gather.
- Logic in the model runner to dynamically select between the new MC2 method and the existing all-gather method based on the number of input tokens.
- Sharding the MoE communication mask across tensor-parallel ranks.

### Does this PR introduce _any_ user-facing change?
None.

### How was this patch tested?
Test case fixed.

- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@b00e69f

---------

Signed-off-by: Yizhou Liu <[email protected]>
What this PR does / why we need it?
This method replaces the previous all-gather approach for small numbers of tokens.
The key changes include:
- A new `AscendFusedMoE` layer that handles token splitting, local computation, and final aggregation via all-gather.
- Logic in the model runner to dynamically select between the new MC2 method and the existing all-gather method based on the number of input tokens.
- Sharding the MoE communication mask across tensor-parallel ranks.

Does this PR introduce any user-facing change?
None.
How was this patch tested?
Test case fixed.