[bugfix] Use FUSED_MC2 MoE comm path for the op `dispatch_ffn_combine` (#5156)

wangxiyuan merged 2 commits into vllm-project:main

Conversation
Code Review
This pull request refactors the Mixture-of-Experts (MoE) communication path by renaming FUSED_ALLTOALL to FUSED_MC2 and enabling it for the dispatch_ffn_combine op on Ascend A3 devices under specific quantization and parallelism settings. The changes significantly improve code clarity, especially in the select_moe_comm_method function, by simplifying the selection logic. The new FusedMC2CommImpl correctly utilizes the dispatch_ffn_combine custom op. My main feedback concerns a type hint mismatch in the fused_experts method, which could impact maintainability. Overall, this is a good improvement for enabling a more performant MoE path.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: Chen Chen <0109chenchen@gmail.com>
```python
fused_mc2_enable = quant_type == "w8a8_dynamic" and get_ep_group(
).world_size <= 16 and (not dynamic_eplb) and (not is_mtp_model)
if fused_mc2_enable:
    moe_comm_type = MoECommType.FUSED_MC2
else:
    moe_comm_type = (MoECommType.MC2 if num_tokens
                     <= mc2_tokens_capacity else MoECommType.ALLTOALL)
```
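The selection logic above can be sketched as a standalone function. This is a simplified illustration, not the actual vllm-ascend code: the `ep_world_size` parameter stands in for `get_ep_group().world_size`, and the other arguments stand in for values the real `select_moe_comm_method` derives from the model and quant config.

```python
from enum import Enum, auto


class MoECommType(Enum):
    FUSED_MC2 = auto()
    MC2 = auto()
    ALLTOALL = auto()


def select_moe_comm_method(quant_type: str, ep_world_size: int,
                           dynamic_eplb: bool, is_mtp_model: bool,
                           num_tokens: int,
                           mc2_tokens_capacity: int) -> MoECommType:
    # FUSED_MC2 requires w8a8_dynamic quantization, EP <= 16, and
    # neither dynamic EPLB nor an MTP model.
    fused_mc2_enable = (quant_type == "w8a8_dynamic"
                        and ep_world_size <= 16
                        and not dynamic_eplb
                        and not is_mtp_model)
    if fused_mc2_enable:
        return MoECommType.FUSED_MC2
    # Otherwise fall back to MC2 while the token count fits the
    # preallocated MC2 capacity, else plain all-to-all.
    return (MoECommType.MC2 if num_tokens <= mc2_tokens_capacity
            else MoECommType.ALLTOALL)
```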
We may need to be more careful here.
Signed-off-by: Chen Chen <0109chenchen@gmail.com>
```python
                 else MoECommType.FUSED_ALLTOALL
                 if fused_all2all_enable else MoECommType.ALLTOALL)

fused_mc2_enable = envs_ascend.VLLM_ASCEND_ENABLE_FUSED_MC2 and quant_type == "w8a8_dynamic" and get_ep_group(
).world_size <= 16 and (not dynamic_eplb) and (not is_mtp_model)
```
Change `is_mtp_model` to check the model type instead.
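Beyond the runtime conditions, the fused path is additionally gated by the `VLLM_ASCEND_ENABLE_FUSED_MC2` environment variable shown in the snippet above. A minimal sketch of such a gate follows; the exact parsing rules here are an assumption, not the vllm-ascend `envs_ascend` implementation.

```python
import os


def fused_mc2_env_enabled() -> bool:
    # Assumption: treat "1"/"true" (case-insensitive) as enabled,
    # disabled by default; vllm-ascend's env handling may differ.
    value = os.environ.get("VLLM_ASCEND_ENABLE_FUSED_MC2", "0")
    return value.lower() in ("1", "true")
```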
### What this PR does / why we need it?

- Renames the MoE comm enum value `MoECommType.FUSED_ALLTOALL` to `MoECommType.FUSED_MC2` and updates all call sites.
- Updates `select_moe_comm_method` to optionally select `FUSED_MC2` on Ascend A3 when:
  - `enable_expert_parallel=True`
  - quantization is `w8a8_dynamic`
  - `EP <= 16`
  - `dynamic_eplb` is disabled
  - `is_mtp_model = False`
- Replaces the old "fused all-to-all" comm implementation with `FusedMC2CommImpl`, using `TokenDispatcherWithMC2` / `PrepareAndFinalizeWithMC2` and `dispatch_ffn_combine`.
- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: Chen Chen <0109chenchen@gmail.com>
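The shape of the new comm implementation can be pictured as a registry entry. This is a hypothetical minimal sketch with placeholder bodies: the base class, registry, and `fused_experts` signature here are illustrative assumptions, while the real `FusedMC2CommImpl` wires up `TokenDispatcherWithMC2`, `PrepareAndFinalizeWithMC2`, and the `dispatch_ffn_combine` custom op inside vllm-ascend.

```python
class MoECommImpl:
    """Hypothetical base class for MoE communication strategies."""

    def fused_experts(self, hidden_states, router_logits):
        raise NotImplementedError


class FusedMC2CommImpl(MoECommImpl):
    """Replaces the old fused all-to-all impl; per the PR description it
    dispatches tokens, runs the expert FFN, and combines the results in
    one fused step via the dispatch_ffn_combine custom op."""

    def fused_experts(self, hidden_states, router_logits):
        # Placeholder for the fused dispatch -> FFN -> combine kernel;
        # the real op runs on-device over the MC2 HCCL buffer.
        return hidden_states


# Registry keyed by comm-type name (strings stand in for the enum).
MOE_COMM_IMPLS = {"FUSED_MC2": FusedMC2CommImpl}
```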
…tch_ffn_c… (#5284)

### What this PR does / why we need it?

- Removes the Expert Parallel (EP) HCCL buffer allocation that was previously introduced by the fused op `dispatch_ffn_combine` (#3532), since the fused op has switched to the MC2 HCCL buffer (#5156).

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: Chen Chen <0109chenchen@gmail.com>