[bugfix] Use FUSED_MC2 MoE comm path for the op dispatch_ffn_combine#5156

Merged
wangxiyuan merged 2 commits into vllm-project:main from kiscad:pg-mc2
Dec 18, 2025
Conversation

@kiscad (Contributor)

@kiscad kiscad commented Dec 18, 2025

What this PR does / why we need it?

  • Renames the MoE comm enum value MoECommType.FUSED_ALLTOALL to MoECommType.FUSED_MC2 and updates all call sites.
  • Updates select_moe_comm_method to optionally select FUSED_MC2 on Ascend A3 when:
    • enable_expert_parallel=True
    • quantization is w8a8_dynamic
    • EP <= 16
    • dynamic_eplb is disabled
    • is_mtp_model = False
  • Replaces the old “fused all-to-all” comm implementation with FusedMC2CommImpl, using TokenDispatcherWithMC2 / PrepareAndFinalizeWithMC2 and dispatch_ffn_combine.
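The selection conditions above can be sketched as a simplified, self-contained mirror of the logic. This is illustrative only: the parameter names (e.g. `ep_world_size`, `mc2_tokens_capacity`) and the flattened signature are assumptions, not the real `select_moe_comm_method` interface.

```python
from enum import Enum

class MoECommType(Enum):
    MC2 = "mc2"
    ALLTOALL = "alltoall"
    FUSED_MC2 = "fused_mc2"

# Hypothetical, simplified mirror of the selection conditions listed above.
def select_moe_comm_method(enable_expert_parallel: bool, quant_type: str,
                           ep_world_size: int, dynamic_eplb: bool,
                           is_mtp_model: bool, num_tokens: int,
                           mc2_tokens_capacity: int) -> MoECommType:
    # FUSED_MC2 only when all five conditions from the PR description hold.
    fused_mc2_enable = (enable_expert_parallel
                        and quant_type == "w8a8_dynamic"
                        and ep_world_size <= 16
                        and not dynamic_eplb
                        and not is_mtp_model)
    if fused_mc2_enable:
        return MoECommType.FUSED_MC2
    # Otherwise fall back to MC2 while the batch fits its token capacity,
    # else plain all-to-all.
    return (MoECommType.MC2 if num_tokens <= mc2_tokens_capacity
            else MoECommType.ALLTOALL)
```

For example, with expert parallelism enabled, `w8a8_dynamic` quantization, and an EP group of 16, the sketch selects `FUSED_MC2`; widening the EP group past 16 drops it back to the token-count-based MC2/all-to-all choice.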

Does this PR introduce any user-facing change?

How was this patch tested?


@gemini-code-assist (bot) left a comment


Code Review

This pull request refactors the Mixture-of-Experts (MoE) communication path by renaming FUSED_ALLTOALL to FUSED_MC2 and enabling it for the dispatch_ffn_combine op on Ascend A3 devices under specific quantization and parallelism settings. The changes significantly improve code clarity, especially in the select_moe_comm_method function, by simplifying the selection logic. The new FusedMC2CommImpl correctly utilizes the dispatch_ffn_combine custom op. My main feedback concerns a type hint mismatch in the fused_experts method, which could impact maintainability. Overall, this is a good improvement for enabling a more performant MoE path.

Comment thread vllm_ascend/ops/fused_moe/moe_comm_method.py

@github-actions (Contributor)

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Signed-off-by: Chen Chen <0109chenchen@gmail.com>
@kiscad kiscad changed the title Use FUSED_MC2 MoE comm path for the op dispatch_ffn_combine [bugfix] Use FUSED_MC2 MoE comm path for the op dispatch_ffn_combine Dec 18, 2025
Comment thread vllm_ascend/ascend_forward_context.py Outdated
Comment on lines +256 to +262
```python
fused_mc2_enable = quant_type == "w8a8_dynamic" and get_ep_group(
).world_size <= 16 and (not dynamic_eplb) and (not is_mtp_model)
if fused_mc2_enable:
    moe_comm_type = MoECommType.FUSED_MC2
else:
    moe_comm_type = (MoECommType.MC2 if num_tokens
                     <= mc2_tokens_capacity else MoECommType.ALLTOALL)
```
Collaborator


We may need to be more careful here.

Signed-off-by: Chen Chen <0109chenchen@gmail.com>
```python
fused_mc2_enable = envs_ascend.VLLM_ASCEND_ENABLE_FUSED_MC2 and quant_type == "w8a8_dynamic" and get_ep_group(
).world_size <= 16 and (not dynamic_eplb) and (not is_mtp_model)
```
Collaborator


Change `is_mtp_model` to check the model type.
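A minimal sketch of what that suggestion could look like: deriving the MTP check from the model config instead of threading a boolean flag through call sites. The attribute name `model_type` and the `"deepseek_mtp"` value are assumptions for illustration, not the actual vllm-ascend implementation.

```python
from types import SimpleNamespace

# Hypothetical helper: infer whether this is an MTP model from the model
# config's type string rather than passing `is_mtp_model` around.
# Both the attribute name and the "deepseek_mtp" value are assumptions.
def is_mtp_model(model_config) -> bool:
    return getattr(model_config, "model_type", "") == "deepseek_mtp"

print(is_mtp_model(SimpleNamespace(model_type="deepseek_mtp")))  # True
print(is_mtp_model(SimpleNamespace(model_type="qwen3_moe")))     # False
```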

@wangxiyuan wangxiyuan merged commit 1b47fca into vllm-project:main Dec 18, 2025
25 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Dec 19, 2025
…to eplb_refactor

* 'main' of https://github.com/vllm-project/vllm-ascend: (52 commits)
  [Doc]Add the user_guide doc file regarding fine-grained TP. (vllm-project#5084)
  [pref] qwen3_next add triton ops : fused_sigmoid_gating_delta_rule_update (vllm-project#4818)
  [Feature] Add token mask for DispatchGmmCombineDecode operator (vllm-project#5171)
  [CI] Improve CI (vllm-project#5078)
  [Refactor] remove some metadata variables in attention_v1. (vllm-project#5160)
  Add Qwen3-VL-235B-A22B-Instruct tutorials (vllm-project#5167)
  [Doc] Add a perf tune section (vllm-project#5127)
  [Image] Refactor image build (vllm-project#5175)
  [refactor] refactor weight trans nz and transpose (vllm-project#4878)
  [BugFix]Fix precision issue for LoRA feature (vllm-project#4141)
  【Doc】Deepseekv3.1/R1 doc enhancement (vllm-project#4827)
  support basic long_seq feature st (vllm-project#5140)
  [Bugfix] install trition for test_custom_op (vllm-project#5112)
  [2/N][Pangu][MoE] Remove Pangu Related Code (vllm-project#5130)
  [bugfix] Use FUSED_MC2 MoE comm path for the op `dispatch_ffn_combine` (vllm-project#5156)
  [BugFix] Fix top_p,top_k issue with EAGLE and add top_p,top_k in EAGLE e2e (vllm-project#5131)
  [Doc][P/D] Fix MooncakeConnector's name (vllm-project#5172)
  [Bugfix] Fix in_profile_run in mtp_proposer dummy_run (vllm-project#5165)
  [Doc] Refact benchmark doc (vllm-project#5173)
  [Nightly]  Avoid max_model_len being smaller than the decoder prompt to prevent single-node-accuray-tests from failing (vllm-project#5174)
  ...

Signed-off-by: 白永斌 <baiyongbin3@h-partners.com>
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
vllm-project#5156)

### What this PR does / why we need it?

- Renames the MoE comm enum value `MoECommType.FUSED_ALLTOALL` to
`MoECommType.FUSED_MC2` and updates all call sites.
- Updates `select_moe_comm_method` to optionally select `FUSED_MC2` on
Ascend A3 when:
  - `enable_expert_parallel=True`
  - quantization is `w8a8_dynamic`
  - `EP <= 16`
  - `dynamic_eplb` is disabled
  - `is_mtp_model = False`
- Replaces the old “fused all-to-all” comm implementation with
`FusedMC2CommImpl`, using `TokenDispatcherWithMC2` /
`PrepareAndFinalizeWithMC2` and `dispatch_ffn_combine`.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

---------

Signed-off-by: Chen Chen <0109chenchen@gmail.com>
zzzzwwjj pushed a commit that referenced this pull request Dec 24, 2025
…tch_ffn_c… (#5284)

### What this PR does / why we need it?

- This PR removes the Expert Parallel (EP) HCCL buffer allocation that
was previously introduced by the fused-op `dispatch_ffn_combine` (#3532),
since the fused-op has switched to the MC2 HCCL buffer (#5156).

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: release/v0.13.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: Chen Chen <0109chenchen@gmail.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
vllm-project#5156)
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
…tch_ffn_c… (vllm-project#5284)

ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
vllm-project#5156)
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026
…tch_ffn_c… (vllm-project#5284)