
[bugfix] Fix dummy-run and multi-node issues in MoE routing and MTP #4947

Merged
wangxiyuan merged 1 commit into vllm-project:main from kiscad:fix-smoking
Dec 15, 2025

Conversation

@kiscad (Contributor) commented Dec 12, 2025

What this PR does / why we need it?

  • Fix a premature return in moe_init_routing_quant_v2.cpp so the routing kernel completes correctly instead of exiting early in certain paths.
  • Switch FusedAlltoAllCommImpl to use the MC2-based token dispatcher and prepare/finalize routines, aligning MoE communication with the MC2 algorithm optimized for Ascend devices.
  • Add a temporary override in MtpProposer to map FUSED_ALLTOALL back to ALLTOALL until the MoE communication type selection logic is fully finalized, avoiding incorrect behavior in dummy-run flows.
  • Simplify the MoE communication selection for Ascend 910-93 in NPUModelRunner by removing the EP-size guard on FUSED_ALLTOALL, which fixes failures in multi-node / larger-EP configurations while keeping MC2 routing under the configured token capacity.

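The temporary MTP override and the simplified comm-type selection described in the bullets above can be sketched roughly as follows. This is an illustrative assumption, not the actual vllm-ascend code: the enum values, function names, and the `mc2_token_capacity` parameter are all hypothetical stand-ins for the real `MtpProposer` and `NPUModelRunner` logic.

```python
# Hedged sketch only: names and structure are assumptions for illustration,
# not the real vllm-ascend implementation.
from enum import Enum


class MoECommType(Enum):
    ALLTOALL = "alltoall"
    FUSED_ALLTOALL = "fused_alltoall"
    MC2 = "mc2"


def select_moe_comm_type(num_tokens: int, mc2_token_capacity: int) -> MoECommType:
    # Simplified selection (per the PR): the EP-size guard on
    # FUSED_ALLTOALL is dropped; MC2 is used while the token count
    # stays under the configured capacity.
    if num_tokens <= mc2_token_capacity:
        return MoECommType.MC2
    return MoECommType.FUSED_ALLTOALL


def mtp_comm_type_override(comm_type: MoECommType) -> MoECommType:
    # Temporary override in the MTP proposer path: map FUSED_ALLTOALL
    # back to plain ALLTOALL until comm-type selection is finalized,
    # avoiding incorrect behavior in dummy-run flows.
    if comm_type is MoECommType.FUSED_ALLTOALL:
        return MoECommType.ALLTOALL
    return comm_type
```

Under these assumptions, a dummy run that would have picked `FUSED_ALLTOALL` is remapped to `ALLTOALL` before dispatch.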
Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces fixes for Mixture-of-Experts (MoE) functionality, specifically targeting dummy-run and multi-node scenarios. The main changes involve updating the MoE communication method for FUSED_ALLTOALL to use MC2 components, which likely resolves issues in multi-node setups. Additionally, a guard on the expert parallelism size has been removed, enabling this path for larger configurations. A minor cleanup in a C++ kernel is also included. The changes appear to correctly address the intended fixes. However, I've pointed out that a docstring in moe_comm_method.py is now outdated due to these changes, which could impact future maintainability.
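As a rough illustration of the dispatcher switch the review describes, a fused all-to-all implementation can reuse the MC2 prepare/finalize routines by delegation rather than keeping its own copies. All class and method names below are hypothetical; the real `FusedAlltoAllCommImpl` API is not reproduced here.

```python
# Hypothetical sketch of reusing MC2-based prepare/finalize routines;
# names are illustrative assumptions, not the vllm-ascend API.
class MC2CommImpl:
    def prepare(self, tokens):
        # MC2 token dispatch: tag each token as dispatched via MC2.
        return [("mc2", t) for t in tokens]

    def finalize(self, dispatched):
        # Combine step: strip the dispatch tags and return the tokens.
        return [t for _, t in dispatched]


class FusedAlltoAllCommImpl(MC2CommImpl):
    # After the change, the fused all-to-all path inherits the MC2
    # token dispatcher and prepare/finalize instead of overriding them.
    pass


impl = FusedAlltoAllCommImpl()
result = impl.finalize(impl.prepare([1, 2, 3]))
```

The design point is that both paths now share one dispatch implementation, so a fix to the MC2 routines covers the fused path as well.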

Comment thread on vllm_ascend/ops/fused_moe/moe_comm_method.py (outdated)
@github-actions (bot) commented:
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Fill out the PR description fully so the commit message helps reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

@kiscad kiscad force-pushed the fix-smoking branch 2 times, most recently from 4020532 to 972aee2 on December 12, 2025 06:58
@kiscad kiscad changed the title from "[bugfix] fix errors in dummy-run and multi-nodes scenarios" to "[bugfix] fix crashes in dummy-run and multi-nodes scenarios" Dec 12, 2025
@kiscad kiscad changed the title from "[bugfix] fix crashes in dummy-run and multi-nodes scenarios" to "[bugfix] Fix dummy-run and multi-node issues in MoE routing and MTP" Dec 12, 2025
@kiscad kiscad force-pushed the fix-smoking branch 2 times, most recently from b7e217f to 7b3c477 on December 12, 2025 09:23
@github-actions (bot) commented:

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@kiscad kiscad force-pushed the fix-smoking branch 3 times, most recently from 84c68f8 to c8a182f on December 13, 2025 08:54
Signed-off-by: mojave2 <chenchen145@huawei.com>
@weijinqian0 weijinqian0 added the ready (read for review) and ready-for-test (start test by label for PR) labels Dec 13, 2025
@wangxiyuan wangxiyuan merged commit aa02a85 into vllm-project:main Dec 15, 2025
48 of 52 checks passed
chenaoxuan pushed a commit to chenaoxuan/vllm-ascend that referenced this pull request Dec 20, 2025
…llm-project#4947)

### What this PR does / why we need it?

- Fix a premature `return` in `moe_init_routing_quant_v2.cpp` so the
routing kernel completes correctly instead of exiting early in certain
paths.
- Switch `FusedAlltoAllCommImpl` to use the MC2-based token dispatcher
and prepare/finalize routines, aligning MoE communication with the MC2
algorithm optimized for Ascend devices.
- Add a temporary override in `MtpProposer` to map `FUSED_ALLTOALL` back
to `ALLTOALL` until the MoE communication type selection logic is fully
finalized, avoiding incorrect behavior in dummy-run flows.
- Simplify the MoE communication selection for Ascend 910-93 in
`NPUModelRunner` by removing the EP-size guard on `FUSED_ALLTOALL`,
which fixes failures in multi-node / larger-EP configurations while
keeping MC2 routing under the configured token capacity.

- vLLM version: v0.12.0
- vLLM main:
vllm-project/vllm@ad32e3e

Signed-off-by: mojave2 <chenchen145@huawei.com>
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Feb 28, 2026
ZRJ026 pushed a commit to ZRJ026/vllm-ascend that referenced this pull request Mar 4, 2026

Labels

module:ops, ready (read for review), ready-for-test (start test by label for PR)

3 participants