[bugfix] Fix dummy-run and multi-node issues in MoE routing and MTP #4947

wangxiyuan merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request introduces fixes for Mixture-of-Experts (MoE) functionality, specifically targeting dummy-run and multi-node scenarios. The main changes involve updating the MoE communication method for FUSED_ALLTOALL to use MC2 components, which likely resolves issues in multi-node setups. Additionally, a guard on the expert parallelism size has been removed, enabling this path for larger configurations. A minor cleanup in a C++ kernel is also included. The changes appear to correctly address the intended fixes. However, I've pointed out that a docstring in moe_comm_method.py is now outdated due to these changes, which could impact future maintainability.
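The guard removal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual vLLM Ascend code: the function name `select_moe_comm_method` and the parameters `ep_size` and `mc2_token_capacity` are hypothetical stand-ins for the real selection logic in `NPUModelRunner`.

```python
from enum import Enum


class MoECommType(Enum):
    ALLGATHER = "allgather"
    FUSED_ALLTOALL = "fused_alltoall"


def select_moe_comm_method(num_tokens: int, ep_size: int,
                           mc2_token_capacity: int = 512) -> MoECommType:
    """Hypothetical selection helper illustrating the change.

    Before this PR (sketch), FUSED_ALLTOALL was additionally gated on the
    expert-parallel size (e.g. something like ``ep_size <= 16``), which
    excluded larger multi-node configurations. After the change, only the
    token-capacity check that keeps MC2 routing bounded remains.
    """
    if num_tokens <= mc2_token_capacity:
        return MoECommType.FUSED_ALLTOALL
    return MoECommType.ALLGATHER


if __name__ == "__main__":
    # A large-EP setup now also takes the FUSED_ALLTOALL path,
    # as long as the batch stays under the MC2 token capacity.
    print(select_moe_comm_method(num_tokens=256, ep_size=64))
    print(select_moe_comm_method(num_tokens=4096, ep_size=64))
```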
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: mojave2 <chenchen145@huawei.com>
…llm-project#4947)

### What this PR does / why we need it?

- Fix a premature `return` in `moe_init_routing_quant_v2.cpp` so the routing kernel completes correctly instead of exiting early in certain paths.
- Switch `FusedAlltoAllCommImpl` to use the MC2-based token dispatcher and prepare/finalize routines, aligning MoE communication with the MC2 algorithm optimized for Ascend devices.
- Add a temporary override in `MtpProposer` to map `FUSED_ALLTOALL` back to `ALLTOALL` until the MoE communication type selection logic is fully finalized, avoiding incorrect behavior in dummy-run flows.
- Simplify the MoE communication selection for Ascend 910-93 in `NPUModelRunner` by removing the EP-size guard on `FUSED_ALLTOALL`, which fixes failures in multi-node / larger-EP configurations while keeping MC2 routing under the configured token capacity.

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: mojave2 <chenchen145@huawei.com>
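The temporary MTP override described in the commit message might look roughly like the sketch below. The class body and attribute names here are assumptions for illustration, not the actual `MtpProposer` implementation.

```python
from enum import Enum


class MoECommType(Enum):
    ALLTOALL = "alltoall"
    FUSED_ALLTOALL = "fused_alltoall"


class MtpProposer:
    """Sketch of an MTP proposer showing only the comm-type override."""

    def __init__(self, moe_comm_type: MoECommType):
        # Temporary override: until the MoE communication type selection
        # logic is fully finalized, map FUSED_ALLTOALL back to plain
        # ALLTOALL so dummy-run flows take a known-good path.
        if moe_comm_type is MoECommType.FUSED_ALLTOALL:
            moe_comm_type = MoECommType.ALLTOALL
        self.moe_comm_type = moe_comm_type
```

Keeping the remap inside the proposer's constructor confines the workaround to the MTP path, so it can be deleted in one place once the selection logic is settled.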
What this PR does / why we need it?
- Fix a premature `return` in `moe_init_routing_quant_v2.cpp` so the routing kernel completes correctly instead of exiting early in certain paths.
- Switch `FusedAlltoAllCommImpl` to use the MC2-based token dispatcher and prepare/finalize routines, aligning MoE communication with the MC2 algorithm optimized for Ascend devices.
- Add a temporary override in `MtpProposer` to map `FUSED_ALLTOALL` back to `ALLTOALL` until the MoE communication type selection logic is fully finalized, avoiding incorrect behavior in dummy-run flows.
- Simplify the MoE communication selection for Ascend 910-93 in `NPUModelRunner` by removing the EP-size guard on `FUSED_ALLTOALL`, which fixes failures in multi-node / larger-EP configurations while keeping MC2 routing under the configured token capacity.

Does this PR introduce any user-facing change?
How was this patch tested?