[Bugfix] Disable the dispatch_ffn_combine kernel in MTP path #4751
MengqingCao merged 2 commits into vllm-project:main from
Conversation
Code Review
This pull request introduces a bugfix that disables the fused-MoE kernel during the dummy_run of the MTP (Multi-Token Prediction) proposer. This is accomplished by checking whether the selected MoE communication method is FUSED_ALLTOALL and reverting to the standard ALLTOALL method if so. The change is localized and specifically targets the dummy_run, which is crucial for graph capturing. The modification correctly addresses a likely bug with the fused kernel in this context, and the implementation is sound. No issues were found in the proposed changes.
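The fallback described above can be sketched as follows. This is a minimal illustration of the check-and-revert logic, not the actual vllm-ascend code: the enum name `MoECommType`, its members, and the helper function are hypothetical stand-ins for whatever names the PR uses.

```python
from enum import Enum, auto


# Hypothetical enum modeled on the PR description; the real
# vllm-ascend communication-method enum may differ.
class MoECommType(Enum):
    ALLTOALL = auto()
    FUSED_ALLTOALL = auto()
    ALLGATHER = auto()


def comm_type_for_mtp_dummy_run(selected: MoECommType) -> MoECommType:
    """Route the MTP dummy_run away from the fused kernel.

    If the fused all-to-all (dispatch_ffn_combine) path was selected,
    fall back to the plain ALLTOALL implementation; any other
    selection is passed through unchanged.
    """
    if selected is MoECommType.FUSED_ALLTOALL:
        return MoECommType.ALLTOALL
    return selected
```

Because the check runs only in the dummy_run path, the fused kernel remains available for regular decoding; only graph capture for the MTP proposer avoids it.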
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to the Contributing and Testing guides.
…oject#4751)

### What this PR does / why we need it?

This PR fixes a smoke-test failure. It adjusts mtp_proposer and model_runner_v1 to route MTP decoding through the non-fused MoE implementation while keeping the overall inference flow unchanged.

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: mojave2 <chenchen145@huawei.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
What this PR does / why we need it?
This PR fixes a smoke-test failure. It adjusts mtp_proposer and model_runner_v1 to route MTP decoding through the non-fused MoE implementation while keeping the overall inference flow unchanged.
Does this PR introduce any user-facing change?
How was this patch tested?
This PR will be tested in smoke tests.