[Feature][Cherry Pick]Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe by wangqiankun13 · Pull Request #6081 · vllm-project/vllm-ascend

wangqiankun13 · 2026-01-21T07:01:57Z

What this PR does / why we need it?

This PR is cherry-picked from #5758.

The DispatchGmmCombineDecode operator does not support non-W8A8 scenarios, and it cannot share a communication domain with the Dispatch/Combine operators.

This becomes a problem when, for instance, the draft model uses a non-W8A8 MoE architecture while the main model uses a W8A8 MoE architecture.

For that reason, a few days ago I added a check in #5293 that unconditionally disables DispatchGmmCombineDecode whenever the speculative mode is EAGLE or EAGLE-3.
However, that check was too coarse.
This PR refines the logic by inspecting the draft model's configuration: DispatchGmmCombineDecode is now disabled only when the draft model uses an MoE architecture and is not W8A8.
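
The rule described above can be sketched as a small predicate. This is only an illustration, not the code in this PR: `DraftModelInfo`, its fields, and `draft_allows_dispatch_gmm_combine_decode` are hypothetical names, and the quantization string `"w8a8"` is an assumed label. The sketch covers only the draft-model restriction; any other conditions the operator requires are out of scope here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DraftModelInfo:
    """Hypothetical summary of the draft model's configuration."""
    is_moe: bool
    quantization: Optional[str]  # assumed label, e.g. "w8a8"; None if unquantized


def draft_allows_dispatch_gmm_combine_decode(
    speculative_method: Optional[str],
    draft: Optional[DraftModelInfo],
) -> bool:
    """Return False only when an EAGLE/EAGLE-3 draft model is MoE and not W8A8."""
    # Outside EAGLE / EAGLE-3 speculation the draft model imposes no restriction.
    if speculative_method not in ("eagle", "eagle3") or draft is None:
        return True
    # A non-MoE draft model does not go through the MoE dispatch/combine path.
    if not draft.is_moe:
        return True
    # An MoE draft model is acceptable only when it is W8A8.
    return draft.quantization == "w8a8"
```

Under these assumptions, a W8A8 MoE draft model or a dense draft model leaves the operator enabled, while an unquantized MoE draft model disables it, which matches the behavior stated in the PR title.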

For more information about this operator, see the RFC in issue #5476.

Does this PR introduce any user-facing change?

How was this patch tested?


…lm-ascend into FIA_v0.13.0 * 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend: [Feature][Cherry Pick]Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe (vllm-project#6081) [v0.13.0][BugFix][Cherry Pick] Fix input parameter bug of dispatch_gmm_combine_decode (vllm-project#5931) [0.13.0][Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6074) [v0.13.0][CI] Upgrade to CANN 8.5.0 (vllm-project#6101)


wangqiankun13 changed the title ~~[Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe~~ [Feature][Cherry Pick]Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe Jan 21, 2026

wangqiankun13 force-pushed the v0.13-check_eagle branch 2 times, most recently from aad7763 to 0979339 Compare January 21, 2026 08:12

[Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8,…

1607ec1

… or not moe Signed-off-by: wangqiankun <wangqiankun13@huawei.com>

wangqiankun13 force-pushed the v0.13-check_eagle branch from 0979339 to 1607ec1 Compare January 21, 2026 09:42

Angazenn added the `ready` (read for review) and `ready-for-test` (start test by label for PR) labels Jan 21, 2026

wangxiyuan merged commit 47d1b9b into vllm-project:releases/v0.13.0 Jan 22, 2026
20 checks passed
