Skip to content

[Feature][Cherry Pick] Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe #6081

Merged

wangxiyuan merged 1 commit into vllm-project:releases/v0.13.0 from wangqiankun13:v0.13-check_eagle on Jan 22, 2026

Conversation

@wangqiankun13
Contributor

What this PR does / why we need it?

This PR is cherry-picked from #5758.

Operator DispatchGmmCombineDecode does not support non-W8A8 scenarios and cannot share the same communication domain with Operator Dispatch/Combine. This arises, for instance, when the draft model uses a non-W8A8 MOE architecture while the main model uses a W8A8 MOE architecture.

In an earlier change (#5293), I implemented an interception that unconditionally disabled Operator DispatchGmmCombineDecode whenever the speculative method is EAGLE or EAGLE-3. However, that approach was not precise enough. This PR refines the logic by inspecting the draft model's configuration: Operator DispatchGmmCombineDecode is now disabled only when the draft model uses an MOE architecture and is not W8A8-quantized.

For more information about this operator, please refer to the RFC in issue #5476.
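
Below is a minimal sketch of the refined gate described above. The function and helper names, and the config fields (`num_experts`, `quantization`), are illustrative assumptions for this example, not the actual vllm-ascend API:

```python
from types import SimpleNamespace


def is_moe_model(cfg) -> bool:
    # Hypothetical check: the draft model config declares expert layers.
    return getattr(cfg, "num_experts", 0) > 0


def is_w8a8_quantized(cfg) -> bool:
    # Hypothetical check: the draft model is quantized with W8A8.
    return getattr(cfg, "quantization", None) == "w8a8"


def should_enable_dispatch_gmm_combine_decode(draft_cfg) -> bool:
    """Keep DispatchGmmCombineDecode enabled unless the EAGLE/EAGLE-3
    draft model is an MOE model without W8A8 quantization, which the
    operator cannot serve."""
    return (not is_moe_model(draft_cfg)) or is_w8a8_quantized(draft_cfg)


# Dense (non-MOE) draft model: operator stays enabled.
assert should_enable_dispatch_gmm_combine_decode(
    SimpleNamespace(num_experts=0, quantization=None))
# Non-W8A8 MOE draft model: operator is disabled.
assert not should_enable_dispatch_gmm_combine_decode(
    SimpleNamespace(num_experts=64, quantization=None))
# W8A8 MOE draft model: operator stays enabled.
assert should_enable_dispatch_gmm_combine_decode(
    SimpleNamespace(num_experts=64, quantization="w8a8"))
```

Compared with the blanket EAGLE/EAGLE-3 interception from #5293, this condition falls back only in the single configuration the operator cannot handle.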

Does this PR introduce any user-facing change?

How was this patch tested?

@wangqiankun13 changed the title from "[Feature]Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe" to "[Feature][Cherry Pick]Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe" on Jan 21, 2026
@wangqiankun13 force-pushed the v0.13-check_eagle branch 2 times, most recently from aad7763 to 0979339 on January 21, 2026 08:12
… or not moe

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>
@Angazenn added the ready (read for review) and ready-for-test (start test by label for PR) labels on Jan 21, 2026
@wangxiyuan merged commit 47d1b9b into vllm-project:releases/v0.13.0 on Jan 22, 2026

20 checks passed
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request Jan 22, 2026
…lm-ascend into FIA_v0.13.0

* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  [Feature][Cherry Pick]Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe (vllm-project#6081)
  [v0.13.0][BugFix][Cherry Pick] Fix input parameter bug of dispatch_gmm_combine_decode (vllm-project#5931)
  [0.13.0][Bugfix] Fix Triton operator usage for multimodal models based on `the mrope_interleaved` parameter (vllm-project#6074)
  [v0.13.0][CI] Upgrade to CANN 8.5.0 (vllm-project#6101)
starmountain1997 pushed a commit to starmountain1997/vllm-ascend that referenced this pull request Jan 31, 2026
…oe with w8a8, or not moe (vllm-project#6081)

tangtiangu pushed a commit to tangtiangu/jiusi-vllm-ascend that referenced this pull request Feb 24, 2026
…oe with w8a8, or not moe (vllm-project#6081)
