[v0.13.0][BugFix][Cherry Pick] Fix input parameter bug of dispatch_gmm_combine_decode#5931
Merged
wangxiyuan merged 1 commit into vllm-project:releases/v0.13.0 on Jan 22, 2026
Conversation
Contributor
Code Review
This pull request correctly fixes a critical bug in the dispatch_gmm_combine_decode operator. The global_bs parameter was previously calculated incorrectly as it did not account for tensor parallelism, which could lead to runtime errors or incorrect results. The change ensures global_bs is calculated consistently with other MoE operators by factoring in the tensor parallel size. The removal of the now-unused fused_global_bs attribute is also a good code cleanup. The changes are correct and improve the robustness of the fused MoE implementation.
Signed-off-by: wangqiankun <wangqiankun13@huawei.com>
02f41ca to fa6bc4e
weijinqian0 approved these changes on Jan 21, 2026
845473182 pushed a commit to 845473182/vllm-ascend that referenced this pull request on Jan 22, 2026
…lm-ascend into FIA_v0.13.0
* 'releases/v0.13.0' of https://github.com/vllm-project/vllm-ascend:
  [Feature][Cherry Pick] Enable DispatchGmmCombineDecode when eagle is moe with w8a8, or not moe (vllm-project#6081)
  [v0.13.0][BugFix][Cherry Pick] Fix input parameter bug of dispatch_gmm_combine_decode (vllm-project#5931)
  [0.13.0][Bugfix] Fix Triton operator usage for multimodal models based on the `mrope_interleaved` parameter (vllm-project#6074)
  [v0.13.0][CI] Upgrade to CANN 8.5.0 (vllm-project#6101)
tangtiangu pushed a commit to tangtiangu/jiusi-vllm-ascend that referenced this pull request on Feb 24, 2026
…m_combine_decode (vllm-project#5931)

### What this PR does / why we need it?
This PR is cherry-picked from [PR5932](vllm-project#5932).
In vllm-project#5040, the dispatch_gmm_combine_decode operator was configured with an incorrect global_bs parameter. This PR fixes that bug.
The global_bs provided as input should have the same meaning as in the moe_distributed_dispatch operator, specifically: (the maximum batch size across all cards) * (expert parallel world size).
However, the implementation incorrectly used the variable max_num_tokens, which does not account for tensor parallelism. This error likely resulted in an unnecessarily large (overestimated) value.
For more information about this operator, please refer to RFC issue vllm-project#5476.

Signed-off-by: wangqiankun <wangqiankun13@huawei.com>
What this PR does / why we need it?
This PR is cherry-picked from PR5932.
In #5040, the dispatch_gmm_combine_decode operator was configured with an incorrect global_bs parameter. This PR fixes that bug.
The global_bs provided as input should have the same meaning as in the moe_distributed_dispatch operator, specifically: (the maximum batch size across all cards) * (expert parallel world size).
However, the implementation incorrectly used the variable max_num_tokens, which does not account for tensor parallelism. This error likely resulted in an unnecessarily large (overestimated) value.
For more information about this operator, please refer to RFC issue #5476.
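To make the corrected relationship concrete, here is a minimal, hypothetical sketch of the calculation. The function and parameter names (`compute_global_bs`, `max_num_tokens`, `tp_size`, `ep_size`) are illustrative only, not the actual vllm-ascend attributes; the point is that the token budget must be divided by the tensor parallel size before scaling by the expert parallel world size:

```python
# Illustrative sketch (not the exact vllm-ascend code) of how global_bs
# for dispatch_gmm_combine_decode is derived so that it matches the
# moe_distributed_dispatch convention described above.

def compute_global_bs(max_num_tokens: int, tp_size: int, ep_size: int) -> int:
    """global_bs = (max batch size across all cards) * (EP world size).

    max_num_tokens here is assumed to be a token budget that does not
    account for tensor parallelism; using it directly (the original bug)
    would overestimate global_bs by a factor of tp_size.
    """
    # Per-card maximum batch size once tensor parallelism is factored in.
    max_bs_per_card = max_num_tokens // tp_size
    # Scale by the expert parallel world size, as moe_distributed_dispatch expects.
    return max_bs_per_card * ep_size

# Example: a 4096-token budget with TP=4 and EP=16.
print(compute_global_bs(4096, 4, 16))  # (4096 // 4) * 16 = 16384
```

With TP=1 this reduces to the old behavior, which is why the bug only shows up (as an overestimated buffer size) when tensor parallelism is enabled.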
Does this PR introduce any user-facing change?
How was this patch tested?