
Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" #34530

Merged

vllm-bot merged 1 commit into main from revert-34279-fix-fused-moe-int64-strides on Feb 13, 2026

Conversation

@mgoin (Member) commented Feb 13, 2026

Reverts #34279 due to the large performance degradations reported. We will pursue a similar fix with more careful performance analysis later.

@mgoin mgoin requested a review from pavanimajety as a code owner February 13, 2026 18:33
@mergify mergify bot added the bug Something isn't working label Feb 13, 2026
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request correctly reverts the explicit typing of stride parameters to tl.int64 in the fused_moe_kernel_gptq_awq and fused_moe_kernel Triton kernels. The motivation for this revert is to address significant performance degradations introduced by the original change. While this action knowingly reintroduces a bug concerning potential integer overflows with very large tensors, it is a pragmatic trade-off to restore performance. The intention to investigate a more performant solution for the overflow issue is acknowledged. The revert is implemented correctly.

@vllm-bot vllm-bot merged commit bfaa559 into main Feb 13, 2026
11 of 12 checks passed
@vllm-bot vllm-bot deleted the revert-34279-fix-fused-moe-int64-strides branch February 13, 2026 18:35
haosdent added a commit to haosdent/vllm that referenced this pull request Feb 14, 2026
…egression

PR vllm-project#34279 annotated all stride parameters as tl.int64 to fix an int32
overflow crash, but this caused ~60x perf regression on small GPUs (e.g.
NVIDIA GB10) due to register pressure. PR vllm-project#34530 reverted that fix.

This patch prevents the overflow with minimal register impact by casting
offs_token to int64 after loading instead of widening all strides. When
chunking is disabled and M is large, stride_cm * offs_token (where
stride_cm = N = w1.size(1) and offs_token up to M*topk) can exceed
int32 max. The cast leverages Triton type promotion (int32 * int64 ->
int64) following the existing pattern used for off_experts and offs_bn.

Adds a regression test that disables chunking with M=100000, n=2048,
topk=6 (product = 4096 * 600000 = 2.46B > int32 max) and validates
correctness against the torch_moe reference.

Fixes vllm-project#34413

Signed-off-by: haosdent <haosdent@gmail.com>
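The overflow described in the commit message above can be sketched numerically. The following is a minimal illustration, not the actual kernel code: it uses NumPy arrays to stand in for Triton's in-kernel int32 arithmetic (both wrap on overflow and both promote int32 * int64 to int64), with the shapes taken from the regression test described in the commit text.

```python
import numpy as np

# Hypothetical values mirroring the regression test above:
# M = 100000 tokens, topk = 6, and an output row stride of 4096.
stride_cm = np.array([4096], dtype=np.int32)            # row stride of the output
offs_token = np.array([100_000 * 6], dtype=np.int32)    # largest token offset, M * topk

# int32 * int32 stays int32 and wraps silently:
# 4096 * 600000 = 2_457_600_000 > 2**31 - 1, so the index goes negative.
wrapped = int((stride_cm * offs_token)[0])

# The patch's approach: widen only offs_token to int64 after loading.
# int32 * int64 promotes to int64, so the address math no longer wraps,
# and no other stride argument needs to be widened.
safe = int((stride_cm * offs_token.astype(np.int64))[0])

print(wrapped)  # a negative, wrapped-around value
print(safe)     # 2457600000
```

Widening one index vector rather than every stride parameter is what keeps the register footprint close to the original kernel, which is the trade-off the reverted PR #34279 got wrong.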
wzhao18 pushed a commit to wzhao18/vllm that referenced this pull request Feb 18, 2026
… strides" (vllm-project#34530)

Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Feb 19, 2026
… strides" (vllm-project#34530)

Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
ZJY0516 pushed a commit to ZJY0516/vllm that referenced this pull request Feb 23, 2026
… strides" (vllm-project#34530)

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026


3 participants