[Bugfix] Correct block shape logic in WNA16 MoE triton kernel by JartX · Pull Request #31686 · vllm-project/vllm

JartX · 2026-01-04T18:14:54Z

This PR addresses an inference regression introduced in PR #31050. Specifically, it fixes a failure when loading models that use the WNA16 MoE Triton kernel.

Note: This PR depends on PR #31663, which contains the fix for the failure introduced in PR #31533.

CC: @zyongye @robertgshaw2-redhat @jeejeelee — please take a look. :)

Signed-off-by: JartX <sagformas@epdcenter.es>

gemini-code-assist

Code Review

This pull request introduces a bugfix for the WNA16 MoE Triton kernel by correcting the block shape logic. It ensures that BLOCK_SIZE_N and BLOCK_SIZE_K are always set in the configuration before the kernel grid is defined, which resolves an inference regression. Additionally, the code is refactored to use a new dispatch_fused_moe_kernel function, which adds support for zero-point quantization (w1_zp and w2_zp) for the expert weights. The changes appear correct and improve the robustness and functionality of the fused MoE layer.
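The ordering described in the review (block sizes resolved first, launch grid computed second) can be illustrated with a minimal sketch. This is not the PR's diff: the helper names `with_block_sizes` and `launch_grid`, the default values, and the `[block_n, block_k]` layout of `block_shape` are all assumptions made for illustration.

```python
from typing import Optional, Sequence

# Hypothetical fallback values; the defaults used in vLLM may differ.
DEFAULT_BLOCK_SIZE_N = 64
DEFAULT_BLOCK_SIZE_K = 32


def with_block_sizes(config: dict, block_shape: Optional[Sequence[int]]) -> dict:
    """Return a copy of config in which BLOCK_SIZE_N and BLOCK_SIZE_K are always set."""
    resolved = dict(config)  # do not mutate the caller's config
    if block_shape is not None:
        # Assumed layout: block_shape == [block_n, block_k].
        resolved.setdefault("BLOCK_SIZE_N", block_shape[0])
        resolved.setdefault("BLOCK_SIZE_K", block_shape[1])
    # Fall back to defaults so the keys exist on every code path.
    resolved.setdefault("BLOCK_SIZE_N", DEFAULT_BLOCK_SIZE_N)
    resolved.setdefault("BLOCK_SIZE_K", DEFAULT_BLOCK_SIZE_K)
    return resolved


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def launch_grid(num_rows: int, n: int, config: dict) -> tuple:
    """1-D launch grid: one program per (BLOCK_SIZE_M x BLOCK_SIZE_N) output tile."""
    # Reads BLOCK_SIZE_N, so it must run only after with_block_sizes().
    tiles_m = ceil_div(num_rows, config["BLOCK_SIZE_M"])
    tiles_n = ceil_div(n, config["BLOCK_SIZE_N"])
    return (tiles_m * tiles_n,)


# Resolve the block sizes first, then build the grid from the resolved config.
config = with_block_sizes({"BLOCK_SIZE_M": 16}, block_shape=None)
grid = launch_grid(num_rows=32, n=128, config=config)
```

Computing the grid from the unresolved config would raise a `KeyError` on `BLOCK_SIZE_N` in this sketch, which is the kind of failure that resolving the keys up front avoids.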

JartX · 2026-01-05T23:39:21Z

#31752 works fine, so this PR can be closed.

[Bugfix] Correct block shape logic in WNA16 MoE triton kernel

21a986b

Signed-off-by: JartX <sagformas@epdcenter.es>

JartX requested review from mgoin and pavanimajety as code owners January 4, 2026 18:14

gemini-code-assist bot reviewed Jan 4, 2026

JartX mentioned this pull request Jan 4, 2026

[MoE Refactor] Split invoke_fused_moe_kernel #31050

Merged

Merge branch 'main' into bugfix/correct_block_shape_logic_WNA16MOE

5de77ba

zyongye mentioned this pull request Jan 5, 2026

[MoE Refactoring][Bugfix]Wrap WNA16 Triton kernel into mk and change compressed tensor kernel selection #31752

Merged

JartX closed this Jan 5, 2026

JartX deleted the bugfix/correct_block_shape_logic_WNA16MOE branch March 15, 2026 16:51

JartX wants to merge 2 commits into vllm-project:main from JartX:bugfix/correct_block_shape_logic_WNA16MOE
