
[Bugfix] Correct block shape logic in WNA16 MoE triton kernel#31686

Closed
JartX wants to merge 2 commits into vllm-project:main from JartX:bugfix/correct_block_shape_logic_WNA16MOE


Conversation

JartX (Contributor) commented Jan 4, 2026

This PR addresses an inference regression introduced in PR #31050. Specifically, it fixes a failure when loading models that use the WNA16 MoE Triton kernel.

Note: This PR depends on PR #31663, which contains the fix for the failure introduced in PR #31533.

CC: @zyongye @robertgshaw2-redhat @jeejeelee — please take a look. :)

Signed-off-by: JartX <sagformas@epdcenter.es>

gemini-code-assist bot left a comment


Code Review

This pull request introduces a bugfix for the WNA16 MoE Triton kernel by correcting the block shape logic. It ensures that BLOCK_SIZE_N and BLOCK_SIZE_K are always set in the configuration before the kernel grid is defined, which resolves an inference regression. Additionally, the code is refactored to use a new dispatch_fused_moe_kernel function, which adds support for zero-point quantization (w1_zp and w2_zp) for the expert weights. The changes appear correct and improve the robustness and functionality of the fused MoE layer.
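The ordering point in the review can be illustrated with a minimal, hypothetical Python sketch. This is not the PR's code: the helper names `ensure_block_shape` and `launch_grid` and the default tile sizes are assumptions chosen for illustration. It only shows why `BLOCK_SIZE_N` and `BLOCK_SIZE_K` have to be present in the config before the launch grid, which tiles the output into `BLOCK_SIZE_M` by `BLOCK_SIZE_N` blocks, is computed.

```python
import math

# Hypothetical default tile sizes; the values the real kernel uses are not shown in this PR page.
DEFAULT_BLOCK_SIZE_N = 64
DEFAULT_BLOCK_SIZE_K = 32


def ensure_block_shape(config: dict,
                       block_size_n: int = DEFAULT_BLOCK_SIZE_N,
                       block_size_k: int = DEFAULT_BLOCK_SIZE_K) -> dict:
    """Return a copy of `config` in which BLOCK_SIZE_N and BLOCK_SIZE_K are always set.

    Values already present (e.g. from a tuned config) are kept unchanged.
    """
    cfg = dict(config)  # copy so the caller's config is not mutated
    cfg.setdefault("BLOCK_SIZE_N", block_size_n)
    cfg.setdefault("BLOCK_SIZE_K", block_size_k)
    return cfg


def launch_grid(num_padded_tokens: int, n: int, config: dict) -> tuple:
    """Hypothetical 1-D launch grid: one program per (M tile, N tile) pair.

    Raises KeyError if the block sizes were not filled in beforehand.
    """
    m_tiles = math.ceil(num_padded_tokens / config["BLOCK_SIZE_M"])
    n_tiles = math.ceil(n / config["BLOCK_SIZE_N"])
    return (m_tiles * n_tiles,)


if __name__ == "__main__":
    tuned = {"BLOCK_SIZE_M": 16}  # a config that omits BLOCK_SIZE_N / BLOCK_SIZE_K
    print(launch_grid(100, 256, ensure_block_shape(tuned)))
```

With a tuned config of `{"BLOCK_SIZE_M": 16}`, calling `launch_grid` on it directly raises a `KeyError`, while calling it on `ensure_block_shape(...)` yields `(28,)` for 100 padded tokens and N = 256 (7 M tiles times 4 N tiles). Using `setdefault` means any block sizes already chosen by tuning take precedence over the fallbacks.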

JartX (Contributor, Author) commented Jan 5, 2026

#31752 works fine, so I'm closing this one.

JartX closed this Jan 5, 2026
JartX deleted the bugfix/correct_block_shape_logic_WNA16MOE branch March 15, 2026 16:51


1 participant