fused_moe_kernel - cast accumulator after applying router weights #32002

Merged

DarkLight1337 merged 1 commit into vllm-project:main on Jan 10, 2026
Conversation
Signed-off-by: gnovack <gnovack@amazon.com>
Contributor

Code Review
This pull request correctly addresses the accuracy regression and test failures by ensuring the router weights are applied to the accumulator in float32 before the cast to `compute_type`. Moving the cast after the router-weight multiplication restores the intended numerical behavior, and removing the now-redundant casts simplifies the code. The changes look good and are well justified by the test results in the description.
varun-sundar-rabindranath (Contributor) approved these changes on Jan 10, 2026 and left a comment:
LGTM! Thanks @gnovack
Contributor

cc @mgoin PTAL 🙌
mgoin (Member) approved these changes on Jan 10, 2026 and left a comment:
Seems reasonable to me, thanks for the fix
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request on Jan 16, 2026:
…lm-project#32002) Signed-off-by: gnovack <gnovack@amazon.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request on Jan 21, 2026:
…lm-project#32002) Signed-off-by: gnovack <gnovack@amazon.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request on Feb 19, 2026:
…lm-project#32002) Signed-off-by: gnovack <gnovack@amazon.com>
Purpose

The test case `test_olmoe_lora_mixed` in `tests/lora/test_olmoe_tp.py` has been failing since #31676 was merged. Previously, the application of `moe_weight` (the router weights) to the `accumulator` was performed in `float32`, but after #31676 this computation is done after `accumulator` has been cast to `compute_type`. This change in behavior caused the test case to begin failing and introduced a slight accuracy degradation, based on lm-eval results on `gsm8k` and `mmlu_pro`.

Results before #31676:

Results on main:
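To make the failure mechanism concrete, here is a minimal, self-contained sketch of why the two orderings diverge. This is not the kernel code; the tensor shapes and names are illustrative assumptions, and `torch.bfloat16` stands in for `compute_type`:

```python
import torch

torch.manual_seed(0)
acc = torch.randn(64, 128, dtype=torch.float32)      # fp32 matmul accumulator
moe_weight = torch.rand(64, 1, dtype=torch.float32)  # per-token router weight

# Behavior after #31676: cast to compute_type first, then apply router weights.
cast_then_mul = acc.to(torch.bfloat16) * moe_weight.to(torch.bfloat16)

# Behavior restored by this PR: apply router weights in float32, cast once at the end.
mul_then_cast = (acc * moe_weight).to(torch.bfloat16)

# The two orderings differ by rounding error, which compounds across experts and layers.
print((cast_then_mul.float() - mul_then_cast.float()).abs().max())
```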
Test Plan
Test Result
`test_olmoe_lora_mixed` in `tests/lora/test_olmoe_tp.py` is passing now. lm-eval results after the change in this PR:
Note

Adjusts the compute order in the Triton `fused_moe_kernel` to improve numerical behavior:

- Applies the `b_scale`/`a_scale` dequant multipliers without early casting, and defers `accumulator.to(compute_type)` until after adding `bias` and multiplying by the router `moe_weight`.
- For `int8_w8a16`, multiplies by `b_scale`; for `fp8_w8a8`/`int8_w8a8` with tensor-wise or per-channel (non-block-wise) scales, multiplies by `a_scale * b_scale`.
- Modified file: `vllm/model_executor/layers/fused_moe/fused_moe.py`

Written by Cursor Bugbot for commit e78572a.
fused_moe_kernelto improve numerical behavior.b_scale/a_scaledequant multipliers without early casting; defersaccumulator.to(compute_type)until after addingbiasand multiplying routermoe_weightint8_w8a16, multiply byb_scale; forfp8_w8a8/int8_w8a8tensor-wise or per-channel (non-block-wise), multiply bya_scale * b_scalevllm/model_executor/layers/fused_moe/fused_moe.pyWritten by Cursor Bugbot for commit e78572a. This will update automatically on new commits. Configure here.