[Bugfix] Fix some issues with MoERunner PR #32344 (#34371)
vllm-bot merged 4 commits into vllm-project:main
Conversation
Signed-off-by: Bill Nell <bnell@redhat.com>
Code Review
This pull request introduces two key bug fixes for the MoE runner. First, it corrects the handling of the gate property in FusedMoE by checking the use_overlapped flag, ensuring the router is invoked correctly based on whether shared expert computation is overlapped. This aligns the gate's behavior with the shared_experts property. Second, it moves the ensure_moe_quant_config_init call into the _moe_forward and _moe_forward_shared custom op implementations. This is a good change to prevent issues with torch.compile by ensuring that this initialization logic with side effects is executed at runtime rather than during graph tracing. The changes are well-reasoned and improve the correctness and robustness of the MoE implementation. I have no further comments.
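The gate fix described above mirrors the existing `shared_experts` property. A minimal sketch of that pattern, using an illustrative stand-in class (not the actual vLLM `FusedMoE` implementation):

```python
class OverlappedMoELayer:
    """Illustrative stand-in for a fused-MoE layer with optional
    overlapped shared-expert computation (not the real vLLM class)."""

    def __init__(self, gate, shared_experts, use_overlapped: bool):
        self._gate = gate
        self._shared_experts = shared_experts
        self.use_overlapped = use_overlapped

    @property
    def shared_experts(self):
        # Shared experts are exposed to the caller only when their
        # computation is overlapped with the routed-expert computation.
        return self._shared_experts if self.use_overlapped else None

    @property
    def gate(self):
        # The fix: check use_overlapped before returning _gate, so the
        # gate's visibility matches the shared_experts property instead
        # of returning _gate unconditionally.
        return self._gate if self.use_overlapped else None
```

Whether the real code gates on `use_overlapped` or its negation is a vLLM implementation detail; the point of the fix is that both properties are now guarded by the same flag.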
This seems to fix my gpt-oss H200 issue. Main: fails. This PR: works.
Signed-off-by: Bill Nell <bnell@redhat.com>
…roject#34371) Signed-off-by: Bill Nell <bnell@redhat.com>
The code fix landed via vllm-project#34371 (31d992d). This adds a regression test to prevent future regressions: test_w4a16_moe_torch_compile loads a W4A16 MoE model with enforce_eager=False and verifies inference succeeds without the "Hidden size mismatch" assertion error. Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
…roject#34371) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
…roject#34371) Signed-off-by: Bill Nell <bnell@redhat.com>
Purpose
- Move the `ensure_moe_quant_config` call from `FusedMoE.forward_native` into `_moe_forward` and `_moe_forward_shared`. This is closer to how it was before, when it was hidden inside the custom op, and should avoid torch.compile issues.
- Fix handling of `gate`. The `use_overlapped` flag should have been checked before returning `_gate`.

Possible fix for #34357
Test Plan
- Ran `openai/gpt-oss-20b`
- Tested #34357; was able to repro it with a revision earlier than #32344
- Ran `nvidia/DeepSeek-R1-NVFP4`

Test Result
cc @mgoin , @robertgshaw2-redhat