
[Bugfix] Fix some issues with MoERunner PR #32344#34371

Merged
vllm-bot merged 4 commits into vllm-project:main from neuralmagic:move-init-call
Feb 11, 2026

Conversation

@bnellnm
Collaborator

@bnellnm bnellnm commented Feb 11, 2026

Purpose

Move the ensure_moe_quant_config call from FusedMoE.forward_native into _moe_forward and _moe_forward_shared. This restores the earlier arrangement, when the call was hidden inside the custom op, and should avoid torch.compile issues.
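The init-call move can be sketched as follows. This is a minimal, hypothetical illustration of the pattern (the class, the config contents, and the counter are invented for demonstration), not vLLM's actual implementation: the side-effecting lazy initialization runs inside the custom-op implementation, so tracing of forward_native never captures it.

```python
# Hypothetical sketch: lazy, side-effecting init moved out of the traced
# forward and into the op implementation, mirroring the PR's description.
class MoERunnerSketch:
    def __init__(self):
        self.quant_config = None
        self.init_calls = 0  # counts how often lazy init actually ran

    def _ensure_moe_quant_config(self):
        # Side-effecting lazy init: should run at op-call time, not trace time.
        if self.quant_config is None:
            self.init_calls += 1
            self.quant_config = {"scheme": "w4a16"}  # placeholder config

    def _moe_forward(self, x):
        # Init happens inside the op implementation, so a compiler tracing
        # forward_native does not bake the side effect into the graph.
        self._ensure_moe_quant_config()
        return [v * 2 for v in x]  # stand-in for the fused MoE kernel

    def forward_native(self, x):
        # No init call here anymore; just dispatch to the op.
        return self._moe_forward(x)


runner = MoERunnerSketch()
out = runner.forward_native([1, 2, 3])
```

Repeated calls initialize the config exactly once, which is the behavior the traced graph must not interfere with.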

Fix handling of gate: the use_overlapped flag should have been checked before returning _gate.
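The gate fix reduces to guarding a property on a flag. The sketch below is illustrative only: the names _gate and use_overlapped come from the PR description, but the surrounding class and the choice of which mode hides the module from callers are assumptions, not vLLM's real FusedMoE.

```python
# Hypothetical sketch of the gate fix: consult use_overlapped before
# returning _gate, instead of returning it unconditionally.
class FusedMoESketch:
    def __init__(self, gate, use_overlapped: bool):
        self._gate = gate
        self.use_overlapped = use_overlapped

    @property
    def gate(self):
        # Assumption for illustration: when shared-expert computation is
        # overlapped, the router is invoked inside the fused path, so the
        # property hides it from external callers.
        if self.use_overlapped:
            return None
        return self._gate


overlapped = FusedMoESketch(gate="router", use_overlapped=True)
separate = FusedMoESketch(gate="router", use_overlapped=False)
```

The point of the fix is simply that the property branches on the flag at all, matching how the analogous shared_experts property behaves.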

Possible fix for #34357

Test Plan

Ran openai/gpt-oss-20b
Tested #34357; was able to reproduce it with a revision earlier than #32344
Ran nvidia/DeepSeek-R1-NVFP4

Test Result

cc @mgoin , @robertgshaw2-redhat

Signed-off-by: Bill Nell <bnell@redhat.com>
@mergify mergify bot added the bug Something isn't working label Feb 11, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces two bug fixes for the MoE runner. First, it corrects the handling of the gate property in FusedMoE by checking the use_overlapped flag, ensuring the router is invoked correctly based on whether shared-expert computation is overlapped; this aligns the gate's behavior with the shared_experts property. Second, it moves the ensure_moe_quant_config_init call into the _moe_forward and _moe_forward_shared custom op implementations, so that this side-effecting initialization runs at runtime rather than during graph tracing, preventing torch.compile issues. The changes are well-reasoned and improve the correctness and robustness of the MoE implementation. I have no further comments.

@mgoin
Member

mgoin commented Feb 11, 2026

This seems to fix my gpt-oss H200 issue

Main: fails

This PR:

vllm serve openai/gpt-oss-20b -tp=1 --port 9000

python tests/evals/gsm8k/gsm8k_eval.py --port 9000                                                    
Running GSM8K evaluation: 1319 questions, 5-shot
Evaluating: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:37<00:00, 34.92it/s]

Results:
Accuracy: 0.293
Invalid responses: 0.174
Total latency: 37.783 s
Questions per second: 34.910
Total output tokens: 308540
Output tokens per second: 8166.192

Signed-off-by: Bill Nell <bnell@redhat.com>
@robertgshaw2-redhat robertgshaw2-redhat enabled auto-merge (squash) February 11, 2026 21:06
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 11, 2026
@vllm-bot vllm-bot merged commit 31d992d into vllm-project:main Feb 11, 2026
47 of 54 checks passed
@bnellnm bnellnm deleted the move-init-call branch February 11, 2026 23:02
warichet pushed a commit to warichet/vllm that referenced this pull request Feb 12, 2026
mgehre-amd added a commit to mgehre-amd/vllm that referenced this pull request Feb 18, 2026
The code fix landed via vllm-project#34371 (31d992d). This adds a regression test
to prevent future regressions: test_w4a16_moe_torch_compile loads a
W4A16 MoE model with enforce_eager=False and verifies inference
succeeds without the "Hidden size mismatch" assertion error.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
eldarkurtic pushed a commit to eldarkurtic/vllm that referenced this pull request Feb 19, 2026
…roject#34371)

Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
liuchenbing2026 pushed a commit to liuchenbing2026/vllm that referenced this pull request Apr 4, 2026

Labels

bug Something isn't working ready ONLY add when PR is ready to merge/full CI is needed


4 participants