
[Bugfix][MoE] Only unpad routed output before shared expert add #40865

Merged
tomeras91 merged 4 commits into vllm-project:main from netanel-haber:bugfix/only-truncate-padded-fused-output-for-shared-output on Apr 25, 2026

Conversation

@netanel-haber (Contributor) commented Apr 25, 2026

Only trim the padded routed output before the shared+routed add (or the latent-MoE up-projection). MoE without shared experts keeps the late truncation.

Not a duplicate: #40853 (a draft auto-PR) reverts #40794; this PR preserves the shared-output fix instead.

I reproduced the following failure locally before this PR's change, and the test passes with the change:

python -m pytest -sv "./tests/evals/gpt_oss/test_gpqa_correctness.py::test_gpqa_correctness[gpt-oss-20b-flashinfer-mxfp4-bf16]" --config-list-file=configs/models-b200.txt

The PR skips fused truncation when there is no shared expert (or latent-MoE up-projection), as is the case with GPT-OSS. This restores GPT-OSS behavior to the state before #40794, while preserving #40794's intent for Nemotron-Nano-v3: applying truncation before the addition, which #35949 had broken.
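
As a rough sketch of the control flow described above (hypothetical names; combine_routed_and_shared, orig_hidden_dim, and the tensor shapes are illustrative assumptions, not the actual moe_runner.py code):

```python
import torch


def combine_routed_and_shared(routed_padded: torch.Tensor,
                              shared_output: torch.Tensor | None,
                              orig_hidden_dim: int) -> torch.Tensor:
    """Illustrative sketch: unpad the routed output only when it must be
    combined with a shared-expert output of the original hidden size."""
    if shared_output is not None:
        # Shared expert present (e.g. Nemotron-Nano-v3): truncate the padded
        # routed output first so the shapes match for the add.
        return routed_padded[..., :orig_hidden_dim] + shared_output
    # No shared expert (e.g. GPT-OSS): keep the padded output and leave
    # truncation to the caller, as before #40794.
    return routed_padded
```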

There may still be an open Nemotron-Nano-v3 FI NVFP4 bug under investigation. This PR, like the previous one, does not attempt to fix that; it only aims to keep GPT-OSS working and make Nemotron-Nano-v3 TP=1 functional.

AI assistance was used.

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

@claude (bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify (bot) added the bug (Something isn't working) label on Apr 25, 2026
@netanel-haber (Contributor, Author) commented:

@bnellnm - I think this would fix the GPT-OSS failure. Is there a way to manually trigger the GPT-OSS test on this PR?


@gemini-code-assist (bot) left a comment


Code Review

This pull request adds a comment clarifying the purpose of tracking routed hidden dimensions and introduces a conditional check when truncating fused outputs. The review identifies a potential shape-mismatch bug in latent MoE configurations where shared experts are absent, and suggests an updated condition to ensure truncation occurs when either a shared output or a routed-output transform is present.

Two review comment threads on vllm/model_executor/layers/fused_moe/runner/moe_runner.py (both marked outdated).
@bnellnm (Collaborator) commented Apr 25, 2026

> @bnellnm - I think this would fix the GPT-OSS failure. Is there a way to manually trigger the GPT-OSS test on this PR?

There was a similar failure in a prior PR when the truncation came before the reduce; it was fixed by moving the truncation afterwards.
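
(For context, a minimal sketch of the ordering being described here, assuming a tensor-parallel all-reduce over the padded routed output; the function name and the all-reduce call are illustrative, not the actual kernel path:)

```python
import torch
import torch.distributed as dist


def finalize_routed_no_shared(routed_padded: torch.Tensor,
                              orig_hidden_dim: int) -> torch.Tensor:
    # Keep the padded shape through the tensor-parallel reduce so all ranks
    # operate on identically shaped buffers...
    if dist.is_initialized():
        dist.all_reduce(routed_padded)
    # ...and only drop the padding afterwards. Truncating before the reduce
    # is what caused the earlier failure mentioned above.
    return routed_padded[..., :orig_hidden_dim]
```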

@netanel-haber (Contributor, Author) commented:

I ran the GPQA test locally (2x B200) before and after this PR, and the PR fixes it. @bnellnm

@netanel-haber (Contributor, Author) commented:

What the PR basically does is skip the truncation when there is no shared expert, which is the case for GPT-OSS, so its behavior reverts to the pre-PR state. It conserves the behavior of my original PR, namely truncation before the addition when there is a shared expert, as in the case of Nemotron-Nano-v3.

…here are no shared experts, for latent models

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
@tomeras91 (Member) left a comment


After making sure truncation is applied if either shared experts or a routed_output_transform is used, LGTM
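
A minimal sketch of that condition (variable names are assumptions that mirror the discussion, not the merged code):

```python
def needs_early_unpad(shared_output, routed_output_transform) -> bool:
    # Unpad the routed output early if anything downstream needs the original
    # hidden size: either a shared-expert output will be added, or a
    # routed-output transform (e.g. a latent-MoE up-projection) will be applied.
    return shared_output is not None or routed_output_transform is not None
```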

@tomeras91 enabled auto-merge (squash) on April 25, 2026 at 19:21
@tomeras91 merged commit 12a3f64 into vllm-project:main on Apr 25, 2026
66 checks passed
Commits referencing this pull request were later pushed to the forks Dao007forever/vllm (Apr 26, 2026), avinashsingh77/vllm (Apr 27, 2026), jatseng-ai/vllm (Apr 28, 2026), and Lafunamor/vllm (May 1, 2026).

Labels: bug (Something isn't working), ready (ONLY add when PR is ready to merge / full CI is needed)

4 participants