[Bugfix][MoE] Unpad routed output before shared expert add [Fixes #35949] #40794
Conversation
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
Code Review
This pull request introduces logic to handle hidden dimension padding in the Fused MoE runner. It records the original hidden dimension before potential padding and ensures that the fused output is sliced back to its original size if padding was applied. I have no feedback to provide as there are no review comments to evaluate.
Thanks! Approved. BTW @netanel-haber - do you know how this works with latentMoE (regardless of padding)? Are the routed hidden states added to the shared hidden states only after applying the latent up-proj to match hidden dims again?
…m-project#35949] (vllm-project#40794) Signed-off-by: Netanel Haber <nhaber@nvidia.com>
I think this change might have broken the gpt-oss tests.
Re gpt-oss test breakages,
…m-project#35949] (vllm-project#40794) Signed-off-by: Netanel Haber <nhaber@nvidia.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
…m-project#35949] (vllm-project#40794) Signed-off-by: Netanel Haber <nhaber@nvidia.com> Signed-off-by: Adrian <info@zzit.ch>

Fixes https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4.
FI TRTLLM NVFP4 MoE can pad the routed hidden dim, e.g. 2688 -> 2816, via `align_trtllm_fp4_moe_hidden_dim_for_fi`. Before #35949, `FusedMoE` returned the routed and shared outputs separately: the routed output was truncated back to the original hidden dim before model code added the shared expert output, so the world looked like the first sketch below.
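A minimal sketch of that pre-#35949 ordering, using random stand-in tensors rather than vLLM's actual `FusedMoE` API (all names and shapes here are illustrative):

```python
import torch

hidden = torch.randn(4, 2688)   # hidden states at the original dim
shared = torch.randn(4, 2688)   # stand-in for the shared expert output (never padded)
routed = torch.randn(4, 2816)   # stand-in for the fused experts' padded routed output

routed = routed[..., :hidden.shape[-1]]  # truncated back to 2688 inside FusedMoE first...
out = routed + shared                    # ...so the model-side shared add saw matching shapes
```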
#35949 moved the shared/routed add into `MoERunner`, which changed the order to add first and truncate later, as in the second sketch below. Dynamo catches this during fake tensor tracing as a shape mismatch.
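A sketch of the broken post-#35949 ordering under the same illustrative shapes. Eagerly the mismatched add raises a `RuntimeError`; under `torch.compile`, Dynamo's fake tensor tracing surfaces the same mismatch at trace time:

```python
import torch

shared = torch.randn(4, 2688)   # shared expert output at the original dim
routed = torch.randn(4, 2816)   # padded routed output from the fused experts

try:
    out = routed + shared       # add happens first: 2816 vs 2688 don't broadcast
    out = out[..., :2688]       # the truncation now comes too late to help
except RuntimeError as e:
    print(e)                    # the shape mismatch Dynamo reports during fake tracing
```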
This PR records the routed hidden dim before `_maybe_pad_hidden_states()` and trims the fused routed output back to that dim before the shared expert addition. DailyOmni is on par for nano-v3-omni before and after this PR.
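A minimal sketch of the fix, assuming a hypothetical `maybe_pad` helper standing in for `_maybe_pad_hidden_states()` and a 256-element alignment chosen only to reproduce the 2688 -> 2816 example:

```python
import torch

def maybe_pad(x: torch.Tensor, multiple: int = 256) -> torch.Tensor:
    """Stand-in for _maybe_pad_hidden_states(): right-pad the hidden dim to a multiple."""
    pad = (-x.shape[-1]) % multiple
    return torch.nn.functional.pad(x, (0, pad))

hidden = torch.randn(4, 2688)
orig_dim = hidden.shape[-1]              # recorded *before* padding, as this PR does
padded = maybe_pad(hidden)               # 2688 -> 2816
routed = torch.randn_like(padded)        # stand-in for the fused experts' routed output
routed = routed[..., :orig_dim]          # trimmed back to 2688 before the shared add
out = routed + torch.randn(4, orig_dim)  # shared expert output; shapes agree again
```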