[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion#34636

Merged
vllm-bot merged 5 commits intovllm-project:mainfrom
ROCm:fix_fused_rmsnorm_pad
Feb 21, 2026
Conversation


@Rohan138 Rohan138 commented Feb 16, 2026

Purpose

#32344 introduced a silent regression for gpt-oss on ROCm by disabling the pattern matching for the RMSNorm+padding fusion introduced in #30976.

Specifically, passing original_hidden_states into the moe_forward custom op breaks pattern matching for AddAiterRMSNormPadPattern, since there is an additional user (the auto_functionalized(moe_forward) node) of the unpadded hidden-states output from the RMSNorm:

constant_pad_nd(getitem, [0, 192], 0.0) multiple_users pattern CallFunction(operator.getitem, CallFunction(vllm.rocm_aiter_rmsnorm2d_fwd_with_add.default, KeywordArg('input'), KeywordArg('residual'), KeywordArg('weight'), *, _users=2), 0, _users=2)
does not match node %getitem : [num_users=3] = call_function[target=operator.getitem](args = (%rocm_aiter_rmsnorm2d_fwd_with_add, 0), kwargs = {})
with users {constant_pad_nd: None, auto_functionalized: None, rocm_unquantized_gemm_1: None} 

Full pattern:
MultiOutputPattern([CallFunction(aten.constant_pad_nd.default, CallFunction(operator.getitem, CallFunction(vllm.rocm_aiter_rmsnorm2d_fwd_with_add.default, KeywordArg('input'), KeywordArg('residual'), KeywordArg('weight'), *, _users=2), 0, _users=2), *, *), CallFunction(operator.getitem, CallFunction(vllm.rocm_aiter_rmsnorm2d_fwd_with_add.default, KeywordArg('input'), KeywordArg('residual'), KeywordArg('weight'), *, _users=2), 1), CallFunction(vllm.rocm_unquantized_gemm.default, CallFunction(operator.getitem, CallFunction(vllm.rocm_aiter_rmsnorm2d_fwd_with_add.default, KeywordArg('input'), KeywordArg('residual'), KeywordArg('weight'), *, _users=2), 0, _users=2), KeywordArg('router_weight'), KeywordArg('router_bias'))])
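To see why the extra user defeats the match, note that the captured pattern pins the RMSNorm getitem output to exactly two consumers (_users=2): constant_pad_nd and the router GEMM. A torch-free toy sketch of that user-count check (Node and the matcher below are illustrative stand-ins for torch.fx concepts, not the real torch._inductor matcher):

```python
class Node:
    """Toy stand-in for a torch.fx graph node."""

    def __init__(self, name):
        self.name = name
        self.users = []  # nodes that consume this node's output


def pattern_matches(rmsnorm_out, expected_users=2):
    # The captured pattern requires the RMSNorm output to feed exactly
    # two consumers: constant_pad_nd and rocm_unquantized_gemm.
    return len(rmsnorm_out.users) == expected_users


out = Node("getitem(rocm_aiter_rmsnorm2d_fwd_with_add, 0)")
out.users = [Node("constant_pad_nd"), Node("rocm_unquantized_gemm")]
ok_before = pattern_matches(out)  # True: exactly the two expected users

# Passing original_hidden_states to moe_forward adds a third consumer:
out.users.append(Node("auto_functionalized(moe_forward)"))
ok_after = pattern_matches(out)  # False: num_users=3 defeats the match
```

This mirrors the log above: the matcher reports num_users=3 with users {constant_pad_nd, auto_functionalized, rocm_unquantized_gemm_1}, so the fusion never fires.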

IMO this is a cleaner fix than rewriting the whole pass, especially since, even without the fusion, keeping original_hidden_states alive for the moe_forward op is memory overhead we don't actually need. Alternatively, we could give the moe_forward and moe_forward_shared ops different call signatures, as before:


        if self.shared_experts is None:
            fused_output = torch.ops.vllm.moe_forward(
                hidden_states, router_logits, encode_layer_name()
            )
            return reduce_output(fused_output)[..., :transformed_hidden_dim]
        else:
            # We pass the original tensor for shared experts (not transformed)
            shared_output, fused_output = torch.ops.vllm.moe_forward_shared(
                hidden_states,
                router_logits,
                encode_layer_name(),
                original_hidden_states,
            )
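The merged change instead keeps a single call signature and simply avoids creating the extra reference when no shared experts exist. A torch-free sketch of that conditional (class and method names here are hypothetical, not the actual vLLM code):

```python
class MoERunnerSketch:
    """Toy stand-in for the MoE layer; models only the reference logic."""

    def __init__(self, shared_experts=None):
        self.shared_experts = shared_experts

    def op_inputs(self, hidden_states, original_hidden_states):
        # Only save the unpadded tensor when shared experts consume it;
        # otherwise the RMSNorm output keeps its expected user count and
        # the rmsnorm+pad fusion pattern can still match.
        if self.shared_experts is None:
            return (hidden_states,)
        return (hidden_states, original_hidden_states)


no_shared = MoERunnerSketch()
with_shared = MoERunnerSketch(shared_experts=object())
n_plain = len(no_shared.op_inputs("h_padded", "h_orig"))     # 1 input
n_shared = len(with_shared.op_inputs("h_padded", "h_orig"))  # 2 inputs
```

The design point is that the conditional lives on the Python side, before tracing, so the compiled graph for the no-shared-experts case never records the extra user at all.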

Test Plan

Ran vllm serve openai/gpt-oss-120b --attention-backend ROCM_AITER_UNIFIED_ATTN and checked perf and traces. I'll follow up with expanding our e2e fusion tests on ROCm in a separate PR.

Test Result

Before:

============ Serving Benchmark Result ============
Successful requests:                     320       
Benchmark duration (s):                  118.22    
Total input tokens:                      294646    
Total generated tokens:                  295581    
Request throughput (req/s):              2.71      
Output token throughput (tok/s):         2500.21   
Total Token throughput (tok/s):          4992.51

After:

============ Serving Benchmark Result ============
Successful requests:                     320       
Benchmark duration (s):                  117.02    
Total input tokens:                      294646    
Total generated tokens:                  295581    
Request throughput (req/s):              2.73      
Output token throughput (tok/s):         2525.90   
Total Token throughput (tok/s):          5043.80  

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
@mergify mergify bot added rocm Related to AMD ROCm bug Something isn't working labels Feb 16, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 16, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request addresses a silent regression on ROCm where an RMSNorm+padding fusion was being disabled. The root cause was an unconditional creation of a reference to hidden_states before a transformation, which added an extra user to the tensor and broke the fusion pattern.

The fix is to only create this reference (original_hidden_states) when it's actually needed, i.e., when shared_experts are present. This is done by making the assignment conditional. Additionally, a related condition was refactored for better clarity, changing isinstance(fused_output, tuple) to the more explicit self.shared_experts is not None.

The changes are correct, well-targeted, and effectively resolve the issue. The code is now more robust and the fusion is re-enabled as intended.

@ProExpertProg ProExpertProg enabled auto-merge (squash) February 17, 2026 17:29
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 17, 2026
@ProExpertProg (Collaborator) commented:

Triggered the MoE integration tests manually; please re-enable them if you push again.

@Rohan138 (Contributor, Author) commented:

Failures seem unrelated? cc @ProExpertProg

@DarkLight1337 (Member) commented:

Can you merge from main to fix the CI failures?

@vllm-bot vllm-bot merged commit ded333f into vllm-project:main Feb 21, 2026
56 of 60 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Feb 21, 2026
dosubot bot commented Feb 21, 2026

Related Documentation

Checked 0 published document(s) in 1 knowledge base(s). No updates required.


@Rohan138 Rohan138 deleted the fix_fused_rmsnorm_pad branch February 21, 2026 08:02
Commits referencing this pull request — all cherry-picks of "…ner to fix rmsnorm pad fusion (vllm-project#34636)", Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>:
  • DarkLight1337 pushed a commit to DarkLight1337/vllm — Feb 21, 2026
  • joeqzzuo pushed a commit to joeqzzuo/vllm — Feb 21, 2026 (also Signed-off-by: joezuo <qianzhou.zuo@gmail.com>)
  • yugong333 pushed a commit to yugong333/vllm — Feb 22, 2026
  • jmamou pushed a commit to jmamou/vllm — Feb 23, 2026
  • llsj14 pushed a commit to llsj14/vllm — Mar 1, 2026
  • tunglinwood pushed a commit to tunglinwood/vllm — Mar 4, 2026
  • askliar pushed a commit to askliar/vllm — Mar 9, 2026 (also Signed-off-by: Andrii Skliar <askliar@nvidia.com>)
  • Copilot AI pushed a commit to machov/vllm — Mar 10, 2026

Labels

  • bug — Something isn't working
  • ready — ONLY add when PR is ready to merge/full CI is needed
  • rocm — Related to AMD ROCm

Projects

Status: Done


5 participants