Use aiter triton fused_add_rmsnorm_pad for gpt-oss by Rohan138 · Pull Request #30976 · vllm-project/vllm

Rohan138 · 2025-12-18T17:39:55Z

Purpose

Adds a fused padding op before the router GEMM on ROCm, eliminating the unfused pad that currently runs after the GEMM and before fused_moe: https://github.com/ROCm/vllm/blob/main/vllm/model_executor/layers/fused_moe/layer.py#1603

Before:

After:

~~Follow-up/alternate possibility is to replace this with a single F.pad before the router, then add a fusion pass to fuse AITER CK rmsnorm and pad to PassManager similar to #25693.~~ Done

See also #30357 (gpt-oss quark w4a8 enablement) and #30647 (eliminate padding op on NV w4a8 gpt-oss)
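
To make the intent concrete, here is a minimal pure-Python sketch of what "residual add + RMSNorm + pad" computes, compared against the unfused sequence. It is not the AITER Triton kernel or the vLLM code. The function names, the eps value, and the choice to leave the residual at its original width are assumptions made for illustration only. The sketch also assumes the norm statistics are taken over the original hidden size, so the zero padding does not change the normalized values.

```python
import math


def unfused_add_rmsnorm_then_pad(x, residual, weight, padded_width, eps=1e-6):
    """Reference path: residual add, RMSNorm, then a separate zero-pad step."""
    summed = [a + b for a, b in zip(x, residual)]
    rms = math.sqrt(sum(v * v for v in summed) / len(summed) + eps)
    normed = [v / rms * w for v, w in zip(summed, weight)]
    # Separate pad op: append zeros up to the padded hidden size.
    padded = normed + [0.0] * (padded_width - len(normed))
    return padded, summed


def fused_add_rmsnorm_pad_reference(x, residual, weight, padded_width, eps=1e-6):
    """Hypothetical fused form: write results straight into a pre-padded buffer."""
    hidden = len(x)
    out = [0.0] * padded_width      # tail stays zero, so no extra pad pass is needed
    new_residual = [0.0] * hidden   # assumption: residual keeps the unpadded width
    sq_sum = 0.0
    for i in range(hidden):
        new_residual[i] = x[i] + residual[i]
        sq_sum += new_residual[i] * new_residual[i]
    # Statistics use the original hidden size, not the padded width.
    inv_rms = 1.0 / math.sqrt(sq_sum / hidden + eps)
    for i in range(hidden):
        out[i] = new_residual[i] * inv_rms * weight[i]
    return out, new_residual


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    residual = [0.5, 0.5, 0.5]
    weight = [1.0, 2.0, 0.5]
    print(fused_add_rmsnorm_pad_reference(x, residual, weight, padded_width=8))
```

Both functions should agree up to floating-point rounding. The difference is only that the fused form never materializes an unpadded normalized tensor that a later op has to copy into a wider buffer.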

Test Plan

Test Result


gemini-code-assist

Code Review

This pull request introduces a fused add+rmsnorm+pad kernel for the gpt-oss model on ROCm, aiming to improve performance by fusing these operations. The changes add a new feature flag and conditionally use the new fused kernel within the TransformerBlock.

My review identified a critical issue where the residual tensor is not un-padded after the fused operation. This would lead to a shape mismatch and a runtime error in the subsequent layer. I have provided a code suggestion to resolve this. The rest of the changes appear to correctly implement the intended feature.

vllm/model_executor/models/gpt_oss.py

mergify · 2026-01-10T20:49:59Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Rohan138.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


vllm/_aiter_ops.py

vllm/model_executor/models/gpt_oss.py

ProExpertProg

Let's do this via a compile pass instead of platform-specific model definition changes.

tjtanaa · 2026-01-24T01:22:19Z

@Rohan138 I also prefer @ProExpertProg's suggestion; with a fusion pass we don't need to add more flags.
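
As a rough illustration of the idea behind the suggestion, the toy sketch below rewrites an "add_rmsnorm" op followed by a "pad" op into a single fused op. It works on a flat list of ops rather than a compiled graph, and every name in it (Op, fuse_add_rmsnorm_pad, the op names) is hypothetical. It is not the pass added in vllm/compilation/rocm_aiter_fusion.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Toy op record: a name plus positional arguments (illustrative only)."""
    name: str
    args: tuple = ()


def fuse_add_rmsnorm_pad(ops):
    """Replace each adjacent (add_rmsnorm, pad) pair with one fused op."""
    fused = []
    i = 0
    while i < len(ops):
        cur = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if cur.name == "add_rmsnorm" and nxt is not None and nxt.name == "pad":
            # Carry both ops' arguments over to the fused op.
            fused.append(Op("add_rmsnorm_pad", cur.args + nxt.args))
            i += 2
        else:
            fused.append(cur)
            i += 1
    return fused


if __name__ == "__main__":
    graph = [
        Op("attention"),
        Op("add_rmsnorm", ("eps=1e-6",)),
        Op("pad", ("padded_width=8",)),
        Op("router_gemm"),
    ]
    for op in fuse_add_rmsnorm_pad(graph):
        print(op.name, op.args)
```

The point of this design is that the model definition stays platform-agnostic: the rewrite runs only when the pass is enabled, so no extra feature flag is needed in the model code.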

mergify · 2026-01-24T02:00:15Z

Hi @Rohan138, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

vllm/compilation/pass_manager.py

vllm/compilation/rocm_aiter_fusion.py

mergify · 2026-01-24T02:07:19Z

Hi @Rohan138, the pre-commit checks have failed.

vllm/envs.py

mergify · 2026-01-24T03:13:00Z

Hi @Rohan138, the pre-commit checks have failed.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

mergify · 2026-01-28T00:08:28Z

Hi @Rohan138, the pre-commit checks have failed.

ProExpertProg

LGTM, nice work! Just one comment about improving the test

tests/compile/test_fuse_act_padding.py


mergify · 2026-01-28T18:34:15Z

Hi @Rohan138, the pre-commit checks have failed.

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: PiratePai <416932041@qq.com> Signed-off-by: Pai <416932041@qq.com>


mergify bot added the gpt-oss label (Related to GPT-OSS models) Dec 18, 2025

github-project-automation bot added this to gpt-oss Issues & Enhancements Dec 18, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Dec 18, 2025

gemini-code-assist bot reviewed Dec 18, 2025

vllm/model_executor/models/gpt_oss.py (resolved)

Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch from d0c16df to df26ddd on December 18, 2025 17:45

mergify bot added the needs-rebase label Jan 10, 2026

mergify bot removed the needs-rebase label Jan 20, 2026

Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch from 871820c to b332997 on January 20, 2026 17:50

Rohan138 marked this pull request as ready for review January 20, 2026 17:54

Rohan138 requested a review from tjtanaa as a code owner January 20, 2026 17:54

cursor bot reviewed Jan 20, 2026

vllm/_aiter_ops.py (outdated, resolved)

Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch from 81f5dd5 to a28f213 on January 20, 2026 22:51

gshtras reviewed Jan 20, 2026

vllm/model_executor/models/gpt_oss.py (outdated, resolved)

gshtras reviewed Jan 20, 2026

vllm/model_executor/models/gpt_oss.py (outdated, resolved)

ProExpertProg requested changes Jan 20, 2026


github-project-automation bot moved this from To Triage to In progress in gpt-oss Issues & Enhancements Jan 20, 2026

Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch from 0521ee6 to b332997 on January 23, 2026 17:46

Rohan138 requested review from youkaichao and zou3519 as code owners January 24, 2026 01:55

Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch 2 times, most recently from aac06a8 to 5bb4123, on January 24, 2026 02:02

Rohan138 commented Jan 24, 2026

vllm/compilation/pass_manager.py (resolved)

Rohan138 commented Jan 24, 2026

vllm/compilation/rocm_aiter_fusion.py (outdated, resolved)

Rohan138 commented Jan 24, 2026

vllm/envs.py (outdated, resolved)

squash into one (now working) commit

1bdfd52

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

Rohan138 added 2 commits January 27, 2026 14:56

Merge branch 'main' into fused_aiter_triton_rmsnorm_pad

a6b6cfa

drop num_local experts, just use a dummy shape

2cdce68

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

Rohan138 requested review from ProExpertProg and gshtras January 27, 2026 22:29

Add unit test

8fe47e0

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

fix lint

1228560

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

ProExpertProg approved these changes Jan 28, 2026

tests/compile/test_fuse_act_padding.py (outdated, resolved)

github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Jan 28, 2026

ProExpertProg added the rocm (Related to AMD ROCm) and ready (ONLY add when PR is ready to merge/full CI is needed) labels Jan 28, 2026

Rohan138 added 2 commits January 28, 2026 12:08

add layers to fuse_act_padding test

de0be00

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

move import into test to fix CI

88a805c

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

Rohan138 and others added 2 commits January 28, 2026 12:34

fix lint

ea638bc

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

Merge branch 'main' into fused_aiter_triton_rmsnorm_pad

71254a6

ProExpertProg enabled auto-merge (squash) January 28, 2026 20:09

ProExpertProg merged commit 59bcc5b into vllm-project:main Jan 28, 2026
58 checks passed

github-project-automation bot moved this from Ready to Done in gpt-oss Issues & Enhancements Jan 28, 2026

gshtras pushed a commit to ROCm/vllm that referenced this pull request Jan 30, 2026

Use aiter triton fused_add_rmsnorm_pad for gpt-oss (vllm-project#30976)

53b5436

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026

Use aiter triton fused_add_rmsnorm_pad for gpt-oss (vllm-project#30976)

f86af6d

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

rasmith mentioned this pull request Feb 6, 2026

[CI][BugFix][AMD] Add check for model_config being None and update conftest.py to load AITER of available to fix Kernels MoE Test % N #33952

Closed

5 tasks

Rohan138 mentioned this pull request Feb 16, 2026

[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion #34636

Merged

5 tasks

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Use aiter triton fused_add_rmsnorm_pad for gpt-oss (vllm-project#30976)

ce36fba

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

amd-hhashemi mentioned this pull request Feb 20, 2026

Improvements to wvSplitKrc skinny GEMM solution #34304

Merged

5 tasks

Klaud-Cold mentioned this pull request Feb 20, 2026

update mi325/mi300 to vllm 0.16 SemiAnalysisAI/InferenceX#607

Closed

Rohan138 deleted the fused_aiter_triton_rmsnorm_pad branch February 24, 2026 23:26


4 participants
