
Use aiter triton fused_add_rmsnorm_pad for gpt-oss #30976

Merged
ProExpertProg merged 20 commits into vllm-project:main from ROCm:fused_aiter_triton_rmsnorm_pad on Jan 28, 2026


Rohan138 (Contributor) commented Dec 18, 2025

Purpose

Adds a fused padding op before the router GEMM on ROCm, eliminating the unfused pad that currently runs after the router GEMM and before fused_moe: https://github.com/ROCm/vllm/blob/main/vllm/model_executor/layers/fused_moe/layer.py#1603
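For reference, the computation that a fused add + RMSNorm + pad op replaces can be written in plain PyTorch: a residual add, an RMSNorm, then a zero-pad of the hidden dimension. This is a minimal sketch of the semantics, not the AITER Triton kernel. The names `round_up` and `add_rmsnorm_pad_reference` are hypothetical, and padding the gpt-oss hidden size of 2880 up to a multiple of 256 (3072) is an assumption chosen for illustration.

```python
import torch
import torch.nn.functional as F


def round_up(n: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def add_rmsnorm_pad_reference(
    x: torch.Tensor,
    residual: torch.Tensor,
    weight: torch.Tensor,
    eps: float,
    padded_hidden: int,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Unfused reference (hypothetical): residual add -> RMSNorm -> zero-pad.

    Returns (padded normalized output, updated residual). The residual is
    returned at the original, unpadded width.
    """
    residual_out = x + residual
    # RMSNorm computed in float32, then cast back to the input dtype.
    h = residual_out.float()
    h = h * torch.rsqrt(h.pow(2).mean(dim=-1, keepdim=True) + eps)
    normed = (h * weight.float()).to(x.dtype)
    # Right-pad the last (hidden) dimension with zeros up to `padded_hidden`.
    pad = padded_hidden - normed.shape[-1]
    return F.pad(normed, (0, pad)), residual_out


torch.manual_seed(0)
x, res = torch.randn(2, 2880), torch.randn(2, 2880)
# Assumed alignment of 256 is for illustration only.
out, new_res = add_rmsnorm_pad_reference(x, res, torch.ones(2880), 1e-6, round_up(2880, 256))
print(out.shape, new_res.shape)  # torch.Size([2, 3072]) torch.Size([2, 2880])
```

A fused kernel computes the same values in one pass, so the padded tensor is written once instead of being normalized and then copied again by a separate pad.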

Before: [image]
After: [image]

A follow-up (or alternative) is to replace this with a single F.pad before the router, then add a fusion pass to the PassManager, similar to #25693, that fuses the AITER CK RMSNorm and the pad. (Done.)
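The fusion-pass approach can be pictured as a peephole rewrite: find an RMSNorm immediately followed by a pad and replace the pair with one fused op. This is a toy sketch over a flat list of ops, not vLLM's PassManager API; as far as we know, vLLM's real passes pattern-match on the compiled torch FX graph. The `Op` class and the op names `add_rmsnorm`, `pad`, `router_gemm`, `fused_moe`, and `fused_add_rmsnorm_pad` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str                                  # hypothetical op name
    args: dict = field(default_factory=dict)   # op parameters


def fuse_rmsnorm_pad(ops: list[Op]) -> list[Op]:
    """Toy pass: rewrite adjacent [add_rmsnorm, pad] into one fused op.

    Only adjacent pairs are matched; all other ops are copied unchanged.
    """
    out: list[Op] = []
    i = 0
    while i < len(ops):
        cur = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if cur.name == "add_rmsnorm" and nxt is not None and nxt.name == "pad":
            # Merge both ops' parameters into a single fused node.
            out.append(Op("fused_add_rmsnorm_pad", {**cur.args, **nxt.args}))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


# Hypothetical graph: norm, then a single pad before the router, then the MoE.
graph = [
    Op("add_rmsnorm", {"eps": 1e-5}),
    Op("pad", {"pad_to": 3072}),
    Op("router_gemm"),
    Op("fused_moe"),
]
print([op.name for op in fuse_rmsnorm_pad(graph)])
# ['fused_add_rmsnorm_pad', 'router_gemm', 'fused_moe']
```

Keeping the model code platform-neutral (a plain pad) and doing the fusion in a pass means only the pass, not the model definition, needs to know about the ROCm kernel. A real pass would also check that the unpadded norm output has no other consumers; the toy skips this because it assumes a straight-line chain.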

See also #30357 (gpt-oss Quark W4A8 enablement) and #30647 (eliminating the padding op for W4A8 gpt-oss on NVIDIA).

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft in the Google Doc.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a fused add+rmsnorm+pad kernel for the gpt-oss model on ROCm, aiming to improve performance by fusing these operations. The changes add a new feature flag and conditionally use the new fused kernel within the TransformerBlock.

My review identified a critical issue where the residual tensor is not un-padded after the fused operation. This would lead to a shape mismatch and a runtime error in the subsequent layer. I have provided a code suggestion to resolve this. The rest of the changes appear to correctly implement the intended feature.
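Following the review's reasoning, and assuming the fused op hands back the residual at the padded width, the fix is to slice the padded tail off before the next layer adds to it. A minimal sketch; the helper `unpad_residual` is hypothetical, and the padded width of 3072 is an assumed value.

```python
import torch

HIDDEN = 2880  # gpt-oss hidden size
PADDED = 3072  # assumed padded width (2880 rounded up to a multiple of 256)


def unpad_residual(residual: torch.Tensor, hidden: int = HIDDEN) -> torch.Tensor:
    """Drop the zero-padded tail so the residual matches the next layer's width."""
    return residual[..., :hidden]


tokens = 4
padded_residual = torch.zeros(tokens, PADDED)  # stand-in for a padded residual
next_layer_out = torch.randn(tokens, HIDDEN)   # unpadded output of the next sublayer

try:
    next_layer_out + padded_residual           # 2880 vs 3072: shapes do not broadcast
except RuntimeError as e:
    print("shape mismatch:", e)

residual = unpad_residual(padded_residual)
print((next_layer_out + residual).shape)  # torch.Size([4, 2880])
```

The slice is a strided view, so it costs no copy; if a downstream kernel requires contiguous input, calling `.contiguous()` on it makes one.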

Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch from d0c16df to df26ddd on December 18, 2025 17:45
mergify bot commented Jan 10, 2026

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @Rohan138.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Jan 10, 2026
mergify bot removed the needs-rebase label Jan 20, 2026
Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch from 871820c to b332997 on January 20, 2026 17:50
Rohan138 marked this pull request as ready for review January 20, 2026 17:54
Rohan138 requested a review from tjtanaa as a code owner January 20, 2026 17:54
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch from 81f5dd5 to a28f213 on January 20, 2026 22:51
ProExpertProg (Collaborator) left a comment

Let's do this via a compile pass instead of platform-specific model definition changes.

github-project-automation bot moved this from To Triage to In progress in gpt-oss Issues & Enhancements Jan 20, 2026
Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch from 0521ee6 to b332997 on January 23, 2026 17:46
tjtanaa (Collaborator) commented Jan 24, 2026

@Rohan138 I also prefer @ProExpertProg's suggestion; with a fusion pass, we don't need to add more flags.

mergify bot commented Jan 24, 2026

Hi @Rohan138, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

Rohan138 force-pushed the fused_aiter_triton_rmsnorm_pad branch 2 times, most recently from aac06a8 to 5bb4123 on January 24, 2026 02:02
mergify bot commented Jan 24, 2026

Hi @Rohan138, the pre-commit checks have failed. (Same instructions as above.)

mergify bot commented Jan 24, 2026

Hi @Rohan138, the pre-commit checks have failed. (Same instructions as above.)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
mergify bot commented Jan 28, 2026

Hi @Rohan138, the pre-commit checks have failed. (Same instructions as above.)

Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
ProExpertProg (Collaborator) left a comment

LGTM, nice work! Just one comment about improving the test.

github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Jan 28, 2026
ProExpertProg added the rocm (Related to AMD ROCm) and ready (ONLY add when PR is ready to merge/full CI is needed) labels Jan 28, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
mergify bot commented Jan 28, 2026

Hi @Rohan138, the pre-commit checks have failed. (Same instructions as above.)

Rohan138 and others added 2 commits January 28, 2026 12:34
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
ProExpertProg enabled auto-merge (squash) January 28, 2026 20:09
ProExpertProg merged commit 59bcc5b into vllm-project:main Jan 28, 2026
58 checks passed
gshtras pushed a commit to ROCm/vllm that referenced this pull request Jan 30, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: PiratePai <416932041@qq.com>
Signed-off-by: Pai <416932041@qq.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Rohan138 deleted the fused_aiter_triton_rmsnorm_pad branch February 24, 2026 23:26

Labels

gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed), rocm (Related to AMD ROCm)

Projects

Status: Done


4 participants