Use aiter triton fused_add_rmsnorm_pad for gpt-oss#30976
Use aiter triton fused_add_rmsnorm_pad for gpt-oss#30976ProExpertProg merged 20 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a fused add+rmsnorm+pad kernel for the gpt-oss model on ROCm, aiming to improve performance by fusing these operations. The changes add a new feature flag and conditionally use the new fused kernel within the TransformerBlock.
My review identified a critical issue where the residual tensor is not un-padded after the fused operation. This would lead to a shape mismatch and a runtime error in the subsequent layer. I have provided a code suggestion to resolve this. The rest of the changes appear to correctly implement the intended feature.
d0c16df to
df26ddd
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
871820c to
b332997
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
Comment @cursor review or bugbot run to trigger another review on this PR
81f5dd5 to
a28f213
Compare
ProExpertProg
left a comment
There was a problem hiding this comment.
Let's do this via compile pass instead of platform-specific model definition changes
0521ee6 to
b332997
Compare
|
@Rohan138 I also prefer @ProExpertProg suggestion and through fusion pass we don't need to add more flags. |
|
Hi @Rohan138, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
aac06a8 to
5bb4123
Compare
|
Hi @Rohan138, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
|
Hi @Rohan138, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
Hi @Rohan138, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
ProExpertProg
left a comment
There was a problem hiding this comment.
LGTM, nice work! Just one comment about improving the test
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
Hi @Rohan138, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: PiratePai <416932041@qq.com> Signed-off-by: Pai <416932041@qq.com>
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Purpose
Adds fused padding op before router GEMM on ROCm, eliminating this unfused pad after the GEMM before the fused_moe: https://github.com/ROCm/vllm/blob/main/vllm/model_executor/layers/fused_moe/layer.py#1603
Before:


After:
Follow-up/alternate possibility is to replace this with a singleDoneF.padbefore the router, then add a fusion pass to fuse AITER CK rmsnorm and pad to PassManager similar to #25693.See also #30357 (gpt-oss quark w4a8 enablement) and #30647 (eliminate padding op on NV w4a8 gpt-oss)
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.