[ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion #34636
Conversation
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Code Review
This pull request addresses a silent regression on ROCm where an RMSNorm+padding fusion was being disabled. The root cause was an unconditional creation of a reference to `hidden_states` before a transformation, which added an extra user to the tensor and broke the fusion pattern.
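To make the root cause concrete, here is a toy sketch (not vLLM's actual pattern-matching pass; the `Node` class and `can_fuse_rmsnorm_pad` helper are invented for illustration) of why an extra user of a tensor blocks a fusion match:

```python
# Toy sketch, not vLLM's pass: a fusion rule that fires only when the
# rmsnorm output has exactly one consumer.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op
        self.users = []
        for inp in inputs:
            inp.users.append(self)


def can_fuse_rmsnorm_pad(rmsnorm_node):
    # The RMSNorm+pad fusion requires the pad op to be the sole user
    # of the rmsnorm output; any extra reference defeats the match.
    return len(rmsnorm_node.users) == 1 and rmsnorm_node.users[0].op == "pad"


# Clean graph: rmsnorm -> pad, so the fusion applies.
rmsnorm = Node("rmsnorm")
pad = Node("pad", [rmsnorm])
print(can_fuse_rmsnorm_pad(rmsnorm))  # True

# Passing the unpadded output into moe_forward as well adds a second
# user of the rmsnorm output and blocks the fusion.
moe = Node("moe_forward", [rmsnorm])
print(can_fuse_rmsnorm_pad(rmsnorm))  # False
```

This mirrors the PR's diagnosis: the unconditional `original_hidden_states = hidden_states` reference played the role of the second consumer.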
The fix is to create this reference (`original_hidden_states`) only when it is actually needed, i.e., when `shared_experts` are present. This is done by making the assignment conditional. Additionally, a related condition was refactored for clarity, changing `isinstance(fused_output, tuple)` to the more explicit `self.shared_experts is not None`.
The changes are correct, well-targeted, and effectively resolve the issue. The code is now more robust and the fusion is re-enabled as intended.
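A minimal sketch of the shape of the fix, assuming the surrounding structure; the class skeleton and `moe_forward` placeholder below are illustrative, not vLLM's actual code:

```python
# Hedged sketch of the conditional-reference fix described above.
# MoERunner, shared_experts, and moe_forward follow the PR discussion;
# the rest of the structure is assumed for illustration.
class MoERunner:
    def __init__(self, shared_experts=None):
        self.shared_experts = shared_experts

    def forward(self, hidden_states):
        # Before the fix: original_hidden_states = hidden_states was
        # unconditional, adding an extra user of the tensor and
        # breaking the RMSNorm+pad fusion pattern.
        original_hidden_states = (
            hidden_states if self.shared_experts is not None else None
        )
        fused_output = self.moe_forward(hidden_states)
        # Explicit check replaces isinstance(fused_output, tuple):
        if self.shared_experts is not None:
            return fused_output, self.shared_experts(original_hidden_states)
        return fused_output

    def moe_forward(self, x):
        return x  # placeholder for the fused MoE kernel
```

When there are no shared experts, no extra reference to `hidden_states` is ever created, so the fusion pass sees a single user again.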
Triggered the MoE integration tests manually; please enable them again before pushing.
Failures seem unrelated? cc @ProExpertProg |
Can you merge from main to fix the CI failures? |
…ner to fix rmsnorm pad fusion (vllm-project#34636) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
…ner to fix rmsnorm pad fusion (vllm-project#34636) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: joezuo <qianzhou.zuo@gmail.com>
…ner to fix rmsnorm pad fusion (vllm-project#34636) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Purpose
#32344 introduced a silent regression for gpt-oss on ROCm by disabling the pattern matching for the RMSNorm+padding fusion introduced in #30976.
Specifically, passing the `original_hidden_states` into the `moe_forward` custom op breaks pattern matching for the `AddAiterRMSNormPadPattern`, since there is an additional user `auto_functionalized(moe_forward)` of the unpadded hidden states output from the RMSNorm.

IMO this is a cleaner fix than rewriting the whole pass, especially since, even without the fusion, keeping `original_hidden_states` around for the `moe_forward` pass is memory overhead we don't actually need. Alternatively, we could have the `moe_forward` and `moe_forward_shared` ops use different call signatures, as before.

Test Plan
Ran `vllm serve openai/gpt-oss-120b --attention-backend ROCM_AITER_UNIFIED_ATTN` and checked perf and traces. I'll follow up with expanding our e2e fusion tests on ROCm in a separate PR.

Test Result
Before:
After:
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.