Conversation
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Code Review
This pull request correctly addresses an issue with mxfp4 padding alignment on ROCm devices in the fused MoE layer. The original implementation incorrectly applied a hardcoded padding of 256 bytes for all ROCm devices, whereas this is only required for gfx950. The fix introduces a new utility function, get_padding_alignment, which dynamically determines the correct padding (128 or 256 bytes) based on the specific ROCm GPU architecture. This change is well-implemented and aligns with the intended behavior. The author's note about deduplicating this padding logic in the future is a good next step for improving code maintainability.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
    hidden_size = round_up(hidden_size, 256)
elif current_platform.is_rocm():
    pad_align = get_padding_alignment()
    hidden_size = round_up(hidden_size, pad_align)
```
Missing triton availability check in ROCm padding
Low Severity
The new ROCm branch calls get_padding_alignment() which accesses triton.runtime.driver.active.get_current_target().arch without verifying triton is properly available. If has_triton_kernels() is False on ROCm, get_mxfp4_backend() returns Mxfp4Backend.NONE, but the code still enters the elif current_platform.is_rocm(): branch. When triton isn't properly initialized (no active drivers), the triton object is a placeholder that lacks a runtime attribute, causing an AttributeError. The original code used a hardcoded value of 256 and didn't have this dependency.
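A defensive variant along the lines of this report would wrap the triton access and fall back to the original hardcoded value. This is only a sketch; the function name, the broad exception guard, and the 256 fallback are assumptions for illustration, not vLLM's actual code:

```python
def get_padding_alignment_safe() -> int:
    """Sketch: arch-aware padding with a fallback when triton is unusable."""
    try:
        import triton
        # On a placeholder triton object (no active driver) this raises
        # AttributeError, which we treat the same as triton being absent.
        arch = str(triton.runtime.driver.active.get_current_target().arch)
    except Exception:
        return 256  # original hardcoded value as a safe fallback
    # gfx950 needs 256-byte padding for preshuffle; 128 suffices elsewhere
    return 256 if "gfx950" in arch else 128
```

On a machine without a working triton installation this simply returns 256, avoiding the `AttributeError` described above.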
Closing in favor of #34285
This line in vllm/model_executor/layers/fused_moe/layer.py, originally from #22421, pads the mxfp4 hidden_size on ROCm to a multiple of 256. However, we only need to pad to 256 for preshuffle reasons on gfx950; for all other cases, e.g. gfx942, padding to 128 is sufficient to avoid masked loads. See #28024, which is doing this correctly.
Ideally we should deduplicate the two padding functions across https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/layer.py and https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/mxfp4.py, since these are more or less identical. This will also be deduplicated/fixed in #30647; this PR fixes the individual issue until the other PR is merged.
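For illustration, the arch-dependent choice described above can be reduced to a pure helper. The names here are hypothetical stand-ins for the PR's `get_padding_alignment` and vLLM's `round_up` utility, not the actual implementation:

```python
def round_up(x: int, multiple: int) -> int:
    # Round x up to the nearest multiple (standard integer rounding utility)
    return ((x + multiple - 1) // multiple) * multiple

def padding_alignment_for_arch(arch: str) -> int:
    # gfx950 requires 256-byte padding for preshuffle reasons;
    # 128 avoids masked loads on other ROCm archs such as gfx942
    return 256 if "gfx950" in arch else 128

hidden_size = 2880
pad = padding_alignment_for_arch("gfx942")   # -> 128
hidden_size = round_up(hidden_size, pad)     # 2880 -> 2944
```

On gfx942 this pads 2880 up to 2944 (a multiple of 128) instead of 3072 (the next multiple of 256), which is the waste the fix avoids.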
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.