fix pad_align for gfx942#32307

Closed
Rohan138 wants to merge 6 commits into vllm-project:main from ROCm:fix_gfx942_pad_align

Conversation

@Rohan138 Rohan138 commented Jan 14, 2026

This line in vllm/model_executor/layers/fused_moe/layer.py, originally from #22421, seems to pad the mxfp4 hidden_size on ROCm to a multiple of 256. However, padding to 256 is only needed on gfx950, for preshuffle reasons; in all other cases, e.g. gfx942, padding to 128 should be sufficient to avoid masked loads. See #28024, which does this correctly.
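
To make the difference concrete, here is a minimal sketch of the padding arithmetic. It is not the vLLM code: `pad_align_for_arch` is a hypothetical helper written for illustration, `round_up` is reimplemented locally, and the hidden size of 2880 is only an example value.

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple


def pad_align_for_arch(arch: str) -> int:
    """Hypothetical helper: 256 on gfx950 (preshuffle), 128 elsewhere."""
    return 256 if arch.startswith("gfx950") else 128


hidden_size = 2880  # example value only
padded_gfx942 = round_up(hidden_size, pad_align_for_arch("gfx942"))
padded_gfx950 = round_up(hidden_size, pad_align_for_arch("gfx950"))
print(padded_gfx942, padded_gfx950)
```

Under these assumptions, the 128 alignment adds 64 padded elements where the 256 alignment would add 192.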

Ideally we should deduplicate the two padding functions across https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/layer.py and https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/mxfp4.py, since they are more or less identical.

This will also be deduplicated and fixed in #30647; this PR fixes the individual issue until that PR is merged.

Purpose

Test Plan

Test Result



Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue with mxfp4 padding alignment on ROCm devices in the fused MoE layer. The original implementation applied a hardcoded padding alignment of 256 for all ROCm devices, whereas this is only required for gfx950. The fix introduces a new utility function, get_padding_alignment, which determines the padding alignment (128 or 256) based on the specific ROCm GPU architecture. This change is well implemented and aligns with the intended behavior. The author's note about deduplicating this padding logic in the future is a good next step for improving code maintainability.

@Rohan138 Rohan138 marked this pull request as ready for review January 21, 2026 21:33
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

    hidden_size = round_up(hidden_size, 256)
elif current_platform.is_rocm():
    pad_align = get_padding_alignment()
    hidden_size = round_up(hidden_size, pad_align)
Missing triton availability check in ROCm padding

Low Severity

The new ROCm branch calls get_padding_alignment() which accesses triton.runtime.driver.active.get_current_target().arch without verifying triton is properly available. If has_triton_kernels() is False on ROCm, get_mxfp4_backend() returns Mxfp4Backend.NONE, but the code still enters the elif current_platform.is_rocm(): branch. When triton isn't properly initialized (no active drivers), the triton object is a placeholder that lacks a runtime attribute, causing an AttributeError. The original code used a hardcoded value of 256 and didn't have this dependency.
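
One way to address this, sketched under assumptions: guard the architecture lookup and fall back to the previously hardcoded 256 when it fails. The names `safe_padding_alignment`, `broken_lookup`, and `_Placeholder` are hypothetical and do not come from vLLM or triton; the lookup is passed in as a callable so the sketch runs without triton installed.

```python
from typing import Callable

CONSERVATIVE_ALIGN = 256  # the value the original code hardcoded


def safe_padding_alignment(get_arch: Callable[[], str]) -> int:
    """Hypothetical guard: fall back to 256 if the arch lookup fails."""
    try:
        arch = get_arch()
    except (AttributeError, ImportError, RuntimeError):
        return CONSERVATIVE_ALIGN
    return 256 if arch.startswith("gfx950") else 128


class _Placeholder:
    """Stands in for a triton placeholder object with no `runtime` attribute."""


def broken_lookup() -> str:
    # Raises AttributeError, like the failure described in the report above.
    return _Placeholder().runtime.driver.active.get_current_target().arch


print(safe_padding_alignment(broken_lookup))
print(safe_padding_alignment(lambda: "gfx942"))
```

Falling back to 256 is a conservative choice here: any size that is a multiple of 256 is also a multiple of 128, so the fallback only costs extra padding.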

@Rohan138 Rohan138 marked this pull request as draft January 21, 2026 22:08
@Rohan138

Closing in favor of #34285

@Rohan138 Rohan138 closed this Feb 24, 2026
@Rohan138 Rohan138 deleted the fix_gfx942_pad_align branch February 24, 2026 22:45