
[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 #37787

Merged

tjtanaa merged 15 commits into vllm-project:main from ROCm:akaratza_fix_gptoss on Mar 25, 2026

Conversation

@AndreasKaratzas (Collaborator) commented Mar 22, 2026

Fixes several issues introduced by #37128 that broke gpt-oss on ROCm.

  • Restore the gfx950 gate for CK mxfp4 backend selection. The old code picked CK only on gfx950 via on_gfx950(); the refactor dropped this check, allowing CK to be selected on gfx942, where it crashes.
  • Restore the CK_MXFP4_MOE_DIM_ALIGNMENT (256) check. Models whose intermediate_size is not aligned to 256 (such as gpt-oss-20b at 2880) hit a reshape error in aiter's shuffle_scale_a16w4. Added is_supported_config to AiterExperts so the backend selector falls through to Triton.
  • Restore hidden_pad/intermediate_pad for the CK path. These were passed to rocm_aiter_ops.fused_moe() in the old code but were lost in the refactor. Added the fields to FusedMoEQuantConfig and wired them through.
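The gating and alignment rules described in the bullets above can be sketched as follows. CK_MXFP4_MOE_DIM_ALIGNMENT and the gfx950/gfx942 behavior come from the PR description; the selector function itself is a hypothetical illustration, not vLLM's actual code:

```python
# Sketch of the restored backend gate, based on the PR description.
# select_mxfp4_backend is an illustrative helper, not a vLLM function.
CK_MXFP4_MOE_DIM_ALIGNMENT = 256

def select_mxfp4_backend(intermediate_size: int, on_gfx950: bool) -> str:
    """Pick CK only on gfx950 with 256-aligned dims; otherwise fall back to Triton."""
    if on_gfx950 and intermediate_size % CK_MXFP4_MOE_DIM_ALIGNMENT == 0:
        return "ck"
    return "triton"

# gpt-oss-20b has intermediate_size 2880, not a multiple of 256 -> Triton
print(select_mxfp4_backend(2880, on_gfx950=True))   # triton
print(select_mxfp4_backend(2816, on_gfx950=True))   # ck (2816 = 11 * 256)
print(select_mxfp4_backend(2816, on_gfx950=False))  # triton (non-gfx950)
```

The non-gfx950 case is the regression path: before this fix, CK could be selected on gfx942 and crash instead of falling through.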

Tested on MI325X (gfx942):

  • test_gpt_oss_speculative_reasoning_leakage passes
  • GPQA eval via gpt_oss.evals: 56.76% (1584 questions, effort=low)
  • Backend correctly falls back to Triton on non-gfx950

Related:

cc @kenroche



Signed-off-by: Andreas Karatzas <akaratza@amd.com>


@AndreasKaratzas AndreasKaratzas marked this pull request as ready for review March 22, 2026 02:52
@AndreasKaratzas AndreasKaratzas added the rocm Related to AMD ROCm label Mar 22, 2026
@AndreasKaratzas AndreasKaratzas added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 22, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Mar 22, 2026
@mergify mergify bot added gpt-oss Related to GPT-OSS models bug Something isn't working labels Mar 22, 2026
@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request addresses several regressions related to MoE with mxfp4 quantization, primarily affecting ROCm platforms, which were introduced in a recent refactoring. The fixes include restoring platform-specific checks for the CK backend, correctly handling dimension padding, and resolving an issue with LoRA on NVIDIA. My review identifies a critical contradiction in one of the changes: while the PR description claims to enable mxfp4 LoRA on ROCm, the code continues to raise a NotImplementedError, albeit with a more descriptive message. This discrepancy needs to be resolved.

@mergify bot (Contributor) commented Mar 22, 2026

Hi @AndreasKaratzas, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@mergify mergify bot added the ci/build label Mar 22, 2026
@AndreasKaratzas (Collaborator, Author) commented:

Putting this PR in draft mode, as the CI regression does not seem to be addressed by this PR. The most straightforward solution is probably to revert the problematic PR.

@AndreasKaratzas AndreasKaratzas marked this pull request as draft March 22, 2026 08:26
@mergify bot (Contributor) commented Mar 22, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @AndreasKaratzas.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

… in AiterExperts

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@BowenBao (Contributor) commented:

Please share if there's a discussion with upstream folks. #37128 broke CI, and rather than reverting it to figure out a proper solution, there's pressure to quickly land this PR. The size -> shape change and padding logic feel like quick (hacky) band-aids that will require the ROCm team to do more follow-up work down the line.

I don't feel this aligns with OSS best practices. However, if the maintainers are aware of this and have agreed it's the best way forward, I have nothing to add. cc @tjtanaa, @gshtras, @dllehr-amd, @ChuanLi1101

@AndreasKaratzas (Collaborator, Author) commented:

> Please share if there's a discussion with upstream folks. #37128 broke CI, and rather than reverting it to figure out a proper solution, there's pressure to quickly land this PR. The size -> shape change and padding logic feel like quick (hacky) band-aids that will require the ROCm team to do more follow-up work down the line.
>
> I don't feel this aligns with OSS best practices. However, if the maintainers are aware of this and have agreed it's the best way forward, I have nothing to add. cc @tjtanaa, @gshtras, @dllehr-amd, @ChuanLi1101

The .size() to .shape change isn't a band-aid; it's the correct fix. triton_kernels.tensor.Tensor (used for MXFP4 swizzled weights since #37128) exposes .shape but not .size() or .dim(). .shape is the common interface that works with both tensor types, and it is the more Pythonic, numpy-standard accessor anyway. There's no follow-up work needed here.
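The duck-typing point can be shown without any GPU dependencies. The class below is a stand-in for triton_kernels.tensor.Tensor (illustrative only, not the real class): it exposes .shape but no .size() method, so code that reads .shape works on both tensor types while .size(-1) would fail here:

```python
class SwizzledTensor:
    """Illustrative stand-in for triton_kernels.tensor.Tensor: it exposes
    .shape but deliberately has no .size() or .dim() methods."""
    def __init__(self, shape):
        self.shape = tuple(shape)

def hidden_dim(t) -> int:
    # t.shape[-1] works for torch.Tensor and SwizzledTensor alike;
    # t.size(-1) would raise AttributeError on the latter.
    return t.shape[-1]

t = SwizzledTensor((32, 2880))
print(hidden_dim(t))  # 2880
```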

On the padding logic, I hear the concern about it being incremental. The CK MXFP4 kernels on gfx950 require 256-byte-aligned dimensions, and the padding was already happening in create_weights but wasn't being properly communicated to the layer/config.
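The padding amount itself is a standard round-up-to-alignment computation. The sketch below uses the field names from the PR description (hidden_pad/intermediate_pad); the helper is illustrative, not vLLM's actual implementation:

```python
# Hedged sketch of the padding computation: how many elements to add so a
# dimension becomes a multiple of the CK alignment. Illustrative helper only.
CK_MXFP4_MOE_DIM_ALIGNMENT = 256

def pad_to_alignment(dim: int, alignment: int = CK_MXFP4_MOE_DIM_ALIGNMENT) -> int:
    """Return the number of elements needed to round dim up to a multiple of alignment."""
    return (-dim) % alignment

intermediate_size = 2880  # gpt-oss-20b
intermediate_pad = pad_to_alignment(intermediate_size)
print(intermediate_pad)                      # 192
print(intermediate_size + intermediate_pad)  # 3072, a multiple of 256
```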

@gshtras (Collaborator) commented Mar 24, 2026

This appears to correctly fix the fallback to Triton caused by #35893 on gpt-oss-120b. lm_eval scores are back to normal when using CK MoE.

# the triton_kernels/aiter side. This matches pre-#37128.
raise NotImplementedError(
    "Mxfp4 LoRA is only supported on CUDA. "
    "ROCm support is blocked by triton_kernels.tensor.Tensor "
A Collaborator left a review comment on the snippet above:

Nit: not is_cuda doesn't automatically mean ROCm. Other platforms may get confused by this message.
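One way to address the nit is to name the detected platform in the message rather than assuming ROCm. The helper below is a hypothetical sketch of that idea, not vLLM code:

```python
def mxfp4_lora_error_message(platform_name: str) -> str:
    """Build a platform-accurate error message instead of assuming ROCm.

    Hypothetical helper illustrating the review nit: whatever non-CUDA
    platform was detected is reported by name.
    """
    return (
        "Mxfp4 LoRA is only supported on CUDA; "
        f"the current platform ({platform_name}) is not supported."
    )

print(mxfp4_lora_error_message("rocm"))
print(mxfp4_lora_error_message("cpu"))
```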

@AndreasKaratzas (Collaborator, Author) replied:

Done :)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Mar 24, 2026
@tjtanaa tjtanaa merged commit 679c6a3 into vllm-project:main Mar 25, 2026
72 of 73 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Mar 25, 2026
@AndreasKaratzas AndreasKaratzas deleted the akaratza_fix_gptoss branch March 25, 2026 00:46