[Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787)
tjtanaa merged 15 commits into vllm-project:main
Conversation
Code Review
This pull request addresses several regressions related to MoE with mxfp4 quantization, primarily affecting ROCm platforms, which were introduced in a recent refactoring. The fixes include restoring platform-specific checks for the CK backend, correctly handling dimension padding, and resolving an issue with LoRA on NVIDIA. My review identifies a critical contradiction in one of the changes: while the PR description claims to enable mxfp4 LoRA on ROCm, the code continues to raise a NotImplementedError, albeit with a more descriptive message. This discrepancy needs to be resolved.
Hi @AndreasKaratzas, the pre-commit checks have failed. Please run:

    uv pip install "pre-commit>=4.5.1"
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.
Putting this PR in draft mode, as the CI regression does not seem to be addressed by it. The most straightforward solution is probably to revert the problematic PR.
This pull request has merge conflicts that must be resolved before it can be merged.
Please share if there's a discussion with upstream folks. #37128 broke CI, and rather than reverting it to figure out a proper solution, there's pressure to quickly land this PR. The size -> shape change and padding logic feel like quick (hacky) band-aids that will require the ROCm team to do more follow-up work down the line. I don't feel this aligns with OSS best practices. However, if the maintainers are aware of this and have agreed it's the best way forward, I have nothing to add. cc @tjtanaa, @gshtras, @dllehr-amd, @ChuanLi1101
The .size() to .shape change isn't a band-aid; it's the correct fix. triton_kernels.tensor.Tensor (used for MXFP4 swizzled weights since #37128) exposes .shape but not .size() or .dim(). On the padding logic, I hear the concern about it being incremental. The CK MXFP4 kernels on gfx950 require 256-byte-aligned dimensions, and the padding was already happening in create_weights but wasn't being properly communicated to the layer/config.
This appears to correctly fix the fallback to Triton caused by #35893 on gpt-oss-120b |
# the triton_kernels/aiter side. This matches pre-#37128.
raise NotImplementedError(
    "Mxfp4 LoRA is only supported on CUDA. "
    "ROCm support is blocked by triton_kernels.tensor.Tensor "
Nit: not is_cuda doesn't automatically mean ROCm. Other platforms may get confused by this message.
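One way to address the nit is to branch on the platform explicitly rather than treating every non-CUDA platform as ROCm. The sketch below is purely illustrative: the function name, parameters, and message wording are assumptions, not the PR's actual code.

```python
def check_mxfp4_lora_support(is_cuda: bool, is_rocm: bool) -> None:
    # Hypothetical platform check: only mention ROCm in the error when the
    # platform actually is ROCm, so other backends get an accurate message.
    if is_cuda:
        return
    if is_rocm:
        raise NotImplementedError(
            "Mxfp4 LoRA is not supported on ROCm; blocked by "
            "triton_kernels.tensor.Tensor lacking .size()/.dim()."
        )
    raise NotImplementedError("Mxfp4 LoRA is only supported on CUDA.")
```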
Fixes several issues introduced by #37128 that broke gpt-oss on ROCm.
Tested on MI325X (gfx942):
Related:
cc @kenroche