
[Bugfix][Hardware][AMD] Gate FP4 BMM on gfx950 to fix MI300X crash#35103

Closed
c0de128 wants to merge 1 commit into vllm-project:main from c0de128:fix-fp4bmm-gfx950

Conversation


@c0de128 c0de128 commented Feb 23, 2026

Summary

  • Gate is_fp4bmm_enabled() on on_gfx950() so that MI300X/MI325X (gfx942) gracefully fall back to FP8 instead of crashing with RuntimeError: MXFP4 quantization is not supported on gfx942
  • MXFP4 requires CDNA4 hardware (gfx950: MI350X/MI355X)
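A minimal sketch of the gate described in the summary. Only the names is_fp4bmm_enabled(), on_gfx950(), and the VLLM_ROCM_USE_AITER_FP4BMM flag come from this PR and the linked issue; the bodies below are illustrative stand-ins (the real on_gfx950() queries the ROCm device, not an environment variable), not vLLM source:

```python
import os

def on_gfx950() -> bool:
    # Illustrative stand-in for vLLM's ROCm arch check; the real code
    # inspects the device arch, but an env var keeps this sketch runnable
    # anywhere (FAKE_GFX_ARCH is a hypothetical name for this sketch only).
    return os.environ.get("FAKE_GFX_ARCH", "gfx942") == "gfx950"

def is_fp4bmm_enabled() -> bool:
    # MXFP4 BMM requires CDNA4 (gfx950: MI350X/MI355X). On gfx942
    # (MI300X/MI325X) this now returns False, so callers take the FP8
    # path instead of hitting the MXFP4 RuntimeError.
    flag = os.environ.get("VLLM_ROCM_USE_AITER_FP4BMM", "1") == "1"
    return flag and on_gfx950()
```

The key point is the conjunction: even with the AITER FP4 BMM flag at its default of enabled, the feature is off unless the hardware actually supports it.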

Test plan

  • On MI300X (gfx942): is_fp4bmm_enabled() returns False, FP8 fallback used — no crash
  • On MI350X (gfx950): FP4 BMM still enabled and functional
  • pre-commit run --all-files passes

Fixes #34641

@c0de128 c0de128 requested a review from tjtanaa as a code owner February 23, 2026 14:29
@mergify mergify bot added the rocm (Related to AMD ROCm) and bug (Something isn't working) labels Feb 23, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 23, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully addresses the reported crash on MI300X (gfx942) hardware by gating the FP4 Batched Matrix Multiply (BMM) feature to only be active on supported CDNA4 hardware (gfx950). By adding the on_gfx950() check within is_fp4bmm_enabled(), the system will now correctly fall back to FP8 on MI300X/MI325X instead of attempting to use unsupported MXFP4 operations. The implementation follows the repository's existing pattern of using local imports to handle circular dependencies between the operations and platform modules.


c0de128 commented Feb 24, 2026

@hongxiayang Could you take a look at this when you get a chance? This addresses the same issue as #34647 (gating FP4 BMM on gfx950 to fix MI300X crash) but with a minimal 3-line change in is_fp4bmm_enabled() rather than modifying the fused_moe dispatch logic.


c0de128 commented Feb 24, 2026

Note: this is a minimal 3-line alternative to #34647, which has been inactive for 6 days. @hongxiayang you noted #34647 was "very verbose" — this PR gates FP4 BMM entirely within is_fp4bmm_enabled() with just an on_gfx950() check, no other files touched. AMD CI Build #5331 is green (0 hard failures).


c0de128 commented Feb 24, 2026

@BowenBao This PR implements exactly the approach you suggested on #34647 — a single on_gfx950() gate in is_fp4bmm_enabled(), 3 lines changed, no other files touched. AMD CI is green (Build #5331). Would you be able to review?

MXFP4 quantization requires CDNA4 hardware (gfx950). Gate
is_fp4bmm_enabled() on on_gfx950() so MI300X/MI325X (gfx942)
gracefully fall back to FP8 instead of crashing.

Fixes vllm-project#34641

Signed-off-by: c0de128 <kevin.mckay@outlook.com>

c0de128 commented Feb 24, 2026

Rebased onto latest main to keep the branch current.

For context: this is the minimal alternative to #34647 (which received CHANGES_REQUESTED). On the underlying issue (#34641), the consensus direction was to gate at the is_fp4bmm_enabled() level rather than in the attention backend. This fix follows the existing on_gfx950() pattern already used elsewhere in _aiter_ops.py (e.g., aiter_moe_align_block_size, aiter_topk_softmax).

The change is 3 lines — adds an on_gfx950() check to is_fp4bmm_enabled() so MI300X correctly falls back to FP8 instead of crashing on unsupported MXFP4 ops.
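As a hedged illustration of the fallback behavior this comment describes, a caller-side dispatch might look like the sketch below. The function bmm_backend() and the env-driven arch check are hypothetical stand-ins for this sketch, not vLLM APIs; only is_fp4bmm_enabled() and on_gfx950() are named in the PR:

```python
import os

def on_gfx950() -> bool:
    # Hypothetical stand-in for the arch check, env-driven so the sketch
    # runs anywhere (FAKE_GFX_ARCH is invented for this example).
    return os.environ.get("FAKE_GFX_ARCH", "gfx942") == "gfx950"

def is_fp4bmm_enabled() -> bool:
    # The gate this PR adds: FP4 BMM is only enabled on gfx950.
    return on_gfx950()

def bmm_backend() -> str:
    # Hypothetical caller-side dispatch: select a kernel family based on
    # the gate rather than raising on unsupported MXFP4 ops.
    return "fp4_bmm" if is_fp4bmm_enabled() else "fp8_bmm"
```

On gfx942 the dispatch silently selects the FP8 path, which is the "no crash" behavior the test plan verifies on MI300X.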


c0de128 commented Feb 25, 2026

Closing in favor of #35250, which includes this fix along with the same gfx950 gate for is_asm_fp4_gemm_dynamic_quant_enabled().

@c0de128 c0de128 closed this Feb 25, 2026
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Feb 25, 2026

Labels

bug (Something isn't working), rocm (Related to AMD ROCm)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[ROCm] Default VLLM_ROCM_USE_AITER_FP4BMM=True crashes on MI300X (gfx942)

2 participants