
[Bugfix][Hardware][AMD] Gate FP4 BMM on gfx950 to fix MI300X crash#35103

Closed
c0de128 wants to merge 1 commit into vllm-project:main from c0de128:fix-fp4bmm-gfx950

Conversation


@c0de128 c0de128 commented Feb 23, 2026

Summary

  • Gate is_fp4bmm_enabled() on on_gfx950() so that MI300X/MI325X (gfx942) gracefully fall back to FP8 instead of crashing with RuntimeError: MXFP4 quantization is not supported on gfx942
  • MXFP4 requires CDNA4 hardware (gfx950: MI350X/MI355X)
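A minimal sketch of the gate described in the summary. Only the names is_fp4bmm_enabled(), on_gfx950(), and the VLLM_ROCM_USE_AITER_FP4BMM flag come from this PR and the linked issue; the bodies below are illustrative stand-ins (the real on_gfx950() queries the ROCm device, not an environment variable), not vLLM source:

```python
import os

def on_gfx950() -> bool:
    # Illustrative stand-in for vLLM's ROCm arch check; the real code
    # inspects the device arch, but an env var keeps this sketch runnable
    # anywhere (FAKE_GFX_ARCH is a hypothetical name for this sketch only).
    return os.environ.get("FAKE_GFX_ARCH", "gfx942") == "gfx950"

def is_fp4bmm_enabled() -> bool:
    # MXFP4 BMM requires CDNA4 (gfx950: MI350X/MI355X). On gfx942
    # (MI300X/MI325X) this now returns False, so callers take the FP8
    # path instead of hitting the MXFP4 RuntimeError.
    flag = os.environ.get("VLLM_ROCM_USE_AITER_FP4BMM", "1") == "1"
    return flag and on_gfx950()
```

The key point is the conjunction: even with the AITER FP4 BMM flag at its default of enabled, the feature is off unless the hardware actually supports it.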

Test plan

  • On MI300X (gfx942): is_fp4bmm_enabled() returns False, FP8 fallback used — no crash
  • On MI350X (gfx950): FP4 BMM still enabled and functional
  • pre-commit run --all-files passes

Fixes #34641

@c0de128 c0de128 requested a review from tjtanaa as a code owner February 23, 2026 14:29
@mergify mergify bot added the rocm (Related to AMD ROCm) and bug (Something isn't working) labels Feb 23, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 23, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully addresses the reported crash on MI300X (gfx942) hardware by gating the FP4 Batched Matrix Multiply (BMM) feature to only be active on supported CDNA4 hardware (gfx950). By adding the on_gfx950() check within is_fp4bmm_enabled(), the system will now correctly fall back to FP8 on MI300X/MI325X instead of attempting to use unsupported MXFP4 operations. The implementation follows the repository's existing pattern of using local imports to handle circular dependencies between the operations and platform modules.


c0de128 commented Feb 24, 2026

@hongxiayang Could you take a look at this when you get a chance? This addresses the same issue as #34647 (gating FP4 BMM on gfx950 to fix MI300X crash) but with a minimal 3-line change in is_fp4bmm_enabled() rather than modifying the fused_moe dispatch logic.


c0de128 commented Feb 24, 2026

Note: this is a minimal 3-line alternative to #34647, which has been inactive for 6 days. @hongxiayang you noted #34647 was "very verbose" — this PR gates FP4 BMM entirely within is_fp4bmm_enabled() with just an on_gfx950() check, no other files touched. AMD CI Build #5331 is green (0 hard failures).


c0de128 commented Feb 24, 2026

@BowenBao This PR implements exactly the approach you suggested on #34647 — a single on_gfx950() gate in is_fp4bmm_enabled(), 3 lines changed, no other files touched. AMD CI is green (Build #5331). Would you be able to review?

MXFP4 quantization requires CDNA4 hardware (gfx950). Gate
is_fp4bmm_enabled() on on_gfx950() so MI300X/MI325X (gfx942)
gracefully fall back to FP8 instead of crashing.

Fixes vllm-project#34641

Signed-off-by: c0de128 <kevin.mckay@outlook.com>

c0de128 commented Feb 24, 2026

Rebased onto latest main to keep the branch current.

For context: this is the minimal alternative to #34647 (which received CHANGES_REQUESTED). On the underlying issue (#34641), the consensus direction was to gate at the is_fp4bmm_enabled() level rather than in the attention backend. This fix follows the existing on_gfx950() pattern already used elsewhere in _aiter_ops.py (e.g., aiter_moe_align_block_size, aiter_topk_softmax).

The change is 3 lines — adds an on_gfx950() check to is_fp4bmm_enabled() so MI300X correctly falls back to FP8 instead of crashing on unsupported MXFP4 ops.
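As a hedged illustration of the fallback behavior this comment describes, a caller-side dispatch might look like the sketch below. The function bmm_backend() and the env-driven arch check are hypothetical stand-ins for this sketch, not vLLM APIs; only is_fp4bmm_enabled() and on_gfx950() are named in the PR:

```python
import os

def on_gfx950() -> bool:
    # Hypothetical stand-in for the arch check, env-driven so the sketch
    # runs anywhere (FAKE_GFX_ARCH is invented for this example).
    return os.environ.get("FAKE_GFX_ARCH", "gfx942") == "gfx950"

def is_fp4bmm_enabled() -> bool:
    # The gate this PR adds: FP4 BMM is only enabled on gfx950.
    return on_gfx950()

def bmm_backend() -> str:
    # Hypothetical caller-side dispatch: select a kernel family based on
    # the gate rather than raising on unsupported MXFP4 ops.
    return "fp4_bmm" if is_fp4bmm_enabled() else "fp8_bmm"
```

On gfx942 the dispatch silently selects the FP8 path, which is the "no crash" behavior the test plan verifies on MI300X.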


c0de128 commented Feb 25, 2026

Closing in favor of #35250, which includes this fix along with the same gfx950 gate for is_asm_fp4_gemm_dynamic_quant_enabled().

@c0de128 c0de128 closed this Feb 25, 2026
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Feb 25, 2026

Labels

bug (Something isn't working), rocm (Related to AMD ROCm)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[ROCm] Default VLLM_ROCM_USE_AITER_FP4BMM=True crashes on MI300X (gfx942)

2 participants