
[Bugfix] Register fp8 cutlass_group_gemm as supported for only SM90+SM100#33285

Merged
vllm-bot merged 1 commit into vllm-project:main from neuralmagic:fix-cutlass_group_gemm_supported on Jan 29, 2026

Conversation

@mgoin (Member) commented Jan 28, 2026

Purpose

FIX #32109

We only have implementations for SM90 and SM100, so we should restrict support accordingly for the FP8 backend oracle to work correctly. Without this change, users on SM120 would default to this kernel backend and hit an unsupported error, when the Triton kernel should be used instead.

csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm90.cu
csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm100.cu
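The two files above provide the only CUTLASS grouped-GEMM implementations, one per architecture. A minimal sketch of the resulting dispatch logic (function and file names here are illustrative, not the actual vLLM code):

```python
# Illustrative sketch: map a CUDA compute capability to the available
# CUTLASS grouped-GEMM kernel, or None when no implementation exists.
# This mirrors the PR's intent; it is not the actual vLLM dispatch code.

def grouped_mm_kernel(capability: int):
    if 90 <= capability < 100:
        # Hopper (SM90) implementation.
        return "grouped_mm_c3x_sm90.cu"
    if 100 <= capability < 110:
        # Blackwell data-center (SM100) implementation.
        return "grouped_mm_c3x_sm100.cu"
    # e.g. SM120: no CUTLASS grouped GEMM; the caller must fall back
    # to another backend such as Triton.
    return None
```

Under this sketch, SM120 yields `None`, which is what lets the backend oracle skip the CUTLASS path instead of failing at kernel launch.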

Test Plan

Test Result

Tested manually by locally changing the condition to disqualify cuda_device_capability == 100; the resulting kernel selection changed from

(Worker_TP0 pid=1113294) INFO 01-28 17:18:33 [fp8.py:329] Using VLLM_CUTLASS Fp8 MoE backend out of potential backends: ['AITER', 'FLASHINFER_TRTLLM', 'FLASHINFER_CUTLASS', 'DEEPGEMM', 'BATCHED_DEEPGEMM', 'VLLM_CUTLASS', 'BATCHED_VLLM_CUTLASS', 'TRITON', 'BATCHED_TRITON', 'MARLIN'].

to

(Worker_TP0 pid=1118221) INFO 01-28 17:19:55 [fp8.py:329] Using TRITON Fp8 MoE backend out of potential backends: ['AITER', 'FLASHINFER_TRTLLM', 'FLASHINFER_CUTLASS', 'DEEPGEMM', 'BATCHED_DEEPGEMM', 'VLLM_CUTLASS', 'BATCHED_VLLM_CUTLASS', 'TRITON', 'BATCHED_TRITON', 'MARLIN'].
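The log change above can be understood as an oracle that walks an ordered candidate list and picks the first backend whose support predicate passes. A minimal sketch (names and candidate list are illustrative assumptions, not the actual vLLM API):

```python
# Hypothetical sketch of FP8 MoE backend selection; not the actual vLLM code.
CANDIDATES = ["VLLM_CUTLASS", "TRITON"]

def is_supported(backend: str, capability: int) -> bool:
    if backend == "VLLM_CUTLASS":
        # The fix in this PR: only SM90 and SM100 have
        # cutlass_group_gemm implementations.
        return capability in (90, 100)
    # Triton serves as the portable fallback.
    return True

def select_backend(capability: int) -> str:
    # First supported candidate wins, matching the ordered list in the logs.
    return next(b for b in CANDIDATES if is_supported(b, capability))
```

With this predicate, SM90 still selects VLLM_CUTLASS, while SM120 falls through to TRITON rather than erroring out.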

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…M100

Signed-off-by: mgoin <mgoin64@gmail.com>
@mergify mergify bot added nvidia bug Something isn't working labels Jan 28, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly restricts the cutlass_group_gemm to be supported only on architectures with compute capabilities 9.x (Hopper) and 10.x (Blackwell). The added check is straightforward and effectively prevents the kernel from being used on unsupported hardware, which resolves the underlying bug. The implementation is well-placed and looks good.

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Jan 28, 2026
@vllm-bot vllm-bot merged commit 1bd47d6 into vllm-project:main Jan 29, 2026
48 of 49 checks passed
@github-project-automation github-project-automation bot moved this to Done in NVIDIA Jan 29, 2026
@mgoin mgoin added this to the v0.15.1 Hotfix milestone Jan 29, 2026
apd10 pushed a commit to apd10/vllm that referenced this pull request Jan 31, 2026
khluu pushed a commit that referenced this pull request Feb 2, 2026
…M100 (#33285)

Signed-off-by: mgoin <mgoin64@gmail.com>
(cherry picked from commit 1bd47d6)
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
…M100 (vllm-project#33285)

Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: PiratePai <416932041@qq.com>
Signed-off-by: Pai <416932041@qq.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

bug Something isn't working nvidia ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Blackwell (SM120) FP8 MoE path fails for GLM-4.7 : No compiled cutlass_scaled_mm for CUDA device capability: 120 on RTX PRO 6000 Blackwell

2 participants