[Bugfix] Add SM110/SM120 device capability checks for NVFP4 MoE backends #33516
Code4me2 wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request correctly extends device capability checks to include SM110 and SM120 GPU families, fixing the reported bug. The changes are applied across all necessary files. However, the logic for checking the device capability is duplicated in multiple places. This code duplication is a maintainability risk, as it was the root cause of the issue this PR is fixing (some files were missed in a previous update). I've added suggestions to refactor this duplicated logic using any() for better readability and to make future updates easier and less error-prone.
```python
return p.is_cuda() and (
    p.is_device_capability_family(100)
    or p.is_device_capability_family(110)
    or p.is_device_capability_family(120)
)
```

This device capability check is duplicated across multiple files. To improve maintainability and avoid future bugs where one location is updated but others are missed, consider using `any()` with a generator expression. This makes the code more concise and easier to update:

```python
return p.is_cuda() and any(p.is_device_capability_family(sm) for sm in (100, 110, 120))
```

The same suggestion applies to the `is_blackwell` check:

```python
is_blackwell = (
    current_platform.is_device_capability_family(100)
    or current_platform.is_device_capability_family(110)
    or current_platform.is_device_capability_family(120)
)
```

```python
is_blackwell = any(current_platform.is_device_capability_family(sm) for sm in (100, 110, 120))
```

Force-pushed from 3481348 to 5cd21d6
Extend device capability checks to include SM110 and SM120 GPU families, matching the approach used in flashinfer_cutlass_moe.py and cutlass_moe.py after PR vllm-project#33417. These files were not updated in vllm-project#33417 and still only checked for SM100:

- flashinfer_fp4_moe.py
- flashinfer_trtllm_moe.py
- flashinfer_cutedsl_moe.py
- flashinfer_utils.py

The fix adds explicit family checks for SM100/110/120 using any() for cleaner, more maintainable code, enabling support for:

- SM100-109: Blackwell data center (B100, B200)
- SM110-119: Future Blackwell variants
- SM120-129: Blackwell consumer/workstation (RTX 5090, DGX Spark GB10)

Tested on RTX 5090 (SM120) and DGX Spark GB10 (SM121) with nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4

Signed-off-by: code4me2 <velvetmoon222999@gmail.com>
Force-pushed from 5cd21d6 to c35fcc7
thanks! LGTM @mgoin
Force-pushed from f0643d6 to c35fcc7
@Code4me2 how is this correct? It was intentional to avoid sm120 for flashinfer_trtllm_moe and flashinfer_cutedsl_moe, as these are unsupported on sm120. Maybe sm110 can be included, but I need more specific validation. Regarding this problem: I'm not sure how these changes resolve that, since they don't affect flashinfer_cutlass_moe at all. Can you give more context?
@mgoin yeah, sorry about this one. The series of 3 PRs was me trying to get Nemotron Nano NVFP4 working on the DGX Spark and RTX 5090, and I tried multiple different things. This was a lapse in judgement, because it was the other two PRs that fixed the issue. Apologies. Closing now.
No worries, I thought I was missing something. Thanks!
Summary
Extend device capability checks to include SM110 and SM120 GPU families in NVFP4 MoE backend selection code that was missed in PR #33417.
Problem
After PR #33417, NVFP4 MoE models still fail on RTX Blackwell GPUs (SM120) and DGX Spark (SM121) with errors like:
Root Cause
PR #33417 updated flashinfer_cutlass_moe.py and cutlass_moe.py but missed these files, which still only check for SM100:

- flashinfer_fp4_moe.py
- flashinfer_trtllm_moe.py
- flashinfer_cutedsl_moe.py
- flashinfer_utils.py

Solution
Add explicit family checks for SM100/110/120, matching the approach in the files updated by #33417.

This enables support for:

- SM100-109: Blackwell data center (B100, B200)
- SM110-119: Future Blackwell variants
- SM120-129: Blackwell consumer/workstation (RTX 5090, DGX Spark GB10)
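The check in question can be sketched standalone. `FakePlatform` and `supports_nvfp4_moe` below are hypothetical stand-ins for illustration (vLLM's real objects are `current_platform` and per-backend support predicates), and the sketch assumes `is_device_capability_family` groups capabilities by major version, so SM120 (12.0) and SM121 (12.1) both fall in family 120:

```python
class FakePlatform:
    """Hypothetical stand-in for vLLM's current_platform, for illustration only."""

    def __init__(self, major: int, minor: int):
        self.major = major
        self.minor = minor

    def is_cuda(self) -> bool:
        return True

    def is_device_capability_family(self, family: int) -> bool:
        # Assumption: a "family" is the major capability version times 10,
        # so SM120 (12.0) and SM121 (12.1) both belong to family 120.
        return self.major * 10 == family


def supports_nvfp4_moe(p: FakePlatform) -> bool:
    # The any()-based form suggested in review: the supported Blackwell
    # families are listed in one place, making future additions one-line edits.
    return p.is_cuda() and any(
        p.is_device_capability_family(sm) for sm in (100, 110, 120)
    )


print(supports_nvfp4_moe(FakePlatform(12, 0)))  # RTX 5090, SM120 -> True
print(supports_nvfp4_moe(FakePlatform(12, 1)))  # DGX Spark GB10, SM121 -> True
print(supports_nvfp4_moe(FakePlatform(9, 0)))   # Hopper, SM90 -> False
```

This is why the family check (rather than an exact capability match) matters for the GB10: SM121 is not SM120, but it is in the 120 family.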
Files Changed
- vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py
- vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py
- vllm/model_executor/layers/fused_moe/flashinfer_cutedsl_moe.py
- vllm/model_executor/layers/quantization/utils/flashinfer_utils.py

Testing
Tested on RTX 5090 (SM120) and DGX Spark GB10 (SM121) with nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4.

Related Issues
Fixes #31085
Fixes #30135
Fixes #29141
Related to #28589