
[Bugfix] Add SM110/SM120 device capability checks for NVFP4 MoE backends#33516

Closed
Code4me2 wants to merge 1 commit into vllm-project:main from Code4me2:fix/sm120-sm121-capability-detection

Conversation

Code4me2 (Contributor) commented on Feb 1, 2026

Summary

Extend the device capability checks to include the SM110 and SM120 GPU families in the NVFP4 MoE backend-selection code paths that were missed in PR #33417.

Problem

After PR #33417, NVFP4 MoE models still fail on RTX Blackwell GPUs (SM120) and DGX Spark (SM121) with errors like:

RuntimeError: FlashInfer-CUTLASS MoE kernel does not support current device

Root Cause

PR #33417 updated flashinfer_cutlass_moe.py and cutlass_moe.py but missed the following files, which still check only for SM100:

  • flashinfer_fp4_moe.py
  • flashinfer_trtllm_moe.py
  • flashinfer_cutedsl_moe.py
  • flashinfer_utils.py

Solution

Add explicit family checks for SM100/110/120, matching the approach in the files updated by #33417:

p.is_device_capability_family(100)
or p.is_device_capability_family(110)
or p.is_device_capability_family(120)

This enables support for:

  • SM100-109: Blackwell data center (B100, B200)
  • SM110-119: Future Blackwell variants
  • SM120-129: Blackwell consumer/workstation (RTX 5090, DGX Spark GB10)
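To make the family semantics concrete, here is a minimal, self-contained sketch of what a "capability family" check means: a device with compute capability major.minor falls in family F when F <= major*10 + minor < F + 10. The helper names below are illustrative stand-ins, not vLLM's actual API.

```python
# Hypothetical sketch of the SM-family bucketing behind the PR's check.
# Function names are illustrative, not vLLM's real interface.

def capability_to_int(major: int, minor: int) -> int:
    """Encode a CUDA compute capability as an integer, e.g. (12, 1) -> 121."""
    return major * 10 + minor

def in_family(major: int, minor: int, family: int) -> bool:
    """True when the capability falls inside the 10-wide family bucket."""
    cap = capability_to_int(major, minor)
    return family <= cap < family + 10

def supports_nvfp4_moe(major: int, minor: int) -> bool:
    """Mirror of the PR's condition: any Blackwell family (SM100/110/120)."""
    return any(in_family(major, minor, f) for f in (100, 110, 120))

# RTX 5090 is SM 12.0 -> family 120; DGX Spark GB10 is SM 12.1 -> also 120.
print(supports_nvfp4_moe(12, 0))  # True
print(supports_nvfp4_moe(12, 1))  # True
print(supports_nvfp4_moe(9, 0))   # False (Hopper, SM90)
```

Under this bucketing, both the RTX 5090 (SM120) and the DGX Spark GB10 (SM121) land in the 120 family, which is why a single `is_device_capability_family(120)` check covers both devices.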

Files Changed

  • vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py
  • vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py
  • vllm/model_executor/layers/fused_moe/flashinfer_cutedsl_moe.py
  • vllm/model_executor/layers/quantization/utils/flashinfer_utils.py

Testing

Tested on RTX 5090 (SM120) and DGX Spark GB10 (SM121) with nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4

Related Issues

Fixes #31085
Fixes #30135
Fixes #29141
Related to #28589


mergify bot added the nvidia and bug labels on Feb 1, 2026

gemini-code-assist (bot) left a comment

Code Review

This pull request correctly extends device capability checks to include SM110 and SM120 GPU families, fixing the reported bug. The changes are applied across all necessary files. However, the logic for checking the device capability is duplicated in multiple places. This code duplication is a maintainability risk, as it was the root cause of the issue this PR is fixing (some files were missed in a previous update). I've added suggestions to refactor this duplicated logic using any() for better readability and to make future updates easier and less error-prone.
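One way to act on the reviewer's maintainability point is to hoist the duplicated condition into a single shared predicate that every backend imports, so adding a new SM family means editing exactly one place. The sketch below is illustrative only; `Platform` is a stand-in protocol and `is_blackwell_supported` is a hypothetical helper, not vLLM's actual code layout.

```python
# Illustrative sketch: one shared predicate instead of four copies.
# `Platform` is a stand-in protocol, not vLLM's real platform class.
from typing import Protocol

BLACKWELL_FAMILIES = (100, 110, 120)

class Platform(Protocol):
    def is_cuda(self) -> bool: ...
    def is_device_capability_family(self, family: int) -> bool: ...

def is_blackwell_supported(p: Platform) -> bool:
    """True on CUDA devices in any Blackwell capability family."""
    return p.is_cuda() and any(
        p.is_device_capability_family(f) for f in BLACKWELL_FAMILIES
    )
```

With a helper like this, the bug class this PR fixes (one call site updated, others missed) cannot recur, because all backends consult the same tuple of families.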

Comment on lines +59 to +63
return p.is_cuda() and (
p.is_device_capability_family(100)
or p.is_device_capability_family(110)
or p.is_device_capability_family(120)
)
Severity: high

This device capability check is duplicated across multiple files. To improve maintainability and avoid future bugs where one location is updated but others are missed, consider using any() with a generator expression. This makes the code more concise and easier to update.

        return p.is_cuda() and any(p.is_device_capability_family(sm) for sm in (100, 110, 120))

Comment on lines +33 to +37
return p.is_cuda() and (
p.is_device_capability_family(100)
or p.is_device_capability_family(110)
or p.is_device_capability_family(120)
)
Severity: high — same `any()` suggestion as above.

    return p.is_cuda() and any(p.is_device_capability_family(sm) for sm in (100, 110, 120))

Comment on lines +47 to +51
return p.is_cuda() and (
p.is_device_capability_family(100)
or p.is_device_capability_family(110)
or p.is_device_capability_family(120)
)
Severity: high — same `any()` suggestion as above.

    return p.is_cuda() and any(p.is_device_capability_family(sm) for sm in (100, 110, 120))

Comment on lines +171 to +175
is_blackwell = (
current_platform.is_device_capability_family(100)
or current_platform.is_device_capability_family(110)
or current_platform.is_device_capability_family(120)
)
Severity: high — same `any()` suggestion as above.

    is_blackwell = any(current_platform.is_device_capability_family(sm) for sm in (100, 110, 120))

@Code4me2 Code4me2 force-pushed the fix/sm120-sm121-capability-detection branch from 3481348 to 5cd21d6 Compare February 1, 2026 19:57
Extend device capability checks to include SM110 and SM120 GPU families,
matching the approach used in flashinfer_cutlass_moe.py and cutlass_moe.py
after PR vllm-project#33417.

These files were not updated in vllm-project#33417 and still only checked for SM100:
- flashinfer_fp4_moe.py
- flashinfer_trtllm_moe.py
- flashinfer_cutedsl_moe.py
- flashinfer_utils.py

The fix adds explicit family checks for SM100/110/120 using any() for
cleaner, more maintainable code, enabling support for:
- SM100-109: Blackwell data center (B100, B200)
- SM110-119: Future Blackwell variants
- SM120-129: Blackwell consumer/workstation (RTX 5090, DGX Spark GB10)

Tested on RTX 5090 (SM120) and DGX Spark GB10 (SM121) with
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4

Signed-off-by: code4me2 <velvetmoon222999@gmail.com>
@Code4me2 Code4me2 force-pushed the fix/sm120-sm121-capability-detection branch from 5cd21d6 to c35fcc7 Compare February 1, 2026 20:06
johnnynunez (Contributor) commented on Feb 1, 2026

thanks! LGTM @mgoin

@Code4me2 Code4me2 force-pushed the fix/sm120-sm121-capability-detection branch 2 times, most recently from f0643d6 to c35fcc7 Compare February 1, 2026 21:39
mgoin (Member) commented on Feb 2, 2026

@Code4me2 how is this correct? It was intentional to avoid sm120 for flashinfer_trtllm_moe and flashinfer_cutedsl_moe as these are unsupported on sm120. Maybe sm110 can be included but I need more specific validation.

Regarding this problem:

After PR #33417, NVFP4 MoE models still fail on RTX Blackwell GPUs (SM120) and DGX Spark (SM121) with errors like:

RuntimeError: FlashInfer-CUTLASS MoE kernel does not support current device

I'm not sure how these changes resolve that since they don't affect flashinfer_cutlass_moe at all. Can you give more context?

Code4me2 (Author, Contributor) commented on Feb 2, 2026

@mgoin yeah, sorry about this one. This series of 3 PRs was me trying to get Nemotron Nano NVFP4 working on the DGX Spark and RTX 5090, and I tried multiple different things. This one was a lapse in judgment, because it was the other two PRs that fixed the issue. Apologies. Closing now.

@Code4me2 Code4me2 closed this Feb 2, 2026
@github-project-automation github-project-automation bot moved this to Done in NVIDIA Feb 2, 2026
mgoin (Member) commented on Feb 2, 2026

No worries I thought I was missing something, thanks!


Labels

bug (Something isn't working), nvidia

Projects

Status: Done

3 participants