[Bugfix] Add SM110/SM120 device capability checks for NVFP4 MoE backends #33516
Code4me2 wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request correctly extends device capability checks to include SM110 and SM120 GPU families, fixing the reported bug. The changes are applied across all necessary files. However, the logic for checking the device capability is duplicated in multiple places. This code duplication is a maintainability risk, as it was the root cause of the issue this PR is fixing (some files were missed in a previous update). I've added suggestions to refactor this duplicated logic using any() for better readability and to make future updates easier and less error-prone.
```python
return p.is_cuda() and (
    p.is_device_capability_family(100)
    or p.is_device_capability_family(110)
    or p.is_device_capability_family(120)
)
```

This device capability check is duplicated across multiple files. To improve maintainability and avoid future bugs where one location is updated but others are missed, consider using `any()` with a generator expression. This makes the code more concise and easier to update:

```python
return p.is_cuda() and any(p.is_device_capability_family(sm) for sm in (100, 110, 120))
```

The same suggestion applies to the `is_blackwell` check:

```python
is_blackwell = (
    current_platform.is_device_capability_family(100)
    or current_platform.is_device_capability_family(110)
    or current_platform.is_device_capability_family(120)
)
```

```python
is_blackwell = any(current_platform.is_device_capability_family(sm) for sm in (100, 110, 120))
```

Force-pushed from 3481348 to 5cd21d6
Extend device capability checks to include SM110 and SM120 GPU families, matching the approach used in flashinfer_cutlass_moe.py and cutlass_moe.py after PR vllm-project#33417. These files were not updated in vllm-project#33417 and still only checked for SM100:

- flashinfer_fp4_moe.py
- flashinfer_trtllm_moe.py
- flashinfer_cutedsl_moe.py
- flashinfer_utils.py

The fix adds explicit family checks for SM100/110/120 using any() for cleaner, more maintainable code, enabling support for:

- SM100-109: Blackwell data center (B100, B200)
- SM110-119: Future Blackwell variants
- SM120-129: Blackwell consumer/workstation (RTX 5090, DGX Spark GB10)

Tested on RTX 5090 (SM120) and DGX Spark GB10 (SM121) with nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4

Signed-off-by: code4me2 <velvetmoon222999@gmail.com>
Force-pushed from 5cd21d6 to c35fcc7
thanks! LGTM @mgoin
Force-pushed from f0643d6 to c35fcc7
@Code4me2 how is this correct? It was intentional to avoid sm120 for flashinfer_trtllm_moe and flashinfer_cutedsl_moe, as these are unsupported on sm120. Maybe sm110 can be included, but I need more specific validation. Regarding this problem: I'm not sure how these changes resolve that, since they don't affect flashinfer_cutlass_moe at all. Can you give more context?
@mgoin yeah, sorry about this one. The series of 3 PRs was me trying to get Nemotron Nano NVFP4 working on the DGX Spark and RTX 5090, and I tried multiple different things. This was a lapse in judgement, because it was the other two PRs that fixed the issue. Apologies. Closing now.
No worries, I thought I was missing something. Thanks!
Summary
Extend device capability checks to include SM110 and SM120 GPU families in NVFP4 MoE backend selection code that was missed in PR #33417.
Problem
After PR #33417, NVFP4 MoE models still fail on RTX Blackwell GPUs (SM120) and DGX Spark (SM121) with errors like:
Root Cause
PR #33417 updated flashinfer_cutlass_moe.py and cutlass_moe.py but missed these files, which still only check for SM100:

- flashinfer_fp4_moe.py
- flashinfer_trtllm_moe.py
- flashinfer_cutedsl_moe.py
- flashinfer_utils.py

Solution
Add explicit family checks for SM100/110/120, matching the approach in the files updated by #33417.

This enables support for:

- SM100-109: Blackwell data center (B100, B200)
- SM110-119: Future Blackwell variants
- SM120-129: Blackwell consumer/workstation (RTX 5090, DGX Spark GB10)
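The check in question can be sketched standalone. `FakePlatform` and `supports_nvfp4_moe` below are hypothetical stand-ins for illustration (vLLM's real objects are `current_platform` and per-backend support predicates), and the sketch assumes `is_device_capability_family` groups capabilities by major version, so SM120 (12.0) and SM121 (12.1) both fall in family 120:

```python
class FakePlatform:
    """Hypothetical stand-in for vLLM's current_platform, for illustration only."""

    def __init__(self, major: int, minor: int):
        self.major = major
        self.minor = minor

    def is_cuda(self) -> bool:
        return True

    def is_device_capability_family(self, family: int) -> bool:
        # Assumption: a "family" is the major capability version times 10,
        # so SM120 (12.0) and SM121 (12.1) both belong to family 120.
        return self.major * 10 == family


def supports_nvfp4_moe(p: FakePlatform) -> bool:
    # The any()-based form suggested in review: the supported Blackwell
    # families are listed in one place, making future additions one-line edits.
    return p.is_cuda() and any(
        p.is_device_capability_family(sm) for sm in (100, 110, 120)
    )


print(supports_nvfp4_moe(FakePlatform(12, 0)))  # RTX 5090, SM120 -> True
print(supports_nvfp4_moe(FakePlatform(12, 1)))  # DGX Spark GB10, SM121 -> True
print(supports_nvfp4_moe(FakePlatform(9, 0)))   # Hopper, SM90 -> False
```

This is why the family check (rather than an exact capability match) matters for the GB10: SM121 is not SM120, but it is in the 120 family.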
Files Changed
- vllm/model_executor/layers/quantization/utils/flashinfer_fp4_moe.py
- vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py
- vllm/model_executor/layers/fused_moe/flashinfer_cutedsl_moe.py
- vllm/model_executor/layers/quantization/utils/flashinfer_utils.py

Testing
Tested on RTX 5090 (SM120) and DGX Spark GB10 (SM121) with nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4.

Related Issues
Fixes #31085
Fixes #30135
Fixes #29141
Related to #28589