[CUDA] cutlass_moe_mm: proper sm version check by Aidyn-A · Pull Request #29302 · vllm-project/vllm

Aidyn-A · 2025-11-24T09:22:13Z

This is a follow-up to #26098 with a couple of nitpicks.

gemini-code-assist

Code Review

This pull request updates the SM version checks for cutlass_moe_mm. The change to use an exact match for SM90 (version_num == 90) is a good improvement for clarity and correctness. However, the upper bound for the SM100+ check has been made very specific (<= 110), which is inconsistent with other parts of the code and could be brittle for future hardware. I've suggested widening this range to be more forward-compatible and updating the corresponding error message.

Aidyn-A · 2025-11-24T09:24:57Z

csrc/quantization/w8a8/cutlass/scaled_mm_entry.cu

@@ -254,15 +254,15 @@ void cutlass_moe_mm(
    bool per_act_token, bool per_out_ch) {
  int32_t version_num = get_sm_version_num();
 #if defined ENABLE_CUTLASS_MOE_SM100 && ENABLE_CUTLASS_MOE_SM100
-  if (version_num >= 100 && version_num < 110) {
+  if (version_num >= 100 && version_num <= 110) {


This ensures that the cutlass_moe_mm_sm100 kernel is accessible for Thor on both CUDA 12.8-12.9 (sm_101) and CUDA 13.0+ (sm_110).
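To illustrate the boundary change, here is a minimal Python sketch of the two predicates. It assumes that `get_sm_version_num()` encodes compute capability as `major * 10 + minor`; the helper names below are hypothetical and exist only for this example.

```python
def sm_version_num(major: int, minor: int) -> int:
    # Assumed encoding of compute capability into a single integer.
    return major * 10 + minor


def old_sm100_check(version_num: int) -> bool:
    # Predicate before this PR: upper bound is exclusive.
    return 100 <= version_num < 110


def new_sm100_check(version_num: int) -> bool:
    # Predicate after this PR: upper bound is inclusive, so sm_110 passes.
    return 100 <= version_num <= 110


thor_cuda_12 = sm_version_num(10, 1)  # Thor as reported on CUDA 12.8-12.9
thor_cuda_13 = sm_version_num(11, 0)  # Thor as reported on CUDA 13.0+

for name, v in (("sm_101", thor_cuda_12), ("sm_110", thor_cuda_13)):
    print(name, "old:", old_sm100_check(v), "new:", new_sm100_check(v))
```

Under that assumption, only the inclusive bound admits Thor when it reports itself as sm_110.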

Aidyn-A · 2025-11-24T09:27:56Z

csrc/quantization/w8a8/cutlass/scaled_mm_entry.cu

    cutlass_moe_mm_sm100(out_tensors, a_tensors, b_tensors, a_scales, b_scales,
                         expert_offsets, problem_sizes, a_strides, b_strides,
                         c_strides, per_act_token, per_out_ch);
    return;
  }
 #endif
 #if defined ENABLE_CUTLASS_MOE_SM90 && ENABLE_CUTLASS_MOE_SM90
-  if (version_num >= 90 && version_num < 100) {
+  if (version_num == 90) {


No SM versions exist in the range [91, 100), so the check is kept strictly at 90.
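The resulting dispatch order can be sketched in Python as follows. This is an illustration of the control flow in the diff, not the actual implementation: `select_moe_kernel` and its flags are hypothetical stand-ins for the `ENABLE_CUTLASS_MOE_SM100` / `ENABLE_CUTLASS_MOE_SM90` build macros, the name `cutlass_moe_mm_sm90` is assumed by analogy with `cutlass_moe_mm_sm100`, and returning `None` stands in for whatever the real function does when no kernel matches.

```python
from typing import Optional


def select_moe_kernel(
    version_num: int,
    sm100_enabled: bool = True,
    sm90_enabled: bool = True,
) -> Optional[str]:
    # SM100 family, including Thor as sm_101 (CUDA 12.8-12.9) and sm_110 (CUDA 13.0+).
    if sm100_enabled and 100 <= version_num <= 110:
        return "cutlass_moe_mm_sm100"
    # Exact match: per the comment above, nothing exists in [91, 100).
    if sm90_enabled and version_num == 90:
        return "cutlass_moe_mm_sm90"  # assumed kernel name
    # No compiled kernel covers this SM version.
    return None


for v in (89, 90, 100, 101, 110, 120):
    print(v, select_moe_kernel(v))
```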

Aidyn-A · 2025-11-24T09:41:24Z

vllm/utils/mem_utils.py

@@ -83,7 +83,7 @@ def measure(self):
        self.torch_peak = torch.cuda.memory_stats().get("allocated_bytes.all.peak", 0)

        self.free_memory, self.total_memory = torch.cuda.mem_get_info()
-        shared_sysmem_device_mem_sms = ((8, 7), (11, 0), (12, 1))  # Orin, Thor, Spark
+        shared_sysmem_device_mem_sms = ((8, 7), (10, 1), (11, 0), (12, 1))  # Orin, Thor, Thor, Spark


(10, 1) is Thor on CUDA v12.8 and v12.9.
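As a rough sketch of how such a tuple might be consulted, the snippet below checks a `(major, minor)` capability pair against the updated list. The function name `uses_shared_system_memory` is hypothetical; on a real device the pair would typically come from `torch.cuda.get_device_capability()`, which is left out here so the example runs without a GPU.

```python
# Updated list from the diff: Orin, Thor (CUDA 12.8-12.9), Thor (CUDA 13.0+), Spark.
SHARED_SYSMEM_DEVICE_MEM_SMS = ((8, 7), (10, 1), (11, 0), (12, 1))


def uses_shared_system_memory(capability: tuple) -> bool:
    # True when the (major, minor) pair is one of the listed
    # shared system/device memory platforms.
    return tuple(capability) in SHARED_SYSMEM_DEVICE_MEM_SMS


for cap in ((10, 1), (11, 0), (9, 0)):
    print(cap, uses_shared_system_memory(cap))
```

Without the `(10, 1)` entry, Thor on CUDA 12.8 or 12.9 would not be matched by this check.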

Aidyn-A · 2025-11-26T07:09:00Z

cc @ProExpertProg

ProExpertProg

lgtm, cc @mgoin @tlrmchlsmth

Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>

github-actions · 2026-03-10T02:27:05Z

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

mergify bot added the nvidia label Nov 24, 2025

github-project-automation bot added this to NVIDIA Nov 24, 2025

gemini-code-assist bot reviewed Nov 24, 2025

Aidyn-A commented Nov 24, 2025

Aidyn-A force-pushed the fix_sm_versions_for_cutlass_moe_mm branch from 7224389 to 0bd7f7d Compare November 25, 2025 08:26

ProExpertProg approved these changes Dec 1, 2025

github-project-automation bot moved this to In review in NVIDIA Dec 1, 2025

Aidyn-A added 3 commits December 8, 2025 10:46

cutlass_moe_mm: prope sm version check

7dbd6b4

Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>

add missing sm_101

4494594

Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>

fix lint

91df94a

Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>

Aidyn-A force-pushed the fix_sm_versions_for_cutlass_moe_mm branch from 0bd7f7d to 91df94a Compare December 8, 2025 06:46

github-actions bot added the stale Over 90 days of inactivity label Mar 10, 2026

Aidyn-A wants to merge 3 commits into vllm-project:main from Aidyn-A:fix_sm_versions_for_cutlass_moe_mm