
[CUDA] cutlass_moe_mm: proper sm version check#29302

Open
Aidyn-A wants to merge 3 commits into vllm-project:main from Aidyn-A:fix_sm_versions_for_cutlass_moe_mm

Aidyn-A (Contributor) commented Nov 24, 2025

This is a follow-up to #26098 with a couple of nitpicks.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates the SM version checks for cutlass_moe_mm. The change to use an exact match for SM90 (version_num == 90) is a good improvement for clarity and correctness. However, the upper bound for the SM100+ check has been made very specific (<= 110), which is inconsistent with other parts of the code and could be brittle for future hardware. I've suggested widening this range to be more forward-compatible and updating the corresponding error message.

@@ -254,15 +254,15 @@ void cutlass_moe_mm(
bool per_act_token, bool per_out_ch) {
int32_t version_num = get_sm_version_num();
#if defined ENABLE_CUTLASS_MOE_SM100 && ENABLE_CUTLASS_MOE_SM100
-if (version_num >= 100 && version_num < 110) {
+if (version_num >= 100 && version_num <= 110) {
Aidyn-A (Contributor, Author) commented:

This ensures that the cutlass_moe_mm_sm100 kernel is reachable on Thor under both CUDA 12.8-12.9, where Thor reports sm_101, and CUDA 13.0+, where it reports sm_110.

cutlass_moe_mm_sm100(out_tensors, a_tensors, b_tensors, a_scales, b_scales,
expert_offsets, problem_sizes, a_strides, b_strides,
c_strides, per_act_token, per_out_ch);
return;
}
#endif
#if defined ENABLE_CUTLASS_MOE_SM90 && ENABLE_CUTLASS_MOE_SM90
-if (version_num >= 90 && version_num < 100) {
+if (version_num == 90) {
Aidyn-A (Contributor, Author) commented:

No SM versions exist in the range [91, 100), so the check is narrowed to exactly 90.
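To make the effect of the two range changes concrete, here is a minimal Python sketch that mirrors the dispatch conditions before and after this PR. It assumes that `get_sm_version_num()` encodes the compute capability as `major * 10 + minor`, and it treats both `ENABLE_CUTLASS_MOE_SM100` and `ENABLE_CUTLASS_MOE_SM90` as enabled. The helper names `sm_version_num`, `old_dispatch`, and `new_dispatch` are hypothetical and exist only for illustration.

```python
from typing import Optional


def sm_version_num(major: int, minor: int) -> int:
    # Assumed encoding of get_sm_version_num(): major * 10 + minor.
    return major * 10 + minor


def old_dispatch(version_num: int) -> Optional[str]:
    # Checks before this PR (both build guards assumed enabled).
    if 100 <= version_num < 110:
        return "cutlass_moe_mm_sm100"
    if 90 <= version_num < 100:
        return "cutlass_moe_mm_sm90"
    return None  # no kernel selected; assumed to reach the C++ error path


def new_dispatch(version_num: int) -> Optional[str]:
    # Checks after this PR: the SM100 range now includes 110, and SM90 is exact.
    if 100 <= version_num <= 110:
        return "cutlass_moe_mm_sm100"
    if version_num == 90:
        return "cutlass_moe_mm_sm90"
    return None  # no kernel selected; assumed to reach the C++ error path


# Thor reports (10, 1) under CUDA 12.8-12.9 and (11, 0) under CUDA 13.0+.
for cap in [(9, 0), (10, 0), (10, 1), (11, 0), (12, 0)]:
    v = sm_version_num(*cap)
    print(cap, v, old_dispatch(v), new_dispatch(v))
```

Of the capabilities listed in the loop, only (11, 0), which is Thor on CUDA 13.0+, changes routing: the old check matched no branch, while the new one reaches the SM100 kernel. The SM90 change affects only the hypothetical values 91 through 99, which no shipping GPU reports.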

@@ -83,7 +83,7 @@ def measure(self):
self.torch_peak = torch.cuda.memory_stats().get("allocated_bytes.all.peak", 0)

self.free_memory, self.total_memory = torch.cuda.mem_get_info()
-shared_sysmem_device_mem_sms = ((8, 7), (11, 0), (12, 1)) # Orin, Thor, Spark
+shared_sysmem_device_mem_sms = ((8, 7), (10, 1), (11, 0), (12, 1)) # Orin, Thor, Thor, Spark
Aidyn-A (Contributor, Author) commented:

(10, 1) is how Thor reports its compute capability under CUDA 12.8 and 12.9.
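The hunk shows only the updated tuple, not how it is consulted. The following is a minimal sketch that assumes the tuple is matched exactly against a `(major, minor)` compute capability, such as the tuple returned by `torch.cuda.get_device_capability()`. It illustrates why both Thor entries are needed. `SHARED_SYSMEM_DEVICE_MEM_SMS` and `is_shared_sysmem_device` are hypothetical names, and the sketch avoids importing torch so that it runs anywhere.

```python
# Compute capabilities copied from the updated tuple in this hunk;
# per the variable name, devices whose device memory is shared with system memory.
SHARED_SYSMEM_DEVICE_MEM_SMS = ((8, 7), (10, 1), (11, 0), (12, 1))  # Orin, Thor, Thor, Spark


def is_shared_sysmem_device(capability: tuple[int, int]) -> bool:
    # Hypothetical helper: exact (major, minor) membership test.
    return tuple(capability) in SHARED_SYSMEM_DEVICE_MEM_SMS


# Thor reports (10, 1) under CUDA 12.8-12.9 and (11, 0) under CUDA 13.0+.
print(is_shared_sysmem_device((10, 1)))  # True
print(is_shared_sysmem_device((11, 0)))  # True
print(is_shared_sysmem_device((10, 0)))  # False
```

With an exact-match check like this one, removing the (10, 1) entry would cause Thor running CUDA 12.8 or 12.9 to go unrecognized, even though it is the same hardware that reports (11, 0) under CUDA 13.0+.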

Aidyn-A force-pushed the fix_sm_versions_for_cutlass_moe_mm branch from 7224389 to 0bd7f7d on November 25, 2025 08:26
Aidyn-A commented Nov 26, 2025

cc @ProExpertProg

ProExpertProg (Collaborator) left a comment

lgtm, cc @mgoin @tlrmchlsmth

github-project-automation bot moved this to In review in NVIDIA on Dec 1, 2025
Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
Aidyn-A force-pushed the fix_sm_versions_for_cutlass_moe_mm branch from 0bd7f7d to 91df94a on December 8, 2025 06:46
github-actions bot commented:

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

github-actions bot added the stale (Over 90 days of inactivity) label on Mar 10, 2026

Labels

nvidia, stale (Over 90 days of inactivity)

Projects

Status: In review


2 participants