[CUDA] cutlass_moe_mm: proper sm version check #29302
Aidyn-A wants to merge 3 commits into vllm-project:main
Code Review
This pull request updates the SM version checks for cutlass_moe_mm. The change to use an exact match for SM90 (version_num == 90) is a good improvement for clarity and correctness. However, the upper bound for the SM100+ check has been made very specific (<= 110), which is inconsistent with other parts of the code and could be brittle for future hardware. I've suggested widening this range to be more forward-compatible and updating the corresponding error message.
@@ -254,15 +254,15 @@ void cutlass_moe_mm(
     bool per_act_token, bool per_out_ch) {
   int32_t version_num = get_sm_version_num();
 #if defined ENABLE_CUTLASS_MOE_SM100 && ENABLE_CUTLASS_MOE_SM100
-  if (version_num >= 100 && version_num < 110) {
+  if (version_num >= 100 && version_num <= 110) {
This ensures that the cutlass_moe_mm_sm100 kernel is accessible for Thor on both CUDA 12.8-12.9 (sm_101) and CUDA 13.0+ (sm_110).
     cutlass_moe_mm_sm100(out_tensors, a_tensors, b_tensors, a_scales, b_scales,
                          expert_offsets, problem_sizes, a_strides, b_strides,
                          c_strides, per_act_token, per_out_ch);
     return;
   }
 #endif
 #if defined ENABLE_CUTLASS_MOE_SM90 && ENABLE_CUTLASS_MOE_SM90
-  if (version_num >= 90 && version_num < 100) {
+  if (version_num == 90) {
No SM versions exist in the range [91, 100), hence the strict check for 90.
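For reference, the dispatch ranges after this change can be modeled roughly as below. This is only an illustrative Python sketch, not code from the PR: `select_moe_kernel` and the returned labels are hypothetical, and it assumes `version_num` is encoded as `major * 10 + minor` (so sm_101 is 101 and sm_110 is 110), which matches the ranges discussed above.

```python
from typing import Optional


def select_moe_kernel(
    version_num: int,
    sm100_enabled: bool = True,
    sm90_enabled: bool = True,
) -> Optional[str]:
    """Hypothetical model of the version checks in cutlass_moe_mm.

    The two flags stand in for the ENABLE_CUTLASS_MOE_SM100 and
    ENABLE_CUTLASS_MOE_SM90 compile-time guards (an assumption made
    for illustration only).
    """
    # Inclusive upper bound: covers Thor as sm_101 (CUDA 12.8-12.9)
    # and as sm_110 (CUDA 13.0+).
    if sm100_enabled and 100 <= version_num <= 110:
        return "sm100"
    # Exact match: no SM versions exist in [91, 100).
    if sm90_enabled and version_num == 90:
        return "sm90"
    # No matching kernel was compiled in for this architecture.
    return None
```

With the previous `< 110` bound, the same model would return `None` for 110, which is the case the inclusive bound is meant to cover.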
vllm/utils/mem_utils.py
Outdated
@@ -83,7 +83,7 @@ def measure(self):
         self.torch_peak = torch.cuda.memory_stats().get("allocated_bytes.all.peak", 0)

         self.free_memory, self.total_memory = torch.cuda.mem_get_info()
-        shared_sysmem_device_mem_sms = ((8, 7), (11, 0), (12, 1))  # Orin, Thor, Spark
+        shared_sysmem_device_mem_sms = ((8, 7), (10, 1), (11, 0), (12, 1))  # Orin, Thor, Thor, Spark
(10, 1) is Thor on CUDA v12.8 and v12.9.
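As a rough illustration of why the extra entry matters, the tuple can be read as a membership table keyed by `(major, minor)` compute capability. The helper below is a hypothetical sketch, not the code in `mem_utils.py`; how the real `measure()` consumes the tuple is not shown in this diff, so the lookup is an assumption.

```python
from typing import Tuple

# (major, minor) compute capabilities from the updated line in the diff.
SHARED_SYSMEM_DEVICE_MEM_SMS = (
    (8, 7),   # Orin
    (10, 1),  # Thor on CUDA 12.8 and 12.9
    (11, 0),  # Thor on CUDA 13.0+
    (12, 1),  # Spark
)


def uses_shared_sysmem(capability: Tuple[int, int]) -> bool:
    """Hypothetical helper: True if the device is listed in the table above."""
    return tuple(capability) in SHARED_SYSMEM_DEVICE_MEM_SMS
```

Without the `(10, 1)` entry, a Thor device reporting capability 10.1 would not match the table in this sketch.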
Force-pushed from 7224389 to 0bd7f7d
ProExpertProg left a comment:
lgtm, cc @mgoin @tlrmchlsmth
Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
Force-pushed from 0bd7f7d to 91df94a
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This is a follow-up on #26098 with a couple of nitpicks.