cutlass: enable SM121-gated MXFP4 MoE kernel path#3038

Open
christopherowen wants to merge 4 commits into NVIDIA:main from christopherowen:sm121_mxfp4

Conversation

@christopherowen

Add SM121-gated MXFP4 kernel wiring and launch config updates for MoE inference paths.

Validation:

  • Builds cleanly on SM121 toolchains.
  • Runtime sanity and end-to-end vLLM mxfp4 serve checks pass on SM121.
  • Heavily community-tested across DGX Spark/SM121 setups.

@@ -0,0 +1,14 @@
if (CUTLASS_NVCC_ARCHS MATCHES 121a)
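For readers following along, a gate like the one above typically wraps the arch-specific wiring in the affected CMakeLists. A minimal sketch of the pattern (the target and file names below are illustrative placeholders, not the PR's actual additions):

```cmake
# Sketch only: build SM121-specific blockscaled sources only when the
# 121a architecture is in the requested arch list.
# Target and file names are illustrative, not taken from this PR.
if (CUTLASS_NVCC_ARCHS MATCHES 121a)
  cutlass_test_unit_gemm_device_add_executable(
    cutlass_test_unit_blockscaled_gemm_device_tensorop_sm121
    sm121_blockscaled_gemm_example.cu   # placeholder source file
  )
endif()
```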
Collaborator
Please add copyright to this file.

Author
Thanks for catching this. I’ve added the standard NVIDIA copyright header.

@Junkai-Wu
Collaborator

I've ported this PR to the internal repository to run the pipeline. I'll merge this PR after the code lands internally.

stride_a = InternalStrideA{};
stride_b = InternalStrideB{};
// However, TMA descriptor encoding requires valid non-zero strides.
// Use tile dimensions as placeholder values for the runtime stride components.
Collaborator

Why is this change needed?
The TMA descriptors created on the host (here, in to_underlying_arguments) mainly serve to initialize kernel-dependent parameters; the global tensor pointers, shapes, and strides are updated correctly on the device, based on the group, prior to any use.
Were you running into an issue with this?


@christopherowen please check


cutlass_test_unit_gemm_device_add_executable(
cutlass_test_unit_bs_grouped_gemm_device_tensorop_sm121
../sm120_blockscaled_tensorop_gemm/sm120_bs_gemm_mxf8_mxf4_f32_group_gemm_fusion.cu
Collaborator

Porting some internal review comments:

It seems this PR adds support for CTA_TILE_M == 64, which is smaller than 128. Can we add a unit test in this file to cover that case?


if (CUTLASS_NVCC_ARCHS MATCHES 121a)

add_custom_target(
cutlass_test_unit_gemm_device_sm121_bs
Collaborator

Porting some internal review comments:

Let's spell out blockscaled in the target name rather than abbreviating it to bs.


@johnnynunez

@christopherowen could you address the review comments so this PR can move forward?
