[Fix][MoE] Add SM120 support for FP8 MoE path #32237
malaiwah wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request aims to add support for the SM120 (Blackwell) architecture to the FP8 MoE path. The changes correctly add the necessary function declaration and dispatch logic in cutlass_moe_mm to handle the new architecture. The error messages and conditional logic for existing architectures are also updated appropriately. However, a critical piece seems to be missing: the implementation file for the cutlass_moe_mm_sm120 function. Without this, the project will fail to link when support for SM120 is enabled.
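The link-failure concern raised here can be seen in miniature: a declaration plus a call site compiles on its own, but the build only links if some translation unit supplies the definition. A hedged sketch (names and signature simplified; the real `cutlass_moe_mm_sm120` takes tensor arguments):

```cpp
#include <string>

// Declaration, as a header would expose it (simplified signature; the real
// function operates on torch::Tensor arguments, not strings).
std::string cutlass_moe_mm_sm120_sketch();

// The dispatch site compiles against the declaration alone...
std::string call_sm120_branch() { return cutlass_moe_mm_sm120_sketch(); }

// ...but without a definition like this one (the role grouped_mm_c3x_sm120.cu
// plays in the PR), linking fails with an "undefined reference" error.
std::string cutlass_moe_mm_sm120_sketch() { return "sm120 kernel"; }
```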
Documentation preview: https://vllm--32237.org.readthedocs.build/en/32237/
Oh my... something went wrong with my rebase and massively screwed up the PR. Will fix.

This pull request has merge conflicts that must be resolved before it can be merged.
Add SM120 (Blackwell) support to cutlass_moe_mm to enable FP8 MoE models (GLM-4.7, MiniMax M2.1) on RTX PRO 6000 Blackwell GPUs.

Changes:
- Add cutlass_moe_mm_sm120 function declaration
- Add SM120 conditional branch (version_num >= 120 && version_num < 130)
- Create csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm120.cu implementation
- Update error message to include SM120 in supported capabilities

Note: This is currently untested. It will be battle-tested on RTX PRO 6000 Blackwell hardware with GLM-4.7-FP8, and performance results will be added to the PR description after testing.

Fixes: #32109
This one is turning out not clean at all, will close and reopen fresh new upon testing. |
Summary
Add SM120 (Blackwell) support to `cutlass_moe_mm` to enable FP8 MoE models (GLM-4.7, MiniMax M2.1) on RTX PRO 6000 Blackwell GPUs.

Note

This is currently untested. It will be tested on RTX PRO 6000 Blackwell hardware with GLM-4.7-FP8, and performance results will be added after battle testing.
Changes
1. Add SM120 function declaration

Declared `cutlass_moe_mm_sm120()` after the SM100 section, following the same pattern as the existing implementations.

2. Add SM120 conditional branch
Added SM120 support in the `cutlass_moe_mm()` function:

- Check `version_num >= 120 && version_num < 130` before the SM100 branch
- Dispatch to `cutlass_moe_mm_sm120()` when SM120 is detected

3. Create SM120 implementation file
Added `csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm120.cu`, implementing the SM120 MoE grouped GEMM kernels following the SM100 pattern.

4. Update error messages
Updated error messages to include SM120 as a supported capability.
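The dispatch described in change 2 can be sketched as follows. This is a simplified, hypothetical model: the real `cutlass_moe_mm` operates on tensors and calls the per-architecture kernels directly, but the capability-range logic is the same.

```cpp
#include <stdexcept>
#include <string>

// Simplified model of the version dispatch in cutlass_moe_mm().
// version_num is the CUDA compute capability encoded as major*10 + minor.
std::string dispatch_moe_mm(int version_num) {
  if (version_num >= 120 && version_num < 130) {
    return "cutlass_moe_mm_sm120";  // new branch: SM120/SM121 (Blackwell)
  }
  if (version_num >= 100 && version_num < 110) {
    return "cutlass_moe_mm_sm100";  // existing SM100 branch
  }
  if (version_num >= 90 && version_num < 100) {
    return "cutlass_moe_mm_sm90";   // existing SM90 (Hopper) branch
  }
  throw std::runtime_error(
      "No compiled cutlass_moe_mm kernel for CUDA capability " +
      std::to_string(version_num) + "; supported: 90, 100, 120.");
}
```

Note that SM110 (capability 110-119) falls through to the error path, consistent with it being out of scope for this PR.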
Problem
Previously, SM120 devices (RTX PRO 6000 Blackwell) encountered this error:
The issue occurred because `cutlass_moe_mm()` only supported SM90 (90-99) and SM100 (100-109), while the non-MoE `cutlass_scaled_mm()` path already had SM120 support.

MoE models like GLM-4.7 and MiniMax M2.1 use the `cutlass_moe_mm()` path, so they failed on Blackwell hardware.

Scope
This PR adds support for the SM120 architecture (RTX PRO 6000 Blackwell), with compute capability version number 120. The branch is bounded by `version_num < 130`, so it also covers SM121 (GB10 Spark). SM110 support is not included, as it would require a separate `.cu` implementation file and no hardware is available for testing.

Fixes
Resolves #32109
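For context, the `version_num` values discussed above come from the device compute capability; a sketch assuming the conventional major*10 + minor encoding (how vLLM actually derives it from the CUDA device properties is not shown in this PR excerpt):

```cpp
// Map a CUDA compute capability (major, minor) to a version_num, e.g.
// RTX PRO 6000 Blackwell is capability 12.0 -> 120 and GB10 Spark is
// 12.1 -> 121 (assumed encoding: major * 10 + minor).
int capability_to_version_num(int major, int minor) {
  return major * 10 + minor;
}
```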