
[Fix][MoE] Add SM120 support for FP8 MoE path #32237

Closed
malaiwah wants to merge 1 commit into vllm-project:main from malaiwah:fix/sm120-fp8-moe-support

Conversation


@malaiwah malaiwah commented Jan 13, 2026

Summary

Add SM120 (Blackwell) support to cutlass_moe_mm to enable FP8 MoE models (GLM-4.7, MiniMax M2.1) on RTX PRO 6000 Blackwell GPUs.

Note

This is currently untested. It will be tested on RTX PRO 6000 Blackwell hardware with GLM-4.7-FP8, and performance results will be added to the PR description after testing.

Changes

1. Add SM120 function declaration

Declared cutlass_moe_mm_sm120() function after the SM100 section, following the same pattern as existing implementations.

2. Add SM120 conditional branch

Added SM120 support in cutlass_moe_mm() function:

  • Checks version_num >= 120 && version_num < 130 before SM100 branch
  • Calls cutlass_moe_mm_sm120() when SM120 is detected
  • Updated SM100 range logic
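
The dispatch described above can be sketched as follows. This is an illustrative, self-contained mock: the kernel entry points here just return tags, whereas the real vLLM functions live in separate `.cu` files and take tensor arguments, so names and signatures are assumptions based on the PR description.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the per-architecture kernel entry points;
// in vLLM each is implemented in its own .cu file with tensor parameters.
static std::string cutlass_moe_mm_sm90()  { return "sm90"; }
static std::string cutlass_moe_mm_sm100() { return "sm100"; }
static std::string cutlass_moe_mm_sm120() { return "sm120"; }

// Sketch of the dispatch: check the SM120 range (120-129) before falling
// through to the SM100 (100-109) and SM90 (90-99) branches.
std::string cutlass_moe_mm(int32_t version_num) {
  if (version_num >= 120 && version_num < 130) {
    return cutlass_moe_mm_sm120();  // SM120/SM121 (Blackwell workstation, GB10)
  }
  if (version_num >= 100 && version_num < 110) {
    return cutlass_moe_mm_sm100();
  }
  if (version_num >= 90 && version_num < 100) {
    return cutlass_moe_mm_sm90();
  }
  throw std::runtime_error(
      "No compiled cutlass_moe_mm for CUDA device capability: " +
      std::to_string(version_num) + ". Required capability: 90, 100 or 120");
}
```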

3. Create SM120 implementation file

Added csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm120.cu implementing the SM120 MoE grouped_gemm kernels following the SM100 pattern.

4. Update error messages

Updated error messages to include SM120 as a supported capability.

Problem

Previously, SM120 devices (RTX PRO 6000 Blackwell) encountered this error:

No compiled cutlass_scaled_mm for CUDA device capability: 120. Required capability: 90 or 100

The issue occurred because cutlass_moe_mm() only supported SM90 (90-99) and SM100 (100-109), while the non-MoE cutlass_scaled_mm() path already had SM120 support.

MoE models like GLM-4.7 and MiniMax M2.1 use the cutlass_moe_mm() path, so they failed on Blackwell hardware.

Scope

This PR adds support for the SM120 architecture (RTX PRO 6000 Blackwell), version number 120. The version_num >= 120 && version_num < 130 bound also covers SM121 (GB10 Spark). SM110 support is not included, as it would require a separate .cu implementation file and no hardware is available for testing.
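
As an illustration of the bounds above, assuming capability numbers are encoded as major * 10 + minor (the convention behind values like 90, 100, and 120 in the dispatch):

```cpp
#include <cstdint>

// Assumed encoding: compute capability 12.0 -> 120 (SM120),
// 12.1 -> 121 (SM121 / GB10 Spark).
int32_t to_version_num(int major, int minor) { return major * 10 + minor; }

// The range check from this PR: covers SM120 and SM121, excludes SM110
// (110) and whatever comes at 130 and above.
bool in_sm120_range(int32_t version_num) {
  return version_num >= 120 && version_num < 130;
}
```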

Fixes

Resolves #32109

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to add support for the SM120 (Blackwell) architecture to the FP8 MoE path. The changes correctly add the necessary function declaration and dispatch logic in cutlass_moe_mm to handle the new architecture. The error messages and conditional logic for existing architectures are also updated appropriately. However, a critical piece seems to be missing: the implementation file for the cutlass_moe_mm_sm120 function. Without this, the project will fail to link when support for SM120 is enabled.

@malaiwah malaiwah force-pushed the fix/sm120-fp8-moe-support branch 2 times, most recently from 2466348 to f8f99b6 on January 13, 2026 at 12:16
@mergify

mergify bot commented Jan 13, 2026

Documentation preview: https://vllm--32237.org.readthedocs.build/en/32237/

@mergify mergify bot added the documentation, frontend, cpu, structured-output, and v1 labels on Jan 13, 2026
@malaiwah
Author

Oh my... something went wrong with my rebase and massively screwed up the PR.

Will fix.

@mergify

mergify bot commented Jan 13, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @malaiwah.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 13, 2026
Add SM120 (Blackwell) support to cutlass_moe_mm to enable FP8 MoE models
(GLM-4.7, MiniMax M2.1) on RTX PRO 6000 Blackwell GPUs.

Changes:
- Add cutlass_moe_mm_sm120 function declaration
- Add SM120 conditional branch (version_num >= 120 && version_num < 130)
- Create csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm120.cu implementation
- Update error message to include SM120 in supported capabilities

Note: This is currently untested. Will be battle tested on RTX PRO 6000 Blackwell hardware
with GLM-4.7-FP8. Performance results will be added to the PR description after testing.

Fixes: #32109
@malaiwah malaiwah force-pushed the fix/sm120-fp8-moe-support branch from f8f99b6 to e1a1278 on January 13, 2026 at 13:07
@malaiwah
Author

This one is not turning out clean at all; I will close it and reopen a fresh PR once testing is done.


Labels

cpu, documentation, frontend, needs-rebase, nvidia, structured-output, v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Blackwell (SM120) FP8 MoE path fails for GLM-4.7 : No compiled cutlass_scaled_mm for CUDA device capability: 120 on RTX PRO 6000 Blackwell

1 participant