[Fix][MoE] Add SM120 support for FP8 MoE path #32237
malaiwah wants to merge 1 commit into vllm-project:main
Conversation
Code Review
This pull request aims to add support for the SM120 (Blackwell) architecture to the FP8 MoE path. The changes correctly add the necessary function declaration and dispatch logic in cutlass_moe_mm to handle the new architecture. The error messages and conditional logic for existing architectures are also updated appropriately. However, a critical piece seems to be missing: the implementation file for the cutlass_moe_mm_sm120 function. Without this, the project will fail to link when support for SM120 is enabled.
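The link-failure concern raised here can be seen in miniature: a declaration plus a call site compiles on its own, but the build only links if some translation unit supplies the definition. A hedged sketch (names and signature simplified; the real `cutlass_moe_mm_sm120` takes tensor arguments):

```cpp
#include <string>

// Declaration, as a header would expose it (simplified signature; the real
// function operates on torch::Tensor arguments, not strings).
std::string cutlass_moe_mm_sm120_sketch();

// The dispatch site compiles against the declaration alone...
std::string call_sm120_branch() { return cutlass_moe_mm_sm120_sketch(); }

// ...but without a definition like this one (the role grouped_mm_c3x_sm120.cu
// plays in the PR), linking fails with an "undefined reference" error.
std::string cutlass_moe_mm_sm120_sketch() { return "sm120 kernel"; }
```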
Documentation preview: https://vllm--32237.org.readthedocs.build/en/32237/
Oh my... something went wrong with my rebase and massively screwed up the PR. Will fix.

This pull request has merge conflicts that must be resolved before it can be merged.
Add SM120 (Blackwell) support to cutlass_moe_mm to enable FP8 MoE models (GLM-4.7, MiniMax M2.1) on RTX PRO 6000 Blackwell GPUs.

Changes:
- Add cutlass_moe_mm_sm120 function declaration
- Add SM120 conditional branch (version_num >= 120 && version_num < 130)
- Create csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm120.cu implementation
- Update error message to include SM120 in supported capabilities

Note: This is currently untested. It will be battle-tested on RTX PRO 6000 Blackwell hardware with GLM-4.7-FP8, and performance results will be added to the PR description after testing.

Fixes: #32109
This one is turning out not clean at all, will close and reopen fresh new upon testing. |
Summary
Add SM120 (Blackwell) support to `cutlass_moe_mm` to enable FP8 MoE models (GLM-4.7, MiniMax M2.1) on RTX PRO 6000 Blackwell GPUs.

Note

This is currently untested. It will be tested on RTX PRO 6000 Blackwell hardware with GLM-4.7-FP8, and performance results will be added after battle testing.
Changes
1. Add SM120 function declaration

Declared `cutlass_moe_mm_sm120()` after the SM100 section, following the same pattern as the existing implementations.

2. Add SM120 conditional branch
Added SM120 support in the `cutlass_moe_mm()` function:

- Check `version_num >= 120 && version_num < 130` before the SM100 branch
- Dispatch to `cutlass_moe_mm_sm120()` when SM120 is detected

3. Create SM120 implementation file
Added `csrc/quantization/w8a8/cutlass/moe/grouped_mm_c3x_sm120.cu`, implementing the SM120 MoE grouped GEMM kernels following the SM100 pattern.

4. Update error messages
Updated error messages to include SM120 as a supported capability.
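The dispatch described in change 2 can be sketched as follows. This is a simplified, hypothetical model: the real `cutlass_moe_mm` operates on tensors and calls the per-architecture kernels directly, but the capability-range logic is the same.

```cpp
#include <stdexcept>
#include <string>

// Simplified model of the version dispatch in cutlass_moe_mm().
// version_num is the CUDA compute capability encoded as major*10 + minor.
std::string dispatch_moe_mm(int version_num) {
  if (version_num >= 120 && version_num < 130) {
    return "cutlass_moe_mm_sm120";  // new branch: SM120/SM121 (Blackwell)
  }
  if (version_num >= 100 && version_num < 110) {
    return "cutlass_moe_mm_sm100";  // existing SM100 branch
  }
  if (version_num >= 90 && version_num < 100) {
    return "cutlass_moe_mm_sm90";   // existing SM90 (Hopper) branch
  }
  throw std::runtime_error(
      "No compiled cutlass_moe_mm kernel for CUDA capability " +
      std::to_string(version_num) + "; supported: 90, 100, 120.");
}
```

Note that SM110 (capability 110-119) falls through to the error path, consistent with it being out of scope for this PR.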
Problem
Previously, SM120 devices (RTX PRO 6000 Blackwell) encountered this error:
The issue occurred because `cutlass_moe_mm()` only supported SM90 (90-99) and SM100 (100-109), while the non-MoE `cutlass_scaled_mm()` path already had SM120 support.

MoE models like GLM-4.7 and MiniMax M2.1 use the `cutlass_moe_mm()` path, so they failed on Blackwell hardware.

Scope
This PR adds support for the SM120 architecture (RTX PRO 6000 Blackwell), with compute capability version number 120. The branch is bounded by `version_num < 130`, so it also covers SM121 (GB10 Spark). SM110 support is not included, as it would require a separate `.cu` implementation file and no hardware is available for testing.

Fixes
Resolves #32109
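For context, the `version_num` values discussed above come from the device compute capability; a sketch assuming the conventional major*10 + minor encoding (how vLLM actually derives it from the CUDA device properties is not shown in this PR excerpt):

```cpp
// Map a CUDA compute capability (major, minor) to a version_num, e.g.
// RTX PRO 6000 Blackwell is capability 12.0 -> 120 and GB10 Spark is
// 12.1 -> 121 (assumed encoding: major * 10 + minor).
int capability_to_version_num(int major, int minor) {
  return major * 10 + minor;
}
```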