
Add SM120 (Blackwell desktop) MXFP4 support #16975

Closed
amittell wants to merge 3 commits into sgl-project:main from amittell:sm120-mxfp4-support

@amittell

SM120 (RTX PRO 6000, RTX 5090) doesn't support persistent kernels the same way SM100 does. This adds SM120-specific configuration following the same pattern as vLLM PR vllm-project/vllm#31089:

  • Use StridedLayout instead of TMA block layout
  • Set is_persistent=False and num_stages=1

Tested with GPT-OSS-120B on RTX PRO 6000:

  • 4K: 151 tok/s
  • 131K: 57 tok/s

Fixes #13061, related to #9707, #12695
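
For readers who want to see the shape of the change, the two SM120 settings listed above can be sketched as a small selection function. This is only an illustration, not the code in this PR: `Mxfp4KernelConfig`, `pick_mxfp4_kernel_config`, and the `"strided"` / `"tma_block"` labels are hypothetical names, and treating every non-SM120 GPU as using the TMA block layout with library-default scheduling is a simplifying assumption. The only facts taken from the description are that SM120 uses a strided layout with `is_persistent=False` and `num_stages=1`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Mxfp4KernelConfig:
    """Hypothetical container for the settings named in the PR description."""
    layout: str                    # "strided" or "tma_block" (illustrative labels)
    is_persistent: Optional[bool]  # None = leave the kernel library's default
    num_stages: Optional[int]      # None = leave the kernel library's default


def pick_mxfp4_kernel_config(capability: Tuple[int, int]) -> Mxfp4KernelConfig:
    """Choose MXFP4 MoE kernel settings from a CUDA compute capability.

    On a real machine the (major, minor) pair would typically come from
    torch.cuda.get_device_capability(); it is passed in here so the
    sketch runs without a GPU.
    """
    major, _minor = capability
    if major == 12:
        # SM120 (RTX PRO 6000, RTX 5090): strided layout,
        # non-persistent kernel, single pipeline stage.
        return Mxfp4KernelConfig(layout="strided", is_persistent=False, num_stages=1)
    # Assumption: all other GPUs keep the TMA block layout and defaults.
    return Mxfp4KernelConfig(layout="tma_block", is_persistent=None, num_stages=None)


sm120 = pick_mxfp4_kernel_config((12, 0))
sm100 = pick_mxfp4_kernel_config((10, 0))
```

Keeping the decision in one function that takes the capability as an argument makes the SM120 branch easy to unit test on machines without a Blackwell desktop GPU.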

SM120 (RTX PRO 6000, RTX 5090) doesn't support persistent kernels
the same way SM100 does. This adds SM120-specific configuration
following the same pattern as vLLM PR #31089:

- Use StridedLayout instead of TMA block layout
- Set is_persistent=False and num_stages=1

Tested with GPT-OSS-120B on RTX PRO 6000:
- 4K context: 151 tok/s
- 131K context: 57 tok/s

Fixes sgl-project#13061
Related: sgl-project#9707, sgl-project#12695
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

SM120 (Blackwell desktop) doesn't support flashinfer_mxfp4 due to
persistent kernel limitations. This adds:

1. server_args.py: Auto-select triton_kernel for SM120 + MXFP4
2. mxfp4.py: Use StridedLayout for SM120 with triton_kernel backend

Tested on RTX PRO 6000 with GPT-OSS-120B - server starts and runs
without needing to manually specify --moe-runner-backend.
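
The backend auto-selection described in this commit message can be sketched roughly as follows. This is a hedged illustration rather than the actual `server_args.py` change: `resolve_moe_runner_backend` is a hypothetical helper, and letting an explicit user choice take priority over auto-selection is an assumption. The backend name `triton_kernel` and the SM120 + MXFP4 condition come from the commit message.

```python
from typing import Optional, Tuple


def resolve_moe_runner_backend(
    requested: Optional[str],
    quantization: str,
    capability: Tuple[int, int],
) -> Optional[str]:
    """Pick a MoE runner backend (hypothetical helper, not the SGLang API).

    requested    -- value passed via --moe-runner-backend, or None if unset
    quantization -- quantization method name, e.g. "mxfp4"
    capability   -- CUDA compute capability as (major, minor)
    """
    if requested is not None:
        # Assumption: an explicit user choice is never overridden.
        return requested
    if quantization == "mxfp4" and capability[0] == 12:
        # SM120 + MXFP4: fall back to the Triton kernel backend.
        return "triton_kernel"
    # Otherwise leave the decision to the server's existing defaults.
    return None


auto_sm120 = resolve_moe_runner_backend(None, "mxfp4", (12, 0))
auto_sm100 = resolve_moe_runner_backend(None, "mxfp4", (10, 0))
explicit = resolve_moe_runner_backend("flashinfer_mxfp4", "mxfp4", (12, 0))
```

Returning `None` for the non-SM120 case keeps the sketch from guessing at defaults that are not stated in this thread.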
@b8zhong
Collaborator

b8zhong commented Mar 2, 2026

@amittell Could you help fix the merge conflicts? Thanks!

@b8zhong
Collaborator

b8zhong commented Mar 2, 2026

Hi, I added you as a co-author in #19718. Thanks!

@b8zhong b8zhong closed this Mar 2, 2026

Development

Successfully merging this pull request may close these issues.

[Bug] Fail to run gpt-oss with FlashInfer MXFP4 moe kernel on 5090

2 participants