[WIP]enable mxfp8 on nvidia sm120 by wolfcomos · Pull Request #19112 · sgl-project/sglang

wolfcomos · 2026-02-21T07:37:11Z

Motivation

Following up on #17449: SM120 does support the MXFP8 forward pass, according to the discussion in NVIDIA/TransformerEngine#2668.

Modifications

Enabled the mxfp8_block_scaled_matmul_triton kernel on SM120: added a selection flag and set the default stage count on SM120 machines to 1 because of the shared-memory constraint. Updated the MoE assertions to include SM120.
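
A minimal sketch of the two SM120-specific checks described above. Everything here is hypothetical: the helper names, the fallback stage count for other GPUs, and the assumption that the MoE path previously accepted only the SM100 family (major 10) are illustrative, not SGLang's actual code. The shared-memory estimate shows why the stage count matters: Triton's software pipelining keeps roughly one copy of the A and B tiles per stage, so the footprint grows linearly with num_stages.

```python
from typing import Tuple

# Hypothetical: assumed set of compute-capability majors accepted by the MXFP8 MoE path.
MXFP8_SUPPORTED_MAJORS = {10, 12}  # assumption: SM100 family plus SM120 family


def pipeline_smem_bytes(num_stages: int, block_m: int, block_n: int,
                        block_k: int, group: int = 32) -> int:
    """Rough shared-memory footprint of a pipelined MXFP8 GEMM (hypothetical model).

    Each stage buffers one FP8 A tile (block_m x block_k) and one FP8 B tile
    (block_n x block_k) at 1 byte per element, plus one 1-byte E8M0 scale per
    32-element group along K for every row of A and B.
    """
    tiles = (block_m + block_n) * block_k
    scales = (block_m + block_n) * (block_k // group)
    return num_stages * (tiles + scales)


def default_num_stages(capability: Tuple[int, int], fallback: int = 3) -> int:
    """Pick a default stage count; SM120-family GPUs get 1, as in this PR."""
    if capability[0] == 12:
        return 1
    return fallback  # placeholder default for other architectures (assumption)


def check_mxfp8_moe_supported(capability: Tuple[int, int]) -> None:
    """Mirror of the relaxed MoE assertion, now accepting SM120 (hypothetical)."""
    assert capability[0] in MXFP8_SUPPORTED_MAJORS, (
        f"MXFP8 MoE requires SM100 or SM120, got sm{capability[0]}{capability[1]}"
    )


if __name__ == "__main__":
    cap = (12, 0)  # RTX 50-series reports compute capability 12.0
    check_mxfp8_moe_supported(cap)
    stages = default_num_stages(cap)
    print(stages, pipeline_smem_bytes(stages, 128, 128, 128))
```

For a hypothetical 128x128x128 tile, one stage needs 33792 bytes under this model and each extra stage adds the same amount again, which is what has to fit under the device's per-block shared-memory limit.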

Accuracy Tests

Testing machine: RTX 5070 Ti

Benchmark command (GSM8K, run against each server configuration below):

python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1000 --parallel 10 --platinum

python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-0.5B-Instruct --port 30000

Baseline (bf16):
Accuracy: 0.359
Invalid: 0.003
Latency: 57.425 s
Output throughput: 2188.128 token/s

python3 -m sglang.launch_server --model-path Qwen/Qwen2.5-0.5B-Instruct --quantization mxfp8 --port 30000

bf16 model with online MXFP8 quantization:
Accuracy: 0.348
Invalid: 0.003
Latency: 307.610 s
Output throughput: 414.304 token/s
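
To put the two runs side by side, a small script that derives ratios from the numbers reported above. It adds no new measurements; the dictionary keys and the function name are illustrative.

```python
# Numbers copied from the GSM8K runs above (RTX 5070 Ti, Qwen2.5-0.5B-Instruct).
BASELINE_BF16 = {"accuracy": 0.359, "latency_s": 57.425, "throughput_tok_s": 2188.128}
ONLINE_MXFP8 = {"accuracy": 0.348, "latency_s": 307.610, "throughput_tok_s": 414.304}


def compare_runs(base: dict, new: dict) -> dict:
    """Absolute accuracy change plus latency and throughput ratios (new / base)."""
    return {
        "accuracy_delta": new["accuracy"] - base["accuracy"],
        "latency_ratio": new["latency_s"] / base["latency_s"],
        "throughput_ratio": new["throughput_tok_s"] / base["throughput_tok_s"],
    }


if __name__ == "__main__":
    for key, value in compare_runs(BASELINE_BF16, ONLINE_MXFP8).items():
        print(f"{key}: {value:.4f}")
```

On these numbers, online MXFP8 takes about 5.4x the baseline latency and delivers about 19% of its output throughput, for a 1.1-point accuracy drop.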

Unit test (based on test_block_fp8.py):

python3 -m pytest python/sglang/test/test_block_fp8.py::TestMXFP8DenseLinear -v


==================================== test session starts ====================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /opt/venv-sglang-dev/bin/python3
cachedir: .pytest_cache
rootdir: /sglang-dev/sglang/python
configfile: pyproject.toml
plugins: anyio-4.12.1
collected 1 item                                                                            

python/sglang/test/test_block_fp8.py::TestMXFP8DenseLinear::test_mxfp8_dense_linear PASSED [100%]
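
For reviewers less familiar with the format this test exercises, a NumPy sketch of MXFP8 quantize/dequantize following the OCP Microscaling (MX) spec: each 32-element block shares one power-of-two (E8M0) scale, and elements are FP8 E4M3 with a maximum of 448. This is an independent illustration, not the Triton kernel or the helper used in test_block_fp8.py; it keeps values in float64 instead of packing bytes, and the function names are hypothetical.

```python
import numpy as np

BLOCK = 32         # MX block size along the reduction dimension
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value
E4M3_EMAX = 8      # exponent of the largest E4M3 binade (2**8 = 256)


def round_to_e4m3(v: np.ndarray) -> np.ndarray:
    """Round to the nearest E4M3 value (round-half-to-even), saturating at +-448."""
    v = np.clip(v, -E4M3_MAX, E4M3_MAX)
    mag = np.abs(v)
    # Exponent of each value's binade; subnormals share the minimum exponent -6.
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.maximum(exp, -6)
    quantum = 2.0 ** (exp - 3)  # E4M3 has 3 mantissa bits
    return np.clip(np.round(v / quantum) * quantum, -E4M3_MAX, E4M3_MAX)


def mxfp8_quantize(x: np.ndarray):
    """Quantize a 1-D array (length divisible by 32) into E4M3 elements + E8M0 exponents."""
    x = np.asarray(x, dtype=np.float64)
    if x.size % BLOCK:
        raise ValueError("length must be a multiple of 32")
    blocks = x.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1)
    safe = np.where(amax > 0, amax, 1.0)  # all-zero blocks get an arbitrary scale
    # Shared exponent: floor(log2(amax)) minus the element type's max exponent.
    shared_exp = np.clip(np.floor(np.log2(safe)) - E4M3_EMAX, -127, 127).astype(np.int32)
    elems = round_to_e4m3(blocks / 2.0 ** shared_exp[:, None])
    return elems, shared_exp


def mxfp8_dequantize(elems: np.ndarray, shared_exp: np.ndarray) -> np.ndarray:
    """Multiply each block back by its power-of-two scale and flatten."""
    return (elems * 2.0 ** shared_exp[:, None]).reshape(-1)


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(64)
    q, e = mxfp8_quantize(x)
    print(e, np.abs(mxfp8_dequantize(q, e) - x).max())
```

The reconstruction error stays below one eighth of each block's absolute maximum: E4M3 rounding contributes at most 1/16, and the worst case is saturation when the scaled block maximum falls between 448 and 512.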

Benchmarking and Profiling

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-02-21T07:37:15Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

b8zhong · 2026-03-02T01:47:48Z

/tag-and-rerun-ci

Co-authored-by: Your Name <you@example.com>

Your Name added 7 commits February 20, 2026 20:32

60e3d18 enable mxfp8 on sm120
4f93829 cleanup, test on qwen2.5 0.5b instruct
2e60fc7 add torch empty cache to avoid oom problem
9e2fc8c revert
a57233b revert
d05d9ad revert
e37fc8a revert

wolfcomos requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg and ch-wan as code owners February 21, 2026 07:37

wolfcomos changed the title ~~enable mxfp8 on nvidia sm120~~ [WIP]enable mxfp8 on nvidia sm120 Feb 21, 2026

b8zhong approved these changes Mar 2, 2026


github-actions bot added the run-ci label Mar 2, 2026

b8zhong mentioned this pull request Mar 2, 2026

SM120 Performance Optimization Plan #19637 (open, 4 tasks)

Kangyan-Zhou merged commit e5edf22 into sgl-project:main Mar 2, 2026
135 of 160 checks passed

wolfcomos mentioned this pull request Mar 4, 2026

fix cuda graph capturing error in sm120 mxfp8 triton path #19835 (open, 5 tasks)

Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026

[WIP]enable mxfp8 on nvidia sm120 (sgl-project#19112)

d5b9897

Co-authored-by: Your Name <you@example.com>

magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026

[WIP]enable mxfp8 on nvidia sm120 (sgl-project#19112)

e5f7c40

Co-authored-by: Your Name <you@example.com>

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

[WIP]enable mxfp8 on nvidia sm120 (sgl-project#19112)

bf72991

Co-authored-by: Your Name <you@example.com>

Kangyan-Zhou merged 7 commits into sgl-project:main from wolfcomos:mxfp8_sm120
