[rebase]Deepseek_v4 support w4(mxfp4)a16 on hopper by shiyu7 · Pull Request #24986 · sgl-project/sglang

shiyu7 · 2026-05-11T14:21:32Z

Motivation

Modifications

Rebase the MXFP4 support from the deepseek_v4 branch onto the main branch.
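
For readers unfamiliar with the "w4(mxfp4)a16" in the title: weights are stored as 4-bit MXFP4 values while activations stay in 16-bit. The sketch below decodes a single MXFP4 block in plain Python, following the OCP Microscaling layout (32 FP4 E2M1 elements sharing one E8M0 power-of-two scale). It is illustrative only and is not the Marlin kernel path this PR adds; the function name and the low-nibble-first packing order are assumptions.

```python
# Illustrative MXFP4 block decode; NOT the kernel code from this PR.
# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # number of elements that share one scale in the MX format


def dequant_mxfp4_block(packed: bytes, scale_e8m0: int) -> list:
    """Decode 16 packed bytes (32 FP4 codes) using one shared E8M0 scale.

    Assumption: the low nibble of each byte holds the earlier element.
    """
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError("expected 16 bytes per 32-element block")
    if not 0 <= scale_e8m0 <= 254:
        raise ValueError("E8M0 code 255 encodes NaN; valid codes are 0..254")
    scale = 2.0 ** (scale_e8m0 - 127)  # E8M0 is a biased power of two
    out = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            sign = -1.0 if code & 0x8 else 1.0  # top bit of the nibble is the sign
            out.append(sign * E2M1_MAGNITUDES[code & 0x7] * scale)
    return out
```

In a w4a16 setup, values decoded this way would be multiplied against 16-bit activations; a real kernel fuses the decode into the matrix multiply rather than materializing the floats.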

Accuracy Tests

V4 Flash

DeepSeek V4 Flash run command:

SGLANG_DSV4_FP4_EXPERTS=1 SGLANG_JIT_DEEPGEMM_PRECOMPILE=0 GLOO_SOCKET_IFNAME=eth0 \
sglang serve \
  --trust-remote-code \
  --model-path /0424/models/DeepSeek-V4-Flash \
  --tp 8 \
  --chunked-prefill-size 8192 \
  --mem-fraction-static 0.8 \
  --max-running-requests 128 \
  --speculative-algo EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
  --tool-call-parser deepseekv4 \
  --reasoning-parser deepseek-v4 \
  --host 0.0.0.0 \
  --moe-runner-backend marlin \
  --disable-radix-cache \
  --port 30000

curl -X POST http://127.0.0.1:30000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "/data/models/DeepSeek-V4-Pro/",
    "messages": [
        {
            "role": "user",
            "content": "介绍下杭州这个城市"
        }
    ],
    "max_tokens": 1000,
    "temperature": 0.7
}'
{"id":"c58e205754414fc8b273f55d00e65334","object":"chat.completion","created":1778494603,"model":"/data/models/DeepSeek-V4-Pro/","choices":[{"index":0,"message":{"role":"assistant","content":"嗯，用户想了解杭州这个城市，问题比较开放，需要提供一个全面但有条理的介绍。杭州是一个很知名的城市，可以从几个核心方面入手：它的地理和地位、历史文化、现代发展、旅游美食，最后给个总结性的评价。\n\n想到了先点明杭州是浙江省会、长三角重要城市，有“人间天堂”的美誉。历史文化部分要突出西湖、大运河、良渚文化、宋韵和名人故事。现代发展重点讲数字经济，尤其是阿里巴巴和互联网产业，以及城市治理的智能化。旅游方面西湖十景、西溪湿地、灵隐寺、龙井茶和杭帮菜是必提的。最后总结一下杭州古今融合、生活品质高的特点，给用户一个整体印象。\n\n这样结构比较清晰，信息量也够，用户能快速把握杭州的独特之处。</think>杭州是中国浙江省的省会，地处中国东南沿海、长江三角洲南翼，是长三角地区重要的中心城市之一。这座城市以“人间天堂”的美誉闻名于世，兼具深厚的历史底蕴、秀美的自然风光和蓬勃的现代活力。\n\n### 核心魅力：历史与自然的完美交融\n\n1.  **千年古都，人文荟萃**\n    -   **西湖文化景观**：杭州的灵魂所在。西湖及其周边的群山、园林、寺院和古迹（如断桥、雷峰塔、苏堤、三潭印月）构成了世界文化遗产“杭州西湖文化景观”，是自然美与人文景观结合的典范。无数文人墨客在此留下诗篇（如苏轼的“欲把西湖比西子，淡妆浓抹总相宜”）。\n    -   **京杭大运河**：作为大运河的南端终点，杭州见证了千年漕运的繁华。运河沿岸的历史街区（如拱宸桥西、小河直街）保留了老杭州的市井生活气息。\n    -   **良渚古城遗址**：实证中华五千年文明史的圣地，位于杭州西北部。其宏大的古城、水利系统和精美玉器，展现了新石器时代晚期长江流域的灿烂文明。\n    -   **南宋古韵**：杭州曾是南宋的都城（临安），如今在吴山脚下、河坊街一带，仍能感受到南宋市井的遗风。宋韵文化是杭州文化的重要底色。\n\n2.  **数字经济，创新之都**\n    -   **阿里巴巴总部所在地**：杭州是阿里巴巴、蚂蚁集团等全球知名互联网企业的诞生地，这使其成为中国数字经济的领跑者。电子商务、云计算、金融科技、人工智能等产业高度发达。\n    -   **“中国硅谷”**：以滨江区、未来科技城为代表的区域，聚集了大量高科技企业和创业公司，形成了浓厚的创新创业氛围。城市生活全面数字化，移动支付、智慧交通等应用场景非常普及。\n\n3.  
**精致生活，品质之城**\n    -   **茶文化**：西湖龙井是中国十大名茶之首，产自西湖周边的狮峰、龙井、云栖等地。在茶园品茗、体验采茶制茶，是杭州独特的休闲方式。\n    -   **杭帮菜**：以清淡、鲜美、精致著称，注重原汁原味。代表菜品有西湖醋鱼、龙井虾仁、东坡肉、叫花鸡、片儿川（面条）等。\n    -   **园林与隐逸**：杭州的园林（如郭庄、刘庄）虽不如苏州园林名气大，但巧妙借景西湖，自有一番灵动。历史上许多文人选择在此隐居（如林和靖“梅妻鹤子”），形成了独特的隐逸文化。\n\n### 旅行与生活指南\n\n-   **必游景点**：\n    -   **西湖十景**：苏堤春晓、断桥残雪、雷峰夕照、柳浪闻莺等，四季皆景。\n    -   **西溪国家湿地公园**：城市中的湿地，可乘船穿行于水道，感受“一曲溪流一曲烟”的野趣。\n    -   **灵隐寺**：千年古刹，佛教圣地，背靠北高峰，环境清幽。\n    -   **龙井村/满觉陇**：探访茶园，品尝龙井茶，秋天满觉陇的桂花香沁人心脾。\n    -   **宋城**：大型主题公园，以宋代文化为背景，有震撼的《宋城千古情》演出。\n\n-   **特色体验**：\n    -   **骑行或漫步西湖**：环湖约15公里，是感受杭州慢生活的最佳方式。\n    -   **运河夜游**：乘坐游船，欣赏两岸灯光下的古桥、老街，体验流动的杭州。\n    -   **在茶馆发呆**：无论是龙井村里的农家茶室，还是西湖","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"length","matched_stop":null}],"usage":{"prompt_tokens":9,"total_tokens":1009,"completion_tokens":1000,"prompt_tokens_details":null,"reasoning_tokens":0},"metadata":{"weight_version":"default"}}
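
The same smoke test can be issued from Python. This is a minimal sketch using only the standard library; the endpoint path and JSON fields are taken from the curl command above, while the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, max_tokens=1000, temperature=0.7):
    """Build a POST request mirroring the curl call above."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(request, timeout=600):
    """Send the request and return (content, finish_reason) of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        reply = json.load(resp)
    choice = reply["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]
```

Checking `finish_reason` is useful here: the logged response ends with "finish_reason":"length" and completion_tokens of 1000, meaning the answer was cut off at max_tokens rather than finishing on its own.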

MMLU

python3 bench_sglang.py --parallel 128 --backend srt --host http://127.0.0.1 --port 30000 --data_dir /data00/mmlu
100%|█████████████████████████████████████████████████████████████████████████████| 14042/14042 [25:04<00:00,  9.33it/s]
subject: abstract_algebra, #q:100, acc: 0.870
subject: anatomy, #q:135, acc: 0.889
subject: astronomy, #q:152, acc: 0.947
subject: business_ethics, #q:100, acc: 0.840
subject: clinical_knowledge, #q:265, acc: 0.894
subject: college_biology, #q:144, acc: 0.979
subject: college_chemistry, #q:100, acc: 0.690
subject: college_computer_science, #q:100, acc: 0.920
subject: college_mathematics, #q:100, acc: 0.880
subject: college_medicine, #q:173, acc: 0.867
subject: college_physics, #q:102, acc: 0.961
subject: computer_security, #q:100, acc: 0.840
subject: conceptual_physics, #q:235, acc: 0.962
subject: econometrics, #q:114, acc: 0.816
subject: electrical_engineering, #q:145, acc: 0.890
subject: elementary_mathematics, #q:378, acc: 0.958
subject: formal_logic, #q:126, acc: 0.754
subject: global_facts, #q:100, acc: 0.750
subject: high_school_biology, #q:310, acc: 0.965
subject: high_school_chemistry, #q:203, acc: 0.877
subject: high_school_computer_science, #q:100, acc: 0.950
subject: high_school_european_history, #q:165, acc: 0.903
subject: high_school_geography, #q:198, acc: 0.960
subject: high_school_government_and_politics, #q:193, acc: 0.995
subject: high_school_macroeconomics, #q:390, acc: 0.933
subject: high_school_mathematics, #q:270, acc: 0.819
subject: high_school_microeconomics, #q:238, acc: 0.971
subject: high_school_physics, #q:151, acc: 0.868
subject: high_school_psychology, #q:545, acc: 0.963
subject: high_school_statistics, #q:216, acc: 0.894
subject: high_school_us_history, #q:204, acc: 0.951
subject: high_school_world_history, #q:237, acc: 0.937
subject: human_aging, #q:223, acc: 0.870
subject: human_sexuality, #q:131, acc: 0.931
subject: international_law, #q:121, acc: 0.934
subject: jurisprudence, #q:108, acc: 0.907
subject: logical_fallacies, #q:163, acc: 0.920
subject: machine_learning, #q:112, acc: 0.839
subject: management, #q:103, acc: 0.922
subject: marketing, #q:234, acc: 0.957
subject: medical_genetics, #q:100, acc: 0.980
subject: miscellaneous, #q:783, acc: 0.951
subject: moral_disputes, #q:346, acc: 0.879
subject: moral_scenarios, #q:895, acc: 0.791
subject: nutrition, #q:306, acc: 0.944
subject: philosophy, #q:311, acc: 0.910
subject: prehistory, #q:324, acc: 0.941
subject: professional_accounting, #q:282, acc: 0.848
subject: professional_law, #q:1534, acc: 0.735
subject: professional_medicine, #q:272, acc: 0.941
subject: professional_psychology, #q:612, acc: 0.928
subject: public_relations, #q:110, acc: 0.836
subject: security_studies, #q:245, acc: 0.873
subject: sociology, #q:201, acc: 0.965
subject: us_foreign_policy, #q:100, acc: 0.960
subject: virology, #q:166, acc: 0.584
subject: world_religions, #q:171, acc: 0.930
Total latency: 1504.904
Average accuracy: 0.885
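
To cross-check the summary line against the per-subject rows, the log can be parsed and re-aggregated. The sketch below is a hypothetical helper, not part of bench_sglang.py; it reports both the question-weighted and the unweighted mean, since the log does not state which one "Average accuracy" is.

```python
import re

# Matches lines such as "subject: anatomy, #q:135, acc: 0.889".
LINE_RE = re.compile(r"subject: (\w+), #q:(\d+), acc: ([0-9.]+)")


def parse_subject_lines(log_text):
    """Return a list of (subject, num_questions, accuracy) tuples."""
    return [
        (m.group(1), int(m.group(2)), float(m.group(3)))
        for m in LINE_RE.finditer(log_text)
    ]


def summarize(rows):
    """Return (total_questions, weighted_accuracy, unweighted_accuracy)."""
    total_q = sum(n for _, n, _ in rows)
    weighted = sum(n * acc for _, n, acc in rows) / total_q
    unweighted = sum(acc for _, _, acc in rows) / len(rows)
    return total_q, weighted, unweighted
```

Applied to the full log above, the question counts should add up to the 14042 shown in the progress bar. Per-subject accuracies are rounded to three decimals, so the recomputed average may differ slightly in the last digit.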

GSM8K

python3 bench_sglang.py --host http://localhost  --port 30000 --data-path /data00 --num-questions 5000 --parallel 100
100%|███████████████████████████████████████████████████████████████████████████████| 1319/1319 [03:25<00:00,  6.42it/s]
Accuracy: 0.951
Invalid: 0.000
Latency: 205.435 s
Output throughput: 586.176 token/s

GPQA

python -m sglang.test.run_eval --port 30000 --eval-name gpqa --num-examples 32 --max-tokens 128000 --repeat 8 --top-p 0.95 --temperature 1.0 --thinking-mode deepseek-v3
Repeat: 8, mean: 0.914
Scores: ['0.938', '0.875', '0.938', '0.938', '0.906', '0.906', '0.938', '0.875']
Mean latency: 1048.722 s
====================
Output throughput: 377.080 token/s
[METRIC] gpqa_mean_score=0.9140625 labels={"model": "/0424/models/DeepSeek-V4-Flash", "eval": "gpqa", "repeat": 8}
Writing report to /tmp/gpqa__0424_models_DeepSeek-V4-Flash.html
{'chars': np.float64(1366.96875), 'chars:std': np.float64(346.02027436761205), 'score:std': np.float64(0.33071891388307384), 'scores': ['0.938', '0.875', '0.938', '0.938', '0.906', '0.906', '0.938', '0.875'], 'mean_score': np.float64(0.9140625), 'latency': 1048.721761024004, 'output_throughput': 377.0803560077443}
Writing results to /tmp/gpqa__0424_models_DeepSeek-V4-Flash.json
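
The reported GPQA numbers can be re-derived from the per-repeat scores. With 32 examples per repeat, each rounded score maps back to an integer count of correct answers, and the mean follows directly. The sketch below is a worked check, not harness code; reading `score:std` as a per-example standard deviation over a single 28/32 repeat is an assumption that happens to match the logged value.

```python
import math

NUM_EXAMPLES = 32  # from --num-examples 32
scores = [0.938, 0.875, 0.938, 0.938, 0.906, 0.906, 0.938, 0.875]

# Recover the integer number of correct answers in each repeat.
correct = [round(s * NUM_EXAMPLES) for s in scores]

# Mean over all repeats: 234 correct out of 8 * 32 = 256 answers.
mean_score = sum(correct) / (len(correct) * NUM_EXAMPLES)


def bernoulli_std(p):
    """Population std of 0/1 scores when a fraction p is correct."""
    return math.sqrt(p * (1.0 - p))


# Assumption: score:std is computed over one repeat with 28/32 correct.
std_one_repeat = bernoulli_std(28 / NUM_EXAMPLES)
```

Here `mean_score` reproduces gpqa_mean_score=0.9140625 exactly, and `std_one_repeat` agrees with the logged score:std of about 0.3307.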

LongBench

/sgl-workspace/LongBench# python result.py
file='.jsonl' (easy_acc + hard_acc) / len(pred_data)=0.6974789915966386
['Model\tOverall\tEasy\tHard\tShort\tMedium\tLong', '\t69.7\t72.9\t67.6\t62.8\t73.2\t75.0']
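
The result.py output is a list of tab-separated rows, which is hard to read inline. A small sketch (the helper name is hypothetical) turns it into a header-to-value mapping:

```python
rows = [
    "Model\tOverall\tEasy\tHard\tShort\tMedium\tLong",
    "\t69.7\t72.9\t67.6\t62.8\t73.2\t75.0",
]


def parse_result_rows(rows):
    """Split tab-separated rows and pair each value with its column header."""
    header = rows[0].split("\t")
    return [dict(zip(header, row.split("\t"))) for row in rows[1:]]


table = parse_result_rows(rows)
```

The Model column is empty because the prediction file was named '.jsonl', and the Overall value of 69.7 is the rounded form of the 0.6974789915966386 printed on the line before it.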

V4 Pro

I have added accuracy tests for V4 Pro. Please let me know if any other tests are required.

Note: PR #24952 is required.

Run command

SGLANG_SHARED_EXPERT_TP1=1 SGLANG_ENABLE_THINKING=1 SGLANG_DSV4_FP4_EXPERTS=1 SGLANG_JIT_DEEPGEMM_PRECOMPILE=0 \
GLOO_SOCKET_IFNAME=eth0 NCCL_MIN_NCHANNELS=24 NCCL_IB_QPS_PER_CONNECTION=8 \
sglang serve \
  --trust-remote-code \
  --model-path /data/models/DeepSeek-V4-Pro/ \
  --tp 16 \
  --enable-dp-attention \
  --max-running-requests 16 \
  --enable-metrics \
  --host 0.0.0.0 \
  --port 8080 \
  --mem-fraction-static 0.9 \
  --moe-runner-backend marlin \
  --dist-init-addr 192.168.122.78:30300 \
  --nnodes 2 \
  --tool-call-parser deepseekv4 \
  --reasoning-parser deepseek-v4 \
  --node-rank 0 \
  --dp-size 2 \
  --cuda-graph-max-bs 8

GSM8K

~/sglang/benchmark/gsm8k# python3 bench_sglang.py --parallel 128 --backend srt --host http://127.0.0.1  --data-path /data00/mmlu --port 8080
100%|████████████████████████████████████████████████████████| 200/200 [02:58<00:00,  1.12it/s]
Accuracy: 0.975
Invalid: 0.000
Latency: 178.352 s
Output throughput: 100.722 token/s

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist · 2026-05-11T14:21:35Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

yhyang201 · 2026-05-12T02:48:29Z

/tag-and-rerun-ci

Fridge003 · 2026-05-12T03:17:53Z

/rerun-stage stage-c-test-dsv4-8-gpu-h200

github-actions · 2026-05-12T03:18:28Z

🚀 Triggered stage-c-test-dsv4-8-gpu-h200 to run independently (skipping dependencies). View workflow run

yhyang201 · 2026-05-12T23:53:05Z

/rerun-stage stage-c-test-dsv4-8-gpu-h200

github-actions · 2026-05-12T23:53:34Z

🚀 Triggered stage-c-test-dsv4-8-gpu-h200 to run independently (skipping dependencies). View workflow run

Fridge003 · 2026-05-13T05:12:17Z

@shiyu7 Please take a look at this issue https://github.com/sgl-project/sglang/actions/runs/25769201102/job/75688469859

shiyu7 · 2026-05-13T07:45:56Z

Thanks @Fridge003 @yhyang201 ～

I found that when an nvshmem error occurs, the watchdog triggers and attempts to generate a Python dump. I believe these two issues are related. Since we haven't modified any DeepEP-related code, I've slightly increased the watchdog timeout to 900s. Could you please rerun the CI?

Fridge003 · 2026-05-13T20:44:04Z

/rerun-stage stage-c-test-dsv4-8-gpu-h200

github-actions · 2026-05-13T20:44:45Z

🚀 Triggered stage-c-test-dsv4-8-gpu-h200 to run independently (skipping dependencies). View workflow run

Fridge003 · 2026-05-13T22:36:23Z

/rerun-stage stage-c-test-dsv4-8-gpu-h200

github-actions · 2026-05-13T22:36:52Z

🚀 Triggered stage-c-test-dsv4-8-gpu-h200 to run independently (skipping dependencies). View workflow run

shiyu7 requested review from AniZpZ, BBuf, DarkSharpness, Edwardf0t1, FlamingoPg, Fridge003, HaiShaw, HydraQYH, Ying1123, b8zhong, celve, ch-wan, ispobock, merrymercy and yuan-luo as code owners May 11, 2026 14:21

github-actions Bot added the jit-kernel label May 11, 2026

shiyu7 force-pushed the feat/0511 branch from 4b4dbf9 to 1a69297 Compare May 12, 2026 02:40

github-actions Bot added the run-ci label May 12, 2026

shiyu7 force-pushed the feat/0511 branch 2 times, most recently from 55d32cd to 91d1df1 Compare May 12, 2026 08:31

shiyu7 added 2 commits May 12, 2026 16:32

feat: support deepseek v4 hopper mxfp4

87ea76c

fix: import set_weight_attrs for mxfp4 marlin moe

b5780c8

shiyu7 force-pushed the feat/0511 branch from 91d1df1 to b5780c8 Compare May 12, 2026 08:32

fix: fix the ci timeout error

2b85f8f

github-actions Bot added the deepseek label May 13, 2026

Merge branch 'main' into feat/0511

62ca39f

Fridge003 merged commit 37f1843 into sgl-project:main May 13, 2026
73 of 134 checks passed

Fridge003 pushed a commit that referenced this pull request May 13, 2026

[rebase]Deepseek_v4 support w4(mxfp4)a16 on hopper (#24986)

f417cf1

Fridge003 merged 4 commits into sgl-project:main from shiyu7:feat/0511


3 participants
