
[rebase]Deepseek_v4 support w4(mxfp4)a16 on hopper #24986

Merged
Fridge003 merged 4 commits into sgl-project:main from shiyu7:feat/0511 on May 13, 2026


@shiyu7 (Contributor) commented on May 11, 2026

Motivation

Per @zhangxiaolei123456 in #23686.

Modifications

Rebase the DeepSeek V4 MXFP4 W4A16 support for Hopper (4-bit MXFP4 weights, 16-bit activations) from the deepseek_v4 branch onto the main branch.
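For readers unfamiliar with the weight format: per the OCP Microscaling (MX) specification, an MXFP4 weight is a 4-bit FP4 (E2M1) code, and each block of 32 consecutive codes shares one 8-bit power-of-two (E8M0) scale, so storage averages 4 + 8/32 = 4.25 bits per weight. The sketch below dequantizes one block in plain Python to illustrate the format only. The function names are hypothetical, and the nibble order (low nibble first) is an assumption; it is not how this PR or the Marlin kernel actually lays out or unpacks weights.

```python
# Magnitudes representable by FP4 E2M1; bit 3 of the 4-bit code is the sign.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
MX_BLOCK_SIZE = 32  # elements that share one E8M0 scale in MXFP4


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code: bit 3 is the sign, bits 0-2 index the magnitude."""
    if not 0 <= code <= 0xF:
        raise ValueError("E2M1 code must fit in 4 bits")
    magnitude = E2M1_MAGNITUDES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude


def decode_e8m0(scale_byte: int) -> float:
    """E8M0 scale: a power of two with exponent bias 127; 0xFF encodes NaN."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequantize_mxfp4_block(packed: bytes, scale_byte: int) -> list[float]:
    """Dequantize one MXFP4 block: 16 packed bytes (32 nibbles) plus one scale.

    Assumption for illustration: the low nibble of each byte is the earlier element.
    """
    if len(packed) * 2 != MX_BLOCK_SIZE:
        raise ValueError(f"expected {MX_BLOCK_SIZE // 2} packed bytes")
    scale = decode_e8m0(scale_byte)
    values = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            values.append(decode_e2m1(code) * scale)
    return values


# Scale byte 128 means 2**1; byte 0x9A packs codes 0xA (-1.0) then 0x9 (-0.5).
block = bytes([0x9A] + [0x00] * 15)
print(dequantize_mxfp4_block(block, 128)[:3])  # [-2.0, -1.0, 0.0]
```

In a W4A16 GEMM the 16-bit activations are never quantized; only the weights are expanded like this (in practice inside the kernel, not in Python).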

Accuracy Tests

V4 Flash

DeepSeek V4 Flash run command:

SGLANG_DSV4_FP4_EXPERTS=1 SGLANG_JIT_DEEPGEMM_PRECOMPILE=0 GLOO_SOCKET_IFNAME=eth0 \
sglang serve \
  --trust-remote-code \
  --model-path /0424/models/DeepSeek-V4-Flash \
  --tp 8 \
  --chunked-prefill-size 8192 \
  --mem-fraction-static 0.8 \
  --max-running-requests 128 \
  --speculative-algo EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
  --tool-call-parser deepseekv4 \
  --reasoning-parser deepseek-v4 \
  --host 0.0.0.0 \
  --moe-runner-backend marlin \
  --disable-radix-cache \
  --port 30000

Sample request (the prompt asks, in Chinese, for an introduction to the city of Hangzhou):

curl -X POST http://127.0.0.1:30000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "/data/models/DeepSeek-V4-Pro/",
    "messages": [
        {
            "role": "user",
            "content": "介绍下杭州这个城市"
        }
    ],
    "max_tokens": 1000,
    "temperature": 0.7
}'

Response (in Chinese; cut off at max_tokens=1000, as "finish_reason":"length" indicates):

{"id":"c58e205754414fc8b273f55d00e65334","object":"chat.completion","created":1778494603,"model":"/data/models/DeepSeek-V4-Pro/","choices":[{"index":0,"message":{"role":"assistant","content":"嗯,用户想了解杭州这个城市,问题比较开放,需要提供一个全面但有条理的介绍。杭州是一个很知名的城市,可以从几个核心方面入手:它的地理和地位、历史文化、现代发展、旅游美食,最后给个总结性的评价。\n\n想到了先点明杭州是浙江省会、长三角重要城市,有“人间天堂”的美誉。历史文化部分要突出西湖、大运河、良渚文化、宋韵和名人故事。现代发展重点讲数字经济,尤其是阿里巴巴和互联网产业,以及城市治理的智能化。旅游方面西湖十景、西溪湿地、灵隐寺、龙井茶和杭帮菜是必提的。最后总结一下杭州古今融合、生活品质高的特点,给用户一个整体印象。\n\n这样结构比较清晰,信息量也够,用户能快速把握杭州的独特之处。</think>杭州是中国浙江省的省会,地处中国东南沿海、长江三角洲南翼,是长三角地区重要的中心城市之一。这座城市以“人间天堂”的美誉闻名于世,兼具深厚的历史底蕴、秀美的自然风光和蓬勃的现代活力。\n\n### 核心魅力:历史与自然的完美交融\n\n1.  **千年古都,人文荟萃**\n    -   **西湖文化景观**:杭州的灵魂所在。西湖及其周边的群山、园林、寺院和古迹(如断桥、雷峰塔、苏堤、三潭印月)构成了世界文化遗产“杭州西湖文化景观”,是自然美与人文景观结合的典范。无数文人墨客在此留下诗篇(如苏轼的“欲把西湖比西子,淡妆浓抹总相宜”)。\n    -   **京杭大运河**:作为大运河的南端终点,杭州见证了千年漕运的繁华。运河沿岸的历史街区(如拱宸桥西、小河直街)保留了老杭州的市井生活气息。\n    -   **良渚古城遗址**:实证中华五千年文明史的圣地,位于杭州西北部。其宏大的古城、水利系统和精美玉器,展现了新石器时代晚期长江流域的灿烂文明。\n    -   **南宋古韵**:杭州曾是南宋的都城(临安),如今在吴山脚下、河坊街一带,仍能感受到南宋市井的遗风。宋韵文化是杭州文化的重要底色。\n\n2.  **数字经济,创新之都**\n    -   **阿里巴巴总部所在地**:杭州是阿里巴巴、蚂蚁集团等全球知名互联网企业的诞生地,这使其成为中国数字经济的领跑者。电子商务、云计算、金融科技、人工智能等产业高度发达。\n    -   **“中国硅谷”**:以滨江区、未来科技城为代表的区域,聚集了大量高科技企业和创业公司,形成了浓厚的创新创业氛围。城市生活全面数字化,移动支付、智慧交通等应用场景非常普及。\n\n3.  
**精致生活,品质之城**\n    -   **茶文化**:西湖龙井是中国十大名茶之首,产自西湖周边的狮峰、龙井、云栖等地。在茶园品茗、体验采茶制茶,是杭州独特的休闲方式。\n    -   **杭帮菜**:以清淡、鲜美、精致著称,注重原汁原味。代表菜品有西湖醋鱼、龙井虾仁、东坡肉、叫花鸡、片儿川(面条)等。\n    -   **园林与隐逸**:杭州的园林(如郭庄、刘庄)虽不如苏州园林名气大,但巧妙借景西湖,自有一番灵动。历史上许多文人选择在此隐居(如林和靖“梅妻鹤子”),形成了独特的隐逸文化。\n\n### 旅行与生活指南\n\n-   **必游景点**:\n    -   **西湖十景**:苏堤春晓、断桥残雪、雷峰夕照、柳浪闻莺等,四季皆景。\n    -   **西溪国家湿地公园**:城市中的湿地,可乘船穿行于水道,感受“一曲溪流一曲烟”的野趣。\n    -   **灵隐寺**:千年古刹,佛教圣地,背靠北高峰,环境清幽。\n    -   **龙井村/满觉陇**:探访茶园,品尝龙井茶,秋天满觉陇的桂花香沁人心脾。\n    -   **宋城**:大型主题公园,以宋代文化为背景,有震撼的《宋城千古情》演出。\n\n-   **特色体验**:\n    -   **骑行或漫步西湖**:环湖约15公里,是感受杭州慢生活的最佳方式。\n    -   **运河夜游**:乘坐游船,欣赏两岸灯光下的古桥、老街,体验流动的杭州。\n    -   **在茶馆发呆**:无论是龙井村里的农家茶室,还是西湖","reasoning_content":null,"tool_calls":null},"logprobs":null,"finish_reason":"length","matched_stop":null}],"usage":{"prompt_tokens":9,"total_tokens":1009,"completion_tokens":1000,"prompt_tokens_details":null,"reasoning_tokens":0},"metadata":{"weight_version":"default"}}
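The same request can be sent from Python using only the standard library. The sketch below mirrors the curl command's endpoint, headers, and payload; `build_chat_request` and `chat` are hypothetical helper names, and the host and port are taken from the Flash run command above.

```python
import json
import urllib.request


def build_chat_request(host: str, port: int, model: str, prompt: str,
                       max_tokens: int = 1000,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible chat endpoint, like the curl command."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(req: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request to a running server and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `chat(build_chat_request("127.0.0.1", 30000, "/0424/models/DeepSeek-V4-Flash", "介绍下杭州这个城市"))["choices"][0]["message"]["content"]` returns the generated text, and `["usage"]["completion_tokens"]` the token count shown in the response above.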

MMLU

python3 bench_sglang.py --parallel 128 --backend srt --host http://127.0.0.1 --port 30000 --data_dir /data00/mmlu
100%|█████████████████████████████████████████████████████████████████████████████| 14042/14042 [25:04<00:00,  9.33it/s]
subject: abstract_algebra, #q:100, acc: 0.870
subject: anatomy, #q:135, acc: 0.889
subject: astronomy, #q:152, acc: 0.947
subject: business_ethics, #q:100, acc: 0.840
subject: clinical_knowledge, #q:265, acc: 0.894
subject: college_biology, #q:144, acc: 0.979
subject: college_chemistry, #q:100, acc: 0.690
subject: college_computer_science, #q:100, acc: 0.920
subject: college_mathematics, #q:100, acc: 0.880
subject: college_medicine, #q:173, acc: 0.867
subject: college_physics, #q:102, acc: 0.961
subject: computer_security, #q:100, acc: 0.840
subject: conceptual_physics, #q:235, acc: 0.962
subject: econometrics, #q:114, acc: 0.816
subject: electrical_engineering, #q:145, acc: 0.890
subject: elementary_mathematics, #q:378, acc: 0.958
subject: formal_logic, #q:126, acc: 0.754
subject: global_facts, #q:100, acc: 0.750
subject: high_school_biology, #q:310, acc: 0.965
subject: high_school_chemistry, #q:203, acc: 0.877
subject: high_school_computer_science, #q:100, acc: 0.950
subject: high_school_european_history, #q:165, acc: 0.903
subject: high_school_geography, #q:198, acc: 0.960
subject: high_school_government_and_politics, #q:193, acc: 0.995
subject: high_school_macroeconomics, #q:390, acc: 0.933
subject: high_school_mathematics, #q:270, acc: 0.819
subject: high_school_microeconomics, #q:238, acc: 0.971
subject: high_school_physics, #q:151, acc: 0.868
subject: high_school_psychology, #q:545, acc: 0.963
subject: high_school_statistics, #q:216, acc: 0.894
subject: high_school_us_history, #q:204, acc: 0.951
subject: high_school_world_history, #q:237, acc: 0.937
subject: human_aging, #q:223, acc: 0.870
subject: human_sexuality, #q:131, acc: 0.931
subject: international_law, #q:121, acc: 0.934
subject: jurisprudence, #q:108, acc: 0.907
subject: logical_fallacies, #q:163, acc: 0.920
subject: machine_learning, #q:112, acc: 0.839
subject: management, #q:103, acc: 0.922
subject: marketing, #q:234, acc: 0.957
subject: medical_genetics, #q:100, acc: 0.980
subject: miscellaneous, #q:783, acc: 0.951
subject: moral_disputes, #q:346, acc: 0.879
subject: moral_scenarios, #q:895, acc: 0.791
subject: nutrition, #q:306, acc: 0.944
subject: philosophy, #q:311, acc: 0.910
subject: prehistory, #q:324, acc: 0.941
subject: professional_accounting, #q:282, acc: 0.848
subject: professional_law, #q:1534, acc: 0.735
subject: professional_medicine, #q:272, acc: 0.941
subject: professional_psychology, #q:612, acc: 0.928
subject: public_relations, #q:110, acc: 0.836
subject: security_studies, #q:245, acc: 0.873
subject: sociology, #q:201, acc: 0.965
subject: us_foreign_policy, #q:100, acc: 0.960
subject: virology, #q:166, acc: 0.584
subject: world_religions, #q:171, acc: 0.930
Total latency: 1504.904
Average accuracy: 0.885
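The per-subject lines can be re-aggregated to sanity-check the summary. The #q counts above sum to 14042, matching the progress bar. The sketch below (hypothetical helper names) parses lines in the printed format and computes both a question-weighted mean and an unweighted mean over subjects; that `bench_sglang.py` reports the weighted one as "Average accuracy" is an assumption to confirm against the script. Because each printed accuracy is rounded to three decimals, a recomputed value can differ in the last digit.

```python
import re

# Matches lines such as "subject: anatomy, #q:135, acc: 0.889".
LINE_RE = re.compile(r"subject: (\S+), #q:(\d+), acc: ([0-9.]+)")


def parse_subject_lines(text: str) -> list[tuple[str, int, float]]:
    """Extract (subject, question count, accuracy) tuples from bench output."""
    return [(m.group(1), int(m.group(2)), float(m.group(3)))
            for m in LINE_RE.finditer(text)]


def weighted_accuracy(rows: list[tuple[str, int, float]]) -> float:
    """Mean over all questions: each subject is weighted by its #q."""
    total_q = sum(n for _, n, _ in rows)
    return sum(n * acc for _, n, acc in rows) / total_q


def macro_accuracy(rows: list[tuple[str, int, float]]) -> float:
    """Unweighted mean over subjects."""
    return sum(acc for _, _, acc in rows) / len(rows)


sample = "subject: a, #q:100, acc: 0.870\nsubject: b, #q:300, acc: 0.950\n"
rows = parse_subject_lines(sample)
print(weighted_accuracy(rows), macro_accuracy(rows))
```

The two means diverge when large subjects score differently from small ones; here professional_law (1534 questions, 0.735) pulls the weighted mean down more than it would an unweighted one.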

GSM8K

python3 bench_sglang.py --host http://localhost  --port 30000 --data-path /data00 --num-questions 5000 --parallel 100
100%|███████████████████████████████████████████████████████████████████████████████| 1319/1319 [03:25<00:00,  6.42it/s]
Accuracy: 0.951
Invalid: 0.000
Latency: 205.435 s
Output throughput: 586.176 token/s

GPQA

python -m sglang.test.run_eval --port 30000 --eval-name gpqa --num-examples 32 --max-tokens 128000 --repeat 8 --top-p 0.95 --temperature 1.0 --thinking-mode deepseek-v3
Repeat: 8, mean: 0.914
Scores: ['0.938', '0.875', '0.938', '0.938', '0.906', '0.906', '0.938', '0.875']
Mean latency: 1048.722 s
====================
Output throughput: 377.080 token/s
[METRIC] gpqa_mean_score=0.9140625 labels={"model": "/0424/models/DeepSeek-V4-Flash", "eval": "gpqa", "repeat": 8}
Writing report to /tmp/gpqa__0424_models_DeepSeek-V4-Flash.html
{'chars': np.float64(1366.96875), 'chars:std': np.float64(346.02027436761205), 'score:std': np.float64(0.33071891388307384), 'scores': ['0.938', '0.875', '0.938', '0.938', '0.906', '0.906', '0.938', '0.875'], 'mean_score': np.float64(0.9140625), 'latency': 1048.721761024004, 'output_throughput': 377.0803560077443}
Writing results to /tmp/gpqa__0424_models_DeepSeek-V4-Flash.json
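With --num-examples 32, every per-repeat score is a multiple of 1/32, so the rounded scores printed above map back to integer correct-answer counts, and the reported mean can be reproduced exactly as a fraction:

```python
from fractions import Fraction

NUM_EXAMPLES = 32  # --num-examples in the GPQA command
printed = ["0.938", "0.875", "0.938", "0.938", "0.906", "0.906", "0.938", "0.875"]

# Recover integer counts: 0.938 -> 30/32, 0.906 -> 29/32, 0.875 -> 28/32.
correct = [round(float(s) * NUM_EXAMPLES) for s in printed]
mean = Fraction(sum(correct), NUM_EXAMPLES * len(correct))

print(correct)            # [30, 28, 30, 30, 29, 29, 30, 28]
print(mean, float(mean))  # 117/128 0.9140625
```

That is 234 correct answers out of 256 attempts, which matches gpqa_mean_score=0.9140625 in the [METRIC] line.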

LongBench

/sgl-workspace/LongBench# python result.py
file='.jsonl' (easy_acc + hard_acc) / len(pred_data)=0.6974789915966386
Overall  Easy  Hard  Short  Medium  Long
69.7     72.9  67.6  62.8   73.2    75.0

V4 Pro

I have added accuracy tests for V4 Pro. Please let me know if any other tests are required.

Note: PR #24952 is required.

Run command (node rank 0 of 2):

SGLANG_SHARED_EXPERT_TP1=1 SGLANG_ENABLE_THINKING=1 SGLANG_DSV4_FP4_EXPERTS=1 SGLANG_JIT_DEEPGEMM_PRECOMPILE=0 \
GLOO_SOCKET_IFNAME=eth0 NCCL_MIN_NCHANNELS=24 NCCL_IB_QPS_PER_CONNECTION=8 \
sglang serve \
  --trust-remote-code \
  --model-path /data/models/DeepSeek-V4-Pro/ \
  --tp 16 \
  --enable-dp-attention \
  --max-running-requests 16 \
  --enable-metrics \
  --host 0.0.0.0 \
  --port 8080 \
  --mem-fraction-static 0.9 \
  --moe-runner-backend marlin \
  --dist-init-addr 192.168.122.78:30300 \
  --nnodes 2 \
  --tool-call-parser deepseekv4 \
  --reasoning-parser deepseek-v4 \
  --node-rank 0 \
  --dp-size 2 \
  --cuda-graph-max-bs 8
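The command above is for node rank 0 of a two-node job (--nnodes 2). Assuming the usual SGLang multi-node convention, the second node runs the identical command, with the same environment and --dist-init-addr, but with --node-rank 1. The hypothetical `node_command` helper below generates both launch lines from a single flag list so the two nodes cannot drift apart; the flags and values are copied from the command above.

```python
import shlex

# Environment and flags copied from the node-0 command above.
ENV = {
    "SGLANG_SHARED_EXPERT_TP1": "1",
    "SGLANG_ENABLE_THINKING": "1",
    "SGLANG_DSV4_FP4_EXPERTS": "1",
    "SGLANG_JIT_DEEPGEMM_PRECOMPILE": "0",
    "GLOO_SOCKET_IFNAME": "eth0",
    "NCCL_MIN_NCHANNELS": "24",
    "NCCL_IB_QPS_PER_CONNECTION": "8",
}
BASE_ARGS = [
    "--trust-remote-code",
    "--model-path", "/data/models/DeepSeek-V4-Pro/",
    "--tp", "16", "--enable-dp-attention", "--dp-size", "2",
    "--max-running-requests", "16", "--cuda-graph-max-bs", "8",
    "--enable-metrics", "--host", "0.0.0.0", "--port", "8080",
    "--mem-fraction-static", "0.9", "--moe-runner-backend", "marlin",
    "--tool-call-parser", "deepseekv4", "--reasoning-parser", "deepseek-v4",
    "--dist-init-addr", "192.168.122.78:30300",
]


def node_command(node_rank: int, nnodes: int = 2) -> str:
    """Hypothetical helper: the launch line for one node; only --node-rank differs."""
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes})")
    env = " ".join(f"{key}={value}" for key, value in ENV.items())
    args = BASE_ARGS + ["--nnodes", str(nnodes), "--node-rank", str(node_rank)]
    return f"{env} sglang serve {shlex.join(args)}"


for rank in range(2):
    print(node_command(rank))
```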

GSM8K

~/sglang/benchmark/gsm8k# python3 bench_sglang.py --parallel 128 --backend srt --host http://127.0.0.1  --data-path /data00/mmlu --port 8080
100%|████████████████████████████████████████████████████████| 200/200 [02:58<00:00,  1.12it/s]
Accuracy: 0.975
Invalid: 0.000
Latency: 178.352 s
Output throughput: 100.722 token/s

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist (Contributor):

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@yhyang201 (Collaborator):

/tag-and-rerun-ci

@Fridge003 (Collaborator):

/rerun-stage stage-c-test-dsv4-8-gpu-h200

@github-actions (Contributor):

🚀 Triggered stage-c-test-dsv4-8-gpu-h200 to run independently (skipping dependencies). View workflow run

shiyu7 force-pushed the feat/0511 branch 2 times, most recently from 55d32cd to 91d1df1, on May 12, 2026 08:31
@yhyang201 (Collaborator):

/rerun-stage stage-c-test-dsv4-8-gpu-h200

@github-actions (Contributor):

🚀 Triggered stage-c-test-dsv4-8-gpu-h200 to run independently (skipping dependencies). View workflow run

@Fridge003 (Collaborator):

@shiyu7 Please take a look at this issue https://github.com/sgl-project/sglang/actions/runs/25769201102/job/75688469859

@shiyu7 (Contributor, Author) commented on May 13, 2026

Thanks, @Fridge003 and @yhyang201.

I found that when an NVSHMEM error occurs, the watchdog fires and attempts to generate a Python stack dump; I believe the two failures are related. Since this PR does not modify any DeepEP-related code, I have slightly increased the watchdog timeout, to 900 s. Could you please rerun the CI?
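The failure mode described above, a watchdog that fires when progress stalls and then dumps Python stacks, can be sketched with the standard library. This is a generic illustration, not SGLang's watchdog: the `Watchdog` class and its `feed()` heartbeat are hypothetical names. The timeout is the only knob, so raising it gives a slow step more time before a dump is attempted.

```python
import faulthandler
import sys
import threading
import time


class Watchdog:
    """Hypothetical heartbeat watchdog: if feed() is not called for more than
    timeout_s seconds, dump every thread's Python stack once and set `fired`."""

    def __init__(self, timeout_s: float, out=None):
        self.timeout_s = timeout_s
        self.out = out  # needs a real fileno(); None means sys.stderr
        self.fired = threading.Event()
        self._last_feed = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> "Watchdog":
        self._thread.start()
        return self

    def feed(self) -> None:
        """Record progress, e.g. once per scheduler step."""
        with self._lock:
            self._last_feed = time.monotonic()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        poll_s = min(self.timeout_s / 4, 1.0)
        while not self._stop.wait(poll_s):
            with self._lock:
                idle_s = time.monotonic() - self._last_feed
            if idle_s > self.timeout_s:
                # Write the stacks of all threads, then stop watching.
                faulthandler.dump_traceback(file=self.out or sys.stderr, all_threads=True)
                self.fired.set()
                return
```

For a simpler single-timer variant, `faulthandler.dump_traceback_later(900)` dumps stacks after 900 s unless re-armed by calling it again or disarmed with `faulthandler.cancel_dump_traceback_later()`.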

@Fridge003 (Collaborator):

/rerun-stage stage-c-test-dsv4-8-gpu-h200

@github-actions (Contributor):

🚀 Triggered stage-c-test-dsv4-8-gpu-h200 to run independently (skipping dependencies). View workflow run

@Fridge003 (Collaborator):

/rerun-stage stage-c-test-dsv4-8-gpu-h200

@github-actions (Contributor):

🚀 Triggered stage-c-test-dsv4-8-gpu-h200 to run independently (skipping dependencies). View workflow run

Fridge003 merged commit 37f1843 into sgl-project:main on May 13, 2026
73 of 134 checks passed
3 participants