Fix DeepSeek V4 expert distribution recording by xutizhou · Pull Request #25961 · sgl-project/sglang

xutizhou · 2026-05-21T07:26:46Z

Summary

Add DeepSeek V4 decoder-layer context for the expert distribution recorder so recorded expert counts are attributed to the correct layer.
Keep the same piecewise CUDA graph behavior as DeepSeek V2/V3 by only entering the recorder layer context when piecewise CUDA graph is disabled.
This enables DeepSeek V4 record dumps to be used for static EPLB expert location initialization.
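
The layer-context behavior described above can be illustrated with a small, self-contained sketch. It is not the code from `deepseek_v4.py`: `ToyRecorder`, `with_current_layer`, `on_select_experts`, and `run_layers` are hypothetical stand-ins chosen for illustration, and the sketch only assumes that the recorder attributes counts to whichever layer is currently set.

```python
from contextlib import contextmanager, nullcontext


class ToyRecorder:
    """Hypothetical stand-in for an expert distribution recorder."""

    def __init__(self):
        self.current_layer = None
        self.counts = {}  # layer index -> number of recorded expert selections

    @contextmanager
    def with_current_layer(self, layer_idx):
        # Set the layer for the duration of the block, then restore it.
        previous = self.current_layer
        self.current_layer = layer_idx
        try:
            yield
        finally:
            self.current_layer = previous

    def on_select_experts(self, topk_ids):
        # Without an active layer there is nothing to attribute the counts to.
        if self.current_layer is None:
            return
        self.counts[self.current_layer] = (
            self.counts.get(self.current_layer, 0) + len(topk_ids)
        )


def run_layers(num_layers, recorder, piecewise_cuda_graph_enabled):
    for i in range(num_layers):
        # Enter the recorder's layer context only when piecewise CUDA graph is off.
        ctx = (
            nullcontext()
            if piecewise_cuda_graph_enabled
            else recorder.with_current_layer(i)
        )
        with ctx:
            recorder.on_select_experts([0, 1])  # pretend each layer picks 2 experts
    return recorder.counts
```

In this toy, `run_layers(3, ToyRecorder(), False)` attributes two selections to each of the three layers, while passing `True` skips the context entirely. How the real recorder behaves under piecewise CUDA graph is not something this sketch tries to model.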

Verification

PYTHONPATH=$PWD/python python -m py_compile python/sglang/srt/models/deepseek_v4.py
Pre-commit hooks from git commit --amend passed.
DeepSeek-V4-Flash-FP8 TP4 DeepEP record smoke:
- start record -> generate -> stop record -> dump record all returned successfully.
- Dumped logical_count shape: (16, 43, 256), total count: 2400.
- Nonzero recorded layers: 40 / 43, matching V4's early dense/non-MoE layers followed by MoE layers.
DeepSeek-V4-Flash-FP8 TP4 DeepEP record workload smoke:
- bench_serving random workload completed while recording: 16 requests, random input len 512, random output len 16, max concurrency 4.
- Dumped logical_count shape: (32, 43, 256), total count: 189600.
- Nonzero recorded layers: 40 / 43.
DeepSeek-V4-Flash-FP8 TP4 DeepEP static EPLB smoke:
- Launched with --ep-dispatch-algorithm static --init-expert-location <record_dump>.
- Server logged init_expert_location from init_by_eplb using ServerArgs.init_expert_location on all TP ranks.
- /generate returned HTTP 200 after static EPLB initialization.
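
The start record -> generate -> stop record -> dump record sequence from the smoke tests above could be scripted roughly as follows. This is a sketch under assumptions: the three recorder endpoint paths and the `/generate` payload shape are assumed rather than taken from this PR, so they should be checked against the server version in use. `http_send` and `run_record_smoke` are hypothetical helper names.

```python
import json
from urllib import request

# Assumed endpoint names and payload; verify against the running server.
RECORD_STEPS = [
    ("POST", "/start_expert_distribution_record", None),
    ("POST", "/generate", {"text": "Hello", "sampling_params": {"max_new_tokens": 1}}),
    ("POST", "/stop_expert_distribution_record", None),
    ("POST", "/dump_expert_distribution_record", None),
]


def http_send(base_url, method, path, payload):
    """Send one request and return the HTTP status code."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.status


def run_record_smoke(base_url, send=http_send):
    """Run the steps in order and fail fast on any non-200 status."""
    statuses = []
    for method, path, payload in RECORD_STEPS:
        status = send(base_url, method, path, payload)
        if status != 200:
            raise RuntimeError(f"{method} {path} returned {status}")
        statuses.append((path, status))
    return statuses
```

The `send` parameter is injectable so the ordering logic can be exercised without a live server.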

CI States

Latest PR Test (Base): ❌ Run #26263830117
Latest PR Test (Extra): ❌ Run #26263830039

gemini-code-assist

Code Review

This pull request updates the forward pass of the DeepseekV4 model to support expert distribution recording. It introduces a conditional context manager within the layer iteration loop that activates the expert distribution recorder when piecewise CUDA graphs are disabled. I have no feedback to provide as there were no review comments.

leihuang-sketch · 2026-05-21T09:34:34Z

PR #25948 already exists

xutizhou · 2026-05-21T09:48:46Z

Additional EPLB functional smoke using the PR #19290 MMLU prompt workload shape, adapted to DeepSeek-V4-Flash-FP8 TP4 single-node:

Record phase:
- Started V4 TP4 DeepEP server with --ep-num-redundant-experts 4 --expert-distribution-recorder-mode stat --expert-distribution-recorder-buffer-size 64.
- Sent 64 prompts from /lustre/raplab/client/xutingz/workspace/bench/waterfill/mmlu_record_2k.json with max_new_tokens=1, concurrency 4.
- Result: 64/64 requests succeeded.
- Dump file: /tmp/dsv4_mmlu_eplb_pr/expert_distribution_recorder_1779356669.2558024.pt.
- Dump validation: logical_count shape (64, 43, 256), sum 5441520, nonzero layers 40 / 43, first three layers zero as expected for V4 early dense/non-MoE layers.
Static EPLB phase:
- Restarted V4 TP4 DeepEP server with --ep-dispatch-algorithm static --init-expert-location /tmp/dsv4_mmlu_eplb_pr/expert_distribution_recorder_1779356669.2558024.pt.
- /server_info confirmed ep_dispatch_algorithm=static, ep_num_redundant_experts=4, and the expected init_expert_location path.
- Sent 64 prompts from /lustre/raplab/client/xutingz/workspace/bench/waterfill/mmlu_bench_2k.json with max_new_tokens=1, concurrency 4.
- Result: 64/64 requests succeeded.
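
A dump validation like the one reported above (shape, sum, nonzero layers) can be sketched in plain Python. The sketch assumes the `logical_count` tensor has already been loaded from the `.pt` file and converted to nested lists (for example via `torch.load(...)` followed by `.tolist()`, which is an assumption about the dump layout, not something this PR shows). `summarize_logical_count` is a hypothetical helper name.

```python
def summarize_logical_count(logical_count):
    """Summarize nested lists shaped (records, layers, experts)."""
    num_records = len(logical_count)
    num_layers = len(logical_count[0])
    num_experts = len(logical_count[0][0])

    # Accumulate per-layer totals across all records and experts.
    layer_sums = [0] * num_layers
    for record in logical_count:
        for layer_idx, expert_counts in enumerate(record):
            layer_sums[layer_idx] += sum(expert_counts)

    return {
        "shape": (num_records, num_layers, num_experts),
        "total": sum(layer_sums),
        "nonzero_layers": sum(1 for s in layer_sums if s > 0),
        "layer_sums": layer_sums,
    }


# Tiny synthetic example: 2 records, 4 layers, 3 experts.
toy = [
    [[0, 0, 0], [1, 2, 0], [0, 0, 3], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 0], [2, 0, 0], [0, 0, 0]],
]
summary = summarize_logical_count(toy)
```

For the synthetic input, `summary["layer_sums"]` is `[0, 4, 5, 0]`, so two of the four layers are nonzero and the total is 9.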

xutizhou · 2026-05-21T15:22:26Z

Pure EPLB verification on DeepSeek V4 Flash FP8, current PR commit bfee2a34020d8e85adb334b0e5412358a30ee01b.

Additional audit result:

V4 has num_hash_layers=3; these are HashTopK MoE layers, not dense layers like V3's first 3. They should be recorded.
This PR now records both paths: the outer V4 layer context sets the current layer, and HashTopK reports topk_ids after EPLB logical-to-physical remap and padding mask, matching normal TopK recorder semantics.
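
As a rough illustration of why the padding mask matters when `topk_ids` are reported, the sketch below counts expert selections while skipping padded entries. It assumes padded slots are marked with `-1`, which is an assumption for this example and not confirmed by the PR text. `count_topk_selections` is a hypothetical helper.

```python
def count_topk_selections(topk_ids, num_experts, pad_id=-1):
    """Count how often each expert id appears, ignoring padded slots.

    topk_ids: list of per-token lists of selected expert ids.
    pad_id:   assumed marker for masked (padding) entries.
    """
    counts = [0] * num_experts
    for token_ids in topk_ids:
        for expert_id in token_ids:
            if expert_id == pad_id:
                continue  # padded token slot: must not contribute to the counts
            counts[expert_id] += 1
    return counts


# Two real tokens plus one fully padded token, 4 experts, top-2 routing.
example_counts = count_topk_selections([[0, 2], [2, 3], [-1, -1]], num_experts=4)
```

Here `example_counts` is `[1, 0, 2, 1]`: the padded token adds nothing, so the sum equals the number of real tokens times the top-k width.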

Corrected record validation:

MMLU record set: 2000/2000 requests succeeded, 0 errors.
Dumped logical_count shape: (2000, 43, 256).
Total recorded selections: 121,900,614.
Non-zero MoE layers: 43/43; first three layers nonzero: [True, True, True].
First 8 layer sums: all 2,834,898.
Record file: /lustre/raplab/client/xutingz/workspace/bench/waterfill/dsv4_pr25961_record_2000_hash_topk_20260522_095540/expert_distribution_recorder_1779415690.286911.pt.
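
The reported numbers are internally consistent: the total divides evenly by the number of layers, and the quotient equals the per-layer sum listed for the first eight layers. That is what one would expect if every layer recorded the same number of selections, although only the first eight sums are shown. A quick check, with `implied_uniform_layer_sum` as a hypothetical helper:

```python
def implied_uniform_layer_sum(total, num_layers):
    """Return the per-layer sum if the total splits evenly, else None."""
    if total % num_layers != 0:
        return None
    return total // num_layers


# Values taken from the record validation above.
per_layer = implied_uniform_layer_sum(121_900_614, 43)
```

`per_layer` comes out to 2,834,898, matching the reported per-layer sum.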

Serving throughput was measured on an MMLU bench set disjoint from the record set: batch of 1000 prompts, concurrency 256, max_tokens=1, 4 warmup + 8 measured rounds, DeepEP normal, CUDA graph disabled. DeepEP Waterfill was not enabled in either run:

No EPLB / trivial expert location: 51,811 tok/s.
Explicit EPLB via --init-expert-location ... --enable-eplb --eplb-rebalance-num-iterations 1000000: 56,346 tok/s, +8.75%.
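
The +8.75% figure follows directly from the two throughput numbers. A minimal check, with `relative_gain_percent` as a hypothetical helper name:

```python
def relative_gain_percent(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


# Throughput values from the two runs above, in tok/s.
gain = round(relative_gain_percent(51_811, 56_346), 2)
```

`gain` evaluates to 8.75, in line with the reported improvement.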

Both runs completed all measured rounds with 0 request errors. Logs also show enable_deepep_waterfill=False for the EPLB run and enable_eplb=False, init_expert_location='trivial' for the no-EPLB run.

Run dirs:

Record: /lustre/raplab/client/xutingz/workspace/bench/waterfill/dsv4_pr25961_record_2000_hash_topk_20260522_095540
No EPLB: /lustre/raplab/client/xutingz/workspace/bench/waterfill/v4_ab_pr25961_hashfix_pure_no_eplb_1000c256_20260522_100858
EPLB: /lustre/raplab/client/xutingz/workspace/bench/waterfill/v4_ab_pr25961_hashfix_pure_enable_eplb_1000c256_20260522_101410

ch-wan · 2026-05-24T17:10:22Z

I have merged #25948, and added you as a co-author. Thanks.

github-actions Bot added the deepseek label May 21, 2026

gemini-code-assist Bot reviewed May 21, 2026

xutizhou assigned ch-wan May 22, 2026

Fix DeepSeek V4 expert distribution layer context

bfee2a3

xutizhou force-pushed the fix/v4-eplb-record-support branch from eee7695 to bfee2a3 May 22, 2026 01:52

xutizhou requested review from BBuf, Edwardf0t1, Fridge003, HaiShaw, Ying1123, ch-wan, ispobock and merrymercy as code owners May 22, 2026 01:52

ch-wan closed this May 24, 2026

xutizhou wants to merge 1 commit into sgl-project:main from xutizhou:fix/v4-eplb-record-support

3 participants
