Fix DeepSeek V4 expert distribution recording #25961
Code Review
This pull request updates the forward pass of the DeepseekV4 model to support expert distribution recording. It introduces a conditional context manager within the layer iteration loop that activates the expert distribution recorder when piecewise CUDA graphs are disabled. I have no feedback to provide as there were no review comments.
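The conditional context manager described in the review summary can be sketched as follows. This is a minimal, self-contained illustration, not the code from the PR: `ToyRecorder`, `with_current_layer`, `on_select_experts`, `run_layers`, and the `piecewise_cuda_graph_enabled` flag are hypothetical stand-ins chosen for this example.

```python
from contextlib import contextmanager, nullcontext


class ToyRecorder:
    """Hypothetical stand-in for an expert distribution recorder."""

    def __init__(self, num_layers, num_experts):
        self.counts = [[0] * num_experts for _ in range(num_layers)]
        self.current_layer = None

    @contextmanager
    def with_current_layer(self, layer_idx):
        # Tag everything recorded inside the block with this layer index.
        prev = self.current_layer
        self.current_layer = layer_idx
        try:
            yield
        finally:
            self.current_layer = prev

    def on_select_experts(self, expert_ids):
        # Without an active layer context, nothing is recorded.
        if self.current_layer is None:
            return
        for expert_id in expert_ids:
            self.counts[self.current_layer][expert_id] += 1


def run_layers(layers, recorder, piecewise_cuda_graph_enabled):
    """Run each layer, activating the recorder only when the flag is off."""
    for i, layer in enumerate(layers):
        ctx = (
            nullcontext()
            if piecewise_cuda_graph_enabled
            else recorder.with_current_layer(i)
        )
        with ctx:
            layer(recorder)


def make_layer(expert_ids):
    # A toy "MoE layer" that reports a fixed routing decision.
    def layer(recorder):
        recorder.on_select_experts(expert_ids)
    return layer


layers = [make_layer([0, 1]), make_layer([1, 1])]

rec_on = ToyRecorder(num_layers=2, num_experts=4)
run_layers(layers, rec_on, piecewise_cuda_graph_enabled=False)

rec_off = ToyRecorder(num_layers=2, num_experts=4)
run_layers(layers, rec_off, piecewise_cuda_graph_enabled=True)
```

In this sketch, `rec_on.counts` holds per-layer expert counts, while `rec_off.counts` stays all zeros because the layer context was never entered.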
|
PR #25948 already exists |
|
Additional EPLB functional smoke using the PR #19290 MMLU prompt workload shape, adapted to DeepSeek-V4-Flash-FP8 TP4 single-node:
|
|
Pure EPLB verification on DeepSeek V4 Flash FP8, current PR commit Additional audit result:
Corrected record validation:
Serving throughput, MMLU bench set disjoint from record set, batch 1000 prompts, concurrency 256,
Both runs completed all measured rounds with 0 request errors. Logs also show Run dirs:
|
eee7695 to bfee2a3
|
I have merged #25948, and added you as a co-author. Thanks. |
Summary
Verification
- `PYTHONPATH=$PWD/python python -m py_compile python/sglang/srt/models/deepseek_v4.py`
- `git commit --amend` passed.
- `logical_count` shape: `(16, 43, 256)`, total count: `2400`.
- `40 / 43`, matching V4's early dense/non-MoE layers followed by MoE layers.
- `bench_serving` random workload completed while recording: 16 requests, random input len 512, random output len 16, max concurrency 4.
- `logical_count` shape: `(32, 43, 256)`, total count: `189600`.
- `40 / 43`.
- `--ep-dispatch-algorithm static --init-expert-location <record_dump>`.
- `init_expert_location from init_by_eplb using ServerArgs.init_expert_location` on all TP ranks.
- `/generate` returned HTTP 200 after static EPLB initialization.

CI States
Latest PR Test (Base): ❌ Run #26263830117
Latest PR Test (Extra): ❌ Run #26263830039
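
The `logical_count` checks in the Verification notes (shape, total count, and the `40 / 43` figure, which reads as the number of layers with non-zero counts) could be reproduced with a small helper along these lines. This is a sketch under assumptions: the input is taken to be a nested list indexed as `[record][layer][expert]`, and `summarize_logical_count` is a hypothetical name, not a function from the repository. The toy data below is illustrative only.

```python
def summarize_logical_count(logical_count):
    """Summarize a nested list indexed as [record][layer][expert]."""
    num_records = len(logical_count)
    num_layers = len(logical_count[0])
    num_experts = len(logical_count[0][0])

    # Sum counts per layer across all records and experts.
    per_layer = [0] * num_layers
    for record in logical_count:
        for layer_idx, experts in enumerate(record):
            per_layer[layer_idx] += sum(experts)

    return {
        "shape": (num_records, num_layers, num_experts),
        "total": sum(per_layer),
        "nonzero_layers": sum(1 for count in per_layer if count > 0),
    }


# Toy data: 2 records, 3 layers, 4 experts. Layer 0 plays the role of a
# dense (non-MoE) layer, so it never records any expert counts.
toy = [
    [[0, 0, 0, 0], [1, 0, 2, 0], [0, 3, 0, 0]],
    [[0, 0, 0, 0], [0, 1, 0, 0], [4, 0, 0, 0]],
]
summary = summarize_logical_count(toy)
print(summary)
```

A layer that never routes tokens to experts contributes a zero row, which is why the non-zero layer count can be lower than the layer dimension of the shape.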