feat: [2/2][DeepEP] Add waterfill load balancing for shared expert dispatch #19290
Merged: ch-wan merged 118 commits into sgl-project:main from xutizhou:feat/deepep-waterfill-eplb-balance on May 14, 2026 (+761 −27)
Commits
575f526 feat: Add DeepEP-based waterfill load balancing for shared expert (xutizhou)
d698b42 test: Add benchmark script for DeepEP Waterfill comparison (xutizhou)
5f2a187 fix: Fix critical bugs in DeepEP waterfill implementation (xutizhou)
73cb51c feat: Implement DeepEP-based waterfill - shared expert as 9th routed … (xutizhou)
bb71a88 feat: Complete DeepEP waterfill implementation (xutizhou)
3f517cd fix: Correct routed_scaling_factor application (xutizhou)
7f20c27 feat: Add MIN_TOKENS_PER_RANK threshold for sparse destination redirect (xutizhou)
25c5d29 test: Add CPU unit tests for DeepEP waterfill (xutizhou)
b945db7 test: Add comprehensive CPU unit tests for DeepEP waterfill (xutizhou)
f49426e fix: Improve tile utilization in waterfill algorithm (xutizhou)
c7153e6 fix: Handle Normal vs Low Latency mode weight application differently (xutizhou)
5137e12 feat: Add comprehensive test suite for DeepEP waterfill (xutizhou)
2bd315b perf: Optimize waterfill algorithm kernel performance (xutizhou)
819a4bc perf: Add Triton kernel for GPU-optimized waterfill assignment (xutizhou)
3099590 perf: Implement fused Triton kernel for waterfill algorithm (xutizhou)
64f7765 Fix Waterfill expert weight loading mapping (xutizhou)
2d315fb DeepEP Waterfill: post-load hook, DeepGEMM zero-init, tests (xutizhou)
f0b044c Enhance DeepEPWaterfillBalancer: Add local computation option for sha… (xutizhou)
673cf45 DeepEP Waterfill: clean & align dispatch design (xutizhou)
a43dbf3 Restore DeepEP Dockerfile (xutizhou)
5138d84 DeepEP Waterfill: clarify shared experts fusion semantics (xutizhou)
c7840b0 DeepEP Waterfill: unify num_fused_shared_experts semantics (xutizhou)
9154c96 Add DeepEP Waterfill e2e accuracy+serving test script (xutizhou)
ace4a2a deepep waterfill: fix e2e skip flags; warn on shared weight copy; use… (xutizhou)
a661233 fix no padding workaround (xutizhou)
e1f2e98 bench: disable radix cache in deepep waterfill e2e server (xutizhou)
6d2922f refactor(deep_gemm): replace zero initialization with empty tensor al… (xutizhou)
bc4719a Fix FP8 scale copy for Waterfill shared expert; reduce default perf c… (xutizhou)
82761eb EPLB: ignore Waterfill shared slot in routed expert weight updates (xutizhou)
e291330 Waterfill: use physical expert count for EPLB redundant experts (xutizhou)
6c67ab4 bench: add init_expert_location + eplb tag for torch profile (xutizhou)
24593d9 debug: print per-rank token balance for Waterfill+EPLB (xutizhou)
8d18548 deepep: waterfill shared-dest uses global load weights under EPLB (xutizhou)
3211da6 deepep: fuse shared-dest weight compute into waterfill triton (xutizhou)
3cd7e01 waterfill + eplb print log (xutizhou)
eda170b feat(deepep): improve waterfill balance with global sparse redirect (xutizhou)
502595d feat(bench): add imbalance eval scripts and waterfill-first options (xutizhou)
7e436fc perf(deepep): reduce waterfill comm regression (xutizhou)
dd3053f feat: routed-only waterfill + robust imbalance eval cleanup (xutizhou)
9c1db65 feat: skip waterfill sparse-redirect sync when unnecessary (xutizhou)
f2995e5 feat(bench): harden waterfill e2e runner (xutizhou)
9fae809 perf(deepep): restore local sparse redirect (xutizhou)
76a8b5e perf(deepep): improve waterfill balance under static EPLB (xutizhou)
2e52329 perf(deepep): fix cross-source herding in waterfill shared dispatch (xutizhou)
1b93582 Revert "perf(deepep): fix cross-source herding in waterfill shared di… (xutizhou)
46d25cc feat(bench): add waterfill benchmark skill documentation (xutizhou)
00c93fb fix: waterfill deadlock, dp_size token capacity, sgl-kernel compat, a… (xutizhou)
a213c6a feat(bench): enhance multi-node waterfill benchmark documentation and… (xutizhou)
1699f3a fix(bench): update EP32 configuration and add moe_dense_tp_size support (xutizhou)
d26d61e fix: correct topk column count in waterfill num_tokens==0 path (xutizhou)
74730a1 feat(deepep): dp-aware waterfill with fused all_reduce and corrected … (xutizhou)
81e1ad6 perf(deepep): eliminate runtime all_reduce via static EPLB weights (xutizhou)
58a6d94 perf(waterfill): eliminate GPU-CPU syncs, use local counts, LOCAL_PRE… (xutizhou)
166ff24 fix(waterfill): fix Triton type mismatch in target_total derivation (xutizhou)
7484521 fix(waterfill): always derive target_total in kernel from routed_counts (xutizhou)
ca380a2 perf: skip local_tokens_per_rank in static path, pre-alloc counts buffer (xutizhou)
f2c353c feat: add SGLANG_DISABLE_STATIC_WATERFILL env to force dynamic all_re… (xutizhou)
2ffbb8a refactor(waterfill): unify shared expert fusion, remove dead code, fi… (xutizhou)
c3bcbae fix(waterfill): use precomputed_target_total in histogram kernel, rem… (xutizhou)
bdcc325 Revert "Merge branch 'main' of github.com:sgl-project/sglang" (xutizhou)
b743d3f refactor(waterfill): remove V2 code, merge get_moe_weights with main … (xutizhou)
ed64616 refactor(waterfill): extract profiling helpers, replace evt_xxx with … (xutizhou)
1c97b69 refactor(waterfill): remove EPLB debug logging and profile timing ins… (xutizhou)
def91ef refactor(waterfill): remove PyTorch fallbacks, condense comments, del… (xutizhou)
8518f24 docs: update waterfill benchmark skill with latest results (xutizhou)
5f10966 refactor(waterfill): delete dead sparse redirect code, unused estimat… (xutizhou)
34756b3 refactor(waterfill): condense comments, remove unused class fields, t… (xutizhou)
b6359fc refactor(waterfill): merge forward_deepep_waterfill into forward_deepep (xutizhou)
a5968fe refactor(waterfill): move expand_topk into DeepEPWaterfillBalancer class (xutizhou)
2144145 refactor(waterfill): trim comments, extract helpers, remove dead code (xutizhou)
53b643d fix(waterfill): correct local preference ratio, remove unused histogram (xutizhou)
951b469 refactor(waterfill): move static weight init into DeepEPWaterfillBala… (xutizhou)
2c56f84 refactor(waterfill): simplify small-batch path, skip count_local_rout… (xutizhou)
6032996 refactor(waterfill): condense Triton kernel comments, tensor allocati… (xutizhou)
e10ce91 refactor(waterfill): remove redundant inline comments (-3 lines) (xutizhou)
f3c74e6 refactor(waterfill): inline static helpers, reuse compute_gpu_physica… (xutizhou)
147f95c refactor(waterfill): move rank_load computation into ExpertLocationMe… (xutizhou)
4ecc183 Revert unrelated changes: restore sgl-kernel 0.3.20, io_struct blank … (xutizhou)
b271fd9 Remove benchmark and skill files from PR (keep in working directory) (xutizhou)
f80efa8 Remove Dockerfile.deepep from PR (keep in working directory) (xutizhou)
5be8b24 Remove unrelated changes from PR: revert bench_one_batch_server dp_si… (xutizhou)
8e5aea6 Merge remote-tracking branch 'origin/main' into feat/deepep-waterfill… (xutizhou)
1ab2e0c upd (AichenF)
9d1cdc5 Merge origin/main into feat/deepep-waterfill-eplb-balance (xutizhou)
b047235 fix: EPLB dispatch OOB with fused shared experts + restore waterfill … (xutizhou)
8bd2f36 fix: add waterfill guard in _forward_shared_experts for defense-in-depth (xutizhou)
e6a9f38 Merge branch 'main' into feat/deepep-waterfill-eplb-balance (ch-wan)
6792756 refactor(waterfill): address PR review comments — simplify deepseek_v… (AichenF)
d11c57c refactor(waterfill): address PR review comments 1, 4, 5, 6 (AichenF)
75dbc1a fix(waterfill): inherit nn.Module in WaterfillTopK (AichenF)
89ac9d4 revert(waterfill): do not auto-enable expert_distribution_recorder_mode (AichenF)
6642ba5 refactor(waterfill): trim verbose comments and help text (AichenF)
41c47b6 fix(waterfill): skip low-batch routed count (xutizhou)
144fd40 fix(waterfill): use ep allreduce for dynamic routing (xutizhou)
75899e8 chore: clarify waterfill topk variable name (xutizhou)
9e166b0 chore: revert unrelated waterfill cleanup (xutizhou)
fd0782b chore: remove redundant waterfill mode log (xutizhou)
585122d docs(waterfill): clarify dynamic mode env (xutizhou)
8fdf183 refactor(waterfill): integrate routing into TopK (xutizhou)
5ec5d11 fix(waterfill): sync rank load for dynamic EPLB (xutizhou)
253fc8f refactor(waterfill): prepare TopK balancers in model runner (xutizhou)
26e625a chore(eplb): simplify init metadata return (xutizhou)
7aec150 Experiment with waterfill topk fused shared handling (xutizhou)
015d941 Revert "Experiment with waterfill topk fused shared handling" (xutizhou)
4e76f92 Refactor DeepEP waterfill setup (xutizhou)
f08df3b Remove unused waterfill local mask (xutizhou)
f16b38b Refactor DeepEP waterfill boundaries (xutizhou)
2fa9bfd Restore low-batch dynamic waterfill behavior (xutizhou)
d91bd25 Avoid dynamic low-batch dispatch plan overhead (xutizhou)
cfe367e Clarify DeepEP waterfill comments (xutizhou)
4d2737b Polish DeepEP waterfill expansion helpers (xutizhou)
170bcd6 Rename static rank load binding helper (xutizhou)
0389b97 docs: note one-shot static rank-load bind limitation (xutizhou)
782b3ee refactor: mark _all_reduce_dynamic_rank_load as @staticmethod (xutizhou)
8548d70 refactor: call _all_reduce_dynamic_rank_load via class name (xutizhou)
43fe3d9 rm static rank load (xutizhou)
2f449e2 Merge branch 'main' into pr-19290 (xutizhou)
d3cce1d Merge remote-tracking branch 'origin/main' into pr-19290 (xutizhou)
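For readers unfamiliar with the term in the PR title, here is a minimal CPU sketch of the general water-filling idea: given the routed token count already on each rank, hand out a budget of shared-expert tokens so that the least-loaded ranks are topped up first and the totals level out. This is not the PR's code. The function name `waterfill_assign` and its arguments are hypothetical, and the merged implementation (which the commit titles describe as a fused Triton kernel) may differ in tie-breaking, local preference, and thresholds.

```python
from typing import List


def waterfill_assign(routed_counts: List[int], shared_tokens: int) -> List[int]:
    """Return how many shared tokens each rank receives (hypothetical helper)."""
    n = len(routed_counts)
    extra = [0] * n
    if n == 0 or shared_tokens <= 0:
        return extra

    # Visit ranks from least to most loaded.
    order = sorted(range(n), key=lambda i: routed_counts[i])
    remaining = shared_tokens
    level = routed_counts[order[0]]  # current water level
    k = 1  # number of ranks currently sitting at the water level

    while remaining > 0:
        if k < n:
            next_level = routed_counts[order[k]]
            need = (next_level - level) * k  # cost to raise k ranks to next_level
            if need <= remaining:
                for j in range(k):
                    extra[order[j]] += next_level - level
                remaining -= need
                level = next_level
                k += 1
                continue
        # Not enough budget to reach the next level (or all ranks are level):
        # split what is left as evenly as possible among the k lowest ranks.
        q, r = divmod(remaining, k)
        for j in range(k):
            extra[order[j]] += q + (1 if j < r else 0)
        remaining = 0

    return extra


if __name__ == "__main__":
    routed = [10, 4, 7]
    extra = waterfill_assign(routed, 6)
    print(extra, [a + b for a, b in zip(routed, extra)])
```

With routed counts `[10, 4, 7]` and 6 shared tokens, the sketch gives nothing to the busiest rank and spreads the budget over the two lighter ones, so the maximum per-rank total stays at 10.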
When should we set this env variable?
When we use dynamic waterfill.