Merged
118 commits
575f526
feat: Add DeepEP-based waterfill load balancing for shared expert
xutizhou Jan 7, 2026
d698b42
test: Add benchmark script for DeepEP Waterfill comparison
xutizhou Jan 7, 2026
5f2a187
fix: Fix critical bugs in DeepEP waterfill implementation
xutizhou Jan 7, 2026
73cb51c
feat: Implement DeepEP-based waterfill - shared expert as 9th routed …
xutizhou Jan 7, 2026
bb71a88
feat: Complete DeepEP waterfill implementation
xutizhou Jan 7, 2026
3f517cd
fix: Correct routed_scaling_factor application
xutizhou Jan 7, 2026
7f20c27
feat: Add MIN_TOKENS_PER_RANK threshold for sparse destination redirect
xutizhou Jan 7, 2026
25c5d29
test: Add CPU unit tests for DeepEP waterfill
xutizhou Jan 7, 2026
b945db7
test: Add comprehensive CPU unit tests for DeepEP waterfill
xutizhou Jan 7, 2026
f49426e
fix: Improve tile utilization in waterfill algorithm
xutizhou Jan 7, 2026
c7153e6
fix: Handle Normal vs Low Latency mode weight application differently
xutizhou Jan 7, 2026
5137e12
feat: Add comprehensive test suite for DeepEP waterfill
xutizhou Jan 7, 2026
2bd315b
perf: Optimize waterfill algorithm kernel performance
xutizhou Jan 8, 2026
819a4bc
perf: Add Triton kernel for GPU-optimized waterfill assignment
xutizhou Jan 8, 2026
3099590
perf: Implement fused Triton kernel for waterfill algorithm
xutizhou Jan 8, 2026
64f7765
Fix Waterfill expert weight loading mapping
xutizhou Jan 16, 2026
2d315fb
DeepEP Waterfill: post-load hook, DeepGEMM zero-init, tests
xutizhou Jan 16, 2026
f0b044c
Enhance DeepEPWaterfillBalancer: Add local computation option for sha…
xutizhou Jan 17, 2026
673cf45
DeepEP Waterfill: clean & align dispatch design
xutizhou Jan 17, 2026
a43dbf3
Restore DeepEP Dockerfile
xutizhou Jan 17, 2026
5138d84
DeepEP Waterfill: clarify shared experts fusion semantics
xutizhou Jan 18, 2026
c7840b0
DeepEP Waterfill: unify num_fused_shared_experts semantics
xutizhou Jan 18, 2026
9154c96
Add DeepEP Waterfill e2e accuracy+serving test script
xutizhou Jan 18, 2026
ace4a2a
deepep waterfill: fix e2e skip flags; warn on shared weight copy; use…
xutizhou Jan 18, 2026
a661233
fix no padding workaround
xutizhou Jan 18, 2026
e1f2e98
bench: disable radix cache in deepep waterfill e2e server
xutizhou Jan 18, 2026
6d2922f
refactor(deep_gemm): replace zero initialization with empty tensor al…
xutizhou Jan 19, 2026
bc4719a
Fix FP8 scale copy for Waterfill shared expert; reduce default perf c…
xutizhou Jan 20, 2026
82761eb
EPLB: ignore Waterfill shared slot in routed expert weight updates
xutizhou Jan 20, 2026
e291330
Waterfill: use physical expert count for EPLB redundant experts
xutizhou Jan 20, 2026
6c67ab4
bench: add init_expert_location + eplb tag for torch profile
xutizhou Jan 20, 2026
24593d9
debug: print per-rank token balance for Waterfill+EPLB
xutizhou Jan 20, 2026
8d18548
deepep: waterfill shared-dest uses global load weights under EPLB
xutizhou Jan 20, 2026
3211da6
deepep: fuse shared-dest weight compute into waterfill triton
xutizhou Jan 20, 2026
3cd7e01
waterfill + eplb print log
xutizhou Jan 21, 2026
eda170b
feat(deepep): improve waterfill balance with global sparse redirect
xutizhou Jan 21, 2026
502595d
feat(bench): add imbalance eval scripts and waterfill-first options
xutizhou Jan 21, 2026
7e436fc
perf(deepep): reduce waterfill comm regression
xutizhou Jan 22, 2026
dd3053f
feat: routed-only waterfill + robust imbalance eval cleanup
xutizhou Jan 23, 2026
9c1db65
feat: skip waterfill sparse-redirect sync when unnecessary
xutizhou Jan 23, 2026
f2995e5
feat(bench): harden waterfill e2e runner
xutizhou Jan 23, 2026
9fae809
perf(deepep): restore local sparse redirect
xutizhou Jan 23, 2026
76a8b5e
perf(deepep): improve waterfill balance under static EPLB
xutizhou Jan 27, 2026
2e52329
perf(deepep): fix cross-source herding in waterfill shared dispatch
xutizhou Feb 7, 2026
1b93582
Revert "perf(deepep): fix cross-source herding in waterfill shared di…
xutizhou Feb 7, 2026
46d25cc
feat(bench): add waterfill benchmark skill documentation
xutizhou Feb 7, 2026
00c93fb
fix: waterfill deadlock, dp_size token capacity, sgl-kernel compat, a…
xutizhou Feb 8, 2026
a213c6a
feat(bench): enhance multi-node waterfill benchmark documentation and…
xutizhou Feb 9, 2026
1699f3a
fix(bench): update EP32 configuration and add moe_dense_tp_size support
xutizhou Feb 9, 2026
d26d61e
fix: correct topk column count in waterfill num_tokens==0 path
xutizhou Feb 10, 2026
74730a1
feat(deepep): dp-aware waterfill with fused all_reduce and corrected …
xutizhou Feb 12, 2026
81e1ad6
perf(deepep): eliminate runtime all_reduce via static EPLB weights
xutizhou Feb 13, 2026
58a6d94
perf(waterfill): eliminate GPU-CPU syncs, use local counts, LOCAL_PRE…
xutizhou Feb 13, 2026
166ff24
fix(waterfill): fix Triton type mismatch in target_total derivation
xutizhou Feb 13, 2026
7484521
fix(waterfill): always derive target_total in kernel from routed_counts
xutizhou Feb 13, 2026
ca380a2
perf: skip local_tokens_per_rank in static path, pre-alloc counts buffer
xutizhou Feb 13, 2026
f2c353c
feat: add SGLANG_DISABLE_STATIC_WATERFILL env to force dynamic all_re…
xutizhou Feb 14, 2026
2ffbb8a
refactor(waterfill): unify shared expert fusion, remove dead code, fi…
xutizhou Feb 14, 2026
c3bcbae
fix(waterfill): use precomputed_target_total in histogram kernel, rem…
xutizhou Feb 15, 2026
bdcc325
Revert "Merge branch 'main' of github.com:sgl-project/sglang"
xutizhou Feb 15, 2026
b743d3f
refactor(waterfill): remove V2 code, merge get_moe_weights with main …
xutizhou Feb 20, 2026
ed64616
refactor(waterfill): extract profiling helpers, replace evt_xxx with …
xutizhou Feb 21, 2026
1c97b69
refactor(waterfill): remove EPLB debug logging and profile timing ins…
xutizhou Feb 21, 2026
def91ef
refactor(waterfill): remove PyTorch fallbacks, condense comments, del…
xutizhou Feb 21, 2026
8518f24
docs: update waterfill benchmark skill with latest results
xutizhou Feb 21, 2026
5f10966
refactor(waterfill): delete dead sparse redirect code, unused estimat…
xutizhou Feb 21, 2026
34756b3
refactor(waterfill): condense comments, remove unused class fields, t…
xutizhou Feb 21, 2026
b6359fc
refactor(waterfill): merge forward_deepep_waterfill into forward_deepep
xutizhou Feb 22, 2026
a5968fe
refactor(waterfill): move expand_topk into DeepEPWaterfillBalancer class
xutizhou Feb 22, 2026
2144145
refactor(waterfill): trim comments, extract helpers, remove dead code
xutizhou Feb 22, 2026
53b643d
fix(waterfill): correct local preference ratio, remove unused histogram
xutizhou Feb 22, 2026
951b469
refactor(waterfill): move static weight init into DeepEPWaterfillBala…
xutizhou Feb 22, 2026
2c56f84
refactor(waterfill): simplify small-batch path, skip count_local_rout…
xutizhou Feb 22, 2026
6032996
refactor(waterfill): condense Triton kernel comments, tensor allocati…
xutizhou Feb 22, 2026
e10ce91
refactor(waterfill): remove redundant inline comments (-3 lines)
xutizhou Feb 23, 2026
f3c74e6
refactor(waterfill): inline static helpers, reuse compute_gpu_physica…
xutizhou Feb 23, 2026
147f95c
refactor(waterfill): move rank_load computation into ExpertLocationMe…
xutizhou Feb 23, 2026
4ecc183
Revert unrelated changes: restore sgl-kernel 0.3.20, io_struct blank …
xutizhou Feb 23, 2026
b271fd9
Remove benchmark and skill files from PR (keep in working directory)
xutizhou Feb 24, 2026
f80efa8
Remove Dockerfile.deepep from PR (keep in working directory)
xutizhou Feb 24, 2026
5be8b24
Remove unrelated changes from PR: revert bench_one_batch_server dp_si…
xutizhou Feb 24, 2026
8e5aea6
Merge remote-tracking branch 'origin/main' into feat/deepep-waterfill…
xutizhou Feb 26, 2026
1ab2e0c
upd
AichenF Mar 9, 2026
9d1cdc5
Merge origin/main into feat/deepep-waterfill-eplb-balance
xutizhou Apr 9, 2026
b047235
fix: EPLB dispatch OOB with fused shared experts + restore waterfill …
xutizhou Apr 10, 2026
8bd2f36
fix: add waterfill guard in _forward_shared_experts for defense-in-depth
xutizhou Apr 11, 2026
e6a9f38
Merge branch 'main' into feat/deepep-waterfill-eplb-balance
ch-wan Apr 14, 2026
6792756
refactor(waterfill): address PR review comments — simplify deepseek_v…
AichenF Apr 22, 2026
d11c57c
refactor(waterfill): address PR review comments 1, 4, 5, 6
AichenF Apr 23, 2026
75dbc1a
fix(waterfill): inherit nn.Module in WaterfillTopK
AichenF Apr 23, 2026
89ac9d4
revert(waterfill): do not auto-enable expert_distribution_recorder_mode
AichenF Apr 23, 2026
6642ba5
refactor(waterfill): trim verbose comments and help text
AichenF Apr 23, 2026
41c47b6
fix(waterfill): skip low-batch routed count
xutizhou Apr 25, 2026
144fd40
fix(waterfill): use ep allreduce for dynamic routing
xutizhou Apr 26, 2026
75899e8
chore: clarify waterfill topk variable name
xutizhou Apr 27, 2026
9e166b0
chore: revert unrelated waterfill cleanup
xutizhou Apr 27, 2026
fd0782b
chore: remove redundant waterfill mode log
xutizhou Apr 27, 2026
585122d
docs(waterfill): clarify dynamic mode env
xutizhou Apr 27, 2026
8fdf183
refactor(waterfill): integrate routing into TopK
xutizhou Apr 27, 2026
5ec5d11
fix(waterfill): sync rank load for dynamic EPLB
xutizhou Apr 28, 2026
253fc8f
refactor(waterfill): prepare TopK balancers in model runner
xutizhou Apr 29, 2026
26e625a
chore(eplb): simplify init metadata return
xutizhou Apr 29, 2026
7aec150
Experiment with waterfill topk fused shared handling
xutizhou Apr 29, 2026
015d941
Revert "Experiment with waterfill topk fused shared handling"
xutizhou Apr 29, 2026
4e76f92
Refactor DeepEP waterfill setup
xutizhou Apr 30, 2026
f08df3b
Remove unused waterfill local mask
xutizhou Apr 30, 2026
f16b38b
Refactor DeepEP waterfill boundaries
xutizhou May 6, 2026
2fa9bfd
Restore low-batch dynamic waterfill behavior
xutizhou May 6, 2026
d91bd25
Avoid dynamic low-batch dispatch plan overhead
xutizhou May 6, 2026
cfe367e
Clarify DeepEP waterfill comments
xutizhou May 7, 2026
4d2737b
Polish DeepEP waterfill expansion helpers
xutizhou May 7, 2026
170bcd6
Rename static rank load binding helper
xutizhou May 7, 2026
0389b97
docs: note one-shot static rank-load bind limitation
xutizhou May 7, 2026
782b3ee
refactor: mark _all_reduce_dynamic_rank_load as @staticmethod
xutizhou May 7, 2026
8548d70
refactor: call _all_reduce_dynamic_rank_load via class name
xutizhou May 7, 2026
43fe3d9
rm static rank load
xutizhou May 8, 2026
2f449e2
Merge branch 'main' into pr-19290
xutizhou May 11, 2026
d3cce1d
Merge remote-tracking branch 'origin/main' into pr-19290
xutizhou May 12, 2026
1 change: 1 addition & 0 deletions docs/advanced_features/server_arguments.md
@@ -336,6 +336,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
| `--elastic-ep-backend` | Specify the collective communication backend for elastic EP. Currently supports 'mooncake'. | `none` | `none`, `mooncake` |
| `--enable-elastic-expert-backup` | Enable elastic EP backend to backup expert weights in DRAM feature. Currently supports 'mooncake'.| `False` | bool flag (set to enable) |
| `--mooncake-ib-device` | The InfiniBand devices for Mooncake Backend transfer, accepts multiple comma-separated devices (e.g., --mooncake-ib-device mlx5_0,mlx5_1). Default is None, which triggers automatic device detection when Mooncake Backend is enabled. | `None` | Type: str |
| `--enable-deepep-waterfill` | Enable DeepEP Waterfill: dispatch the shared expert as the 9th routed expert to the least-loaded EP rank. Automatically sets `--moe-a2a-backend deepep`, implicitly enables shared-expert fusion, and supports `--deepep-mode auto`, `normal`, or `low_latency`. Use `auto` or `low_latency` for production decode so CUDA graph remains enabled. Supported on DeepSeek-V3/R1 with EP >= 2. By default, Waterfill uses the static local-batch path; set `SGLANG_DISABLE_STATIC_WATERFILL=1` to force dynamic Waterfill with runtime EP all-reduce. | `False` | bool flag (set to enable) |
| `--elastic-ep-rejoin` | Indicates that this process is a relaunched elastic EP rank that should rejoin an existing process group during rank recovery. | `False` | bool flag (set to enable) |
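
The `--enable-deepep-waterfill` row above describes sending shared-expert work to the least-loaded EP rank. A minimal sketch of that idea follows, assuming a plain greedy fill over per-rank routed token counts. The function `waterfill_assign` and its arguments are hypothetical names for illustration; this is not the PR's Triton kernel and it ignores details such as local preference and tile utilization.

```python
import heapq
from typing import List


def waterfill_assign(routed_counts: List[int], num_shared_tokens: int) -> List[int]:
    """Greedy waterfill sketch (hypothetical helper, not the PR's implementation).

    routed_counts[r] is the number of routed tokens already headed to EP rank r.
    Returns how many shared-expert tokens each rank receives when every token
    goes to whichever rank currently has the lowest total load.
    """
    # Min-heap of (current_load, rank); ties resolve to the lower rank index.
    heap = [(load, rank) for rank, load in enumerate(routed_counts)]
    heapq.heapify(heap)
    extra = [0] * len(routed_counts)
    for _ in range(num_shared_tokens):
        load, rank = heapq.heappop(heap)
        extra[rank] += 1
        heapq.heappush(heap, (load + 1, rank))
    return extra


routed = [10, 4, 7, 4]                # routed tokens per EP rank (made-up numbers)
extra = waterfill_assign(routed, 8)   # place 8 shared-expert tokens
totals = [r + e for r, e in zip(routed, extra)]
print(extra)   # [0, 4, 1, 3]
print(totals)  # [10, 8, 8, 7]
```

In this toy input the busiest rank receives no shared-expert tokens, and the remaining ranks end up within one token of each other.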

## Mamba Cache
3 changes: 3 additions & 0 deletions python/sglang/srt/environ.py
@@ -412,6 +412,9 @@ class Envs:
    SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK = EnvInt(128)
    SGLANG_DEEPEP_LL_COMBINE_SEND_NUM_SMS = EnvInt(32)
    SGLANG_BLACKWELL_OVERLAP_SHARED_EXPERTS_OUTSIDE_SBO = EnvBool(False)
    # Force dynamic DeepEP Waterfill with runtime EP all-reduce instead of the
    # default static local-batch path.
    SGLANG_DISABLE_STATIC_WATERFILL = EnvBool(False)
Collaborator
When should we set this env variable?

Collaborator Author
When we want dynamic Waterfill, i.e. runtime EP all-reduce instead of the default static local-batch path.
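
To make the answer above concrete, here is a rough sketch of how the variable could select between the two paths. The helper `waterfill_mode` and the set of accepted truthy strings are assumptions for illustration; the real parsing is whatever `EnvBool` does in `environ.py`, which this diff does not show.

```python
import os
from typing import Mapping


def waterfill_mode(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: report which Waterfill path the env var selects.

    Assumes "1", "true", and "yes" count as truthy; the actual EnvBool
    parsing rules may differ.
    """
    raw = environ.get("SGLANG_DISABLE_STATIC_WATERFILL", "0")
    disabled = raw.strip().lower() in ("1", "true", "yes")
    # Static local-batch path is the default; setting the variable forces
    # the dynamic path with a runtime EP all-reduce.
    return "dynamic" if disabled else "static"


print(waterfill_mode({}))                                        # static
print(waterfill_mode({"SGLANG_DISABLE_STATIC_WATERFILL": "1"}))  # dynamic
```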


    # NIXL-EP
    SGLANG_NIXL_EP_BF16_DISPATCH = EnvBool(False)