[2/N] Added the core structure of elastic EP and the eplb algorithm with faulty rank by HanHan009527 · Pull Request #10606 · sgl-project/sglang

HanHan009527 · 2025-09-18T10:07:38Z

Motivation

To integrate Mooncake's fault awareness, we need to adjust the EPLB algorithm and the model loading logic so that the forward pass can bypass faulty ranks.

Based on #10423.
See our next PR #11657 and the full draft #8961 (updated on 10.16) to test the effect of fault redundancy.

The unit tests are modified to make testing easier on machines with different ibdev names.

Modifications

Add the core structure of the elastic EP module to bridge forward propagation and scheduling.
Add an EPLB algorithm that supports faulty ranks.

Accuracy Tests

Test results are available in our full draft, #8961.

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

gemini-code-assist

Summary of Changes

Hello @HanHan009527, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's resilience and efficiency in distributed environments by integrating Mooncake's fault-aware Expert Parallelism (EP). It establishes a foundational elastic EP module and introduces an intelligent load balancing algorithm capable of adjusting expert distribution in the presence of faulty ranks. Furthermore, the changes enable the model loading process to dynamically adapt to the current set of active ranks, ensuring continuous operation even when some nodes are unavailable.

Highlights

Mooncake EP Integration: Introduced the core structure for Mooncake's elastic Expert Parallelism (EP) to enable fault-aware distributed computing, allowing forward passes to bypass faulty ranks.
Elasticity-Aware Load Balancing: Implemented a new Expert Parallelism Load Balancing (EPLB) algorithm that considers and adapts to faulty ranks, ensuring efficient expert distribution even with node failures.
Dynamic Model Loading and Updates: Modified model loading logic to bypass faulty ranks and dynamically update expert weights based on the current set of active ranks, enhancing system resilience.
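The idea of load balancing that skips faulty ranks can be illustrated with a minimal sketch. This is not the `elasticity_aware` algorithm from the PR, whose details are not shown on this page; it is a plain greedy placement written under the assumption that rank health arrives as a 0/1 mask. The function name `place_experts_on_active_ranks` and its parameters are hypothetical.

```python
from typing import Dict, List


def place_experts_on_active_ranks(
    expert_loads: List[float], active_ranks: List[int]
) -> Dict[int, List[int]]:
    """Greedily assign experts to healthy ranks only (illustrative sketch)."""
    # Keep only ranks whose mask entry is non-zero (1 = healthy, 0 = faulty).
    healthy = [rank for rank, alive in enumerate(active_ranks) if alive]
    if not healthy:
        raise ValueError("no active ranks available")

    placement: Dict[int, List[int]] = {rank: [] for rank in healthy}
    rank_load = {rank: 0.0 for rank in healthy}

    # Heaviest experts first, each to the currently least-loaded healthy rank.
    order = sorted(
        range(len(expert_loads)), key=lambda e: expert_loads[e], reverse=True
    )
    for expert in order:
        target = min(healthy, key=lambda r: rank_load[r])
        placement[target].append(expert)
        rank_load[target] += expert_loads[expert]
    return placement


# Rank 1 is marked faulty, so it receives no experts.
placement = place_experts_on_active_ranks(
    expert_loads=[5.0, 3.0, 2.0, 2.0], active_ranks=[1, 0, 1, 1]
)
```

A faulty rank never appears as a key in the returned mapping, so any caller that iterates over the placement skips it automatically.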

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces the core structure for elastic expert parallelism (EP) and a new load-balancing algorithm (elasticity_aware) to support fault-tolerant ranks, primarily for integration with the Mooncake backend. The changes are well-structured, adding a new elastic_ep module for state management, a mooncake token dispatcher, and updating various parts of the system to be aware of the new backend and fault-tolerance logic. My review includes suggestions for code cleanup, improving configurability by removing hardcoded values, and refactoring for better maintainability and extensibility.

python/sglang/srt/eplb/eplb_algorithms/elasticity_aware.py

python/sglang/srt/layers/moe/token_dispatcher/mooncake.py

python/sglang/srt/models/deepseek_v2.py

python/sglang/srt/two_batch_overlap.py

ShangmingCai · 2025-10-14T08:03:38Z

CC: @zhihui1084

python/sglang/test/test_disaggregation_utils.py

ShangmingCai · 2025-10-19T09:15:22Z

python/sglang/srt/elastic_ep/elastic_ep.py

+    @classmethod
+    def healthy_rank_state(
+        cls, *, ep_size: Optional[int], device: Optional[torch.device]
+    ) -> torch.Tensor:
+        size = ep_size if ep_size is not None else torch.distributed.get_world_size()
+        dev = device if device is not None else cls._select_device()
+
+        return torch.ones(size, dtype=torch.int32, device=dev)


Should this dtype be changed to torch.int64 to align with the future usage?

Mooncake EP currently uses int32. BTW, what does "future usage" refer to?

@UNIDY2002 Just wondering whether it should align with log2phy, which is int64. But I am not sure.

I did a quick check. active_ranks has two usages in rebalance_experts: (1) num_active_ranks = active_ranks.sum().item(); (2) active_ranks_list = active_ranks.tolist(), so I think using int32 for active_ranks may be okay.
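The two usages quoted above can be checked in isolation. The snippet below is a small sketch, assuming a 4-rank mask with one faulty rank; it only exercises `sum().item()` and `tolist()` on an int32 tensor and says nothing about other call sites.

```python
import torch

# Assumed example: 4 EP ranks, rank 2 marked faulty (1 = healthy, 0 = faulty).
active_ranks = torch.ones(4, dtype=torch.int32)
active_ranks[2] = 0

# Both calls convert to plain Python values, so the tensor's integer dtype
# does not leak into the code that consumes them.
num_active_ranks = active_ranks.sum().item()
active_ranks_list = active_ranks.tolist()
```

Since both results are ordinary Python integers, these two call sites behave the same for int32 and int64 masks.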

ShangmingCai · 2025-10-19T09:27:11Z

python/sglang/srt/elastic_ep/elastic_ep.py

+    def healthy_rank_state(
+        cls, *, ep_size: Optional[int], device: Optional[torch.device]
+    ) -> torch.Tensor:
+        size = ep_size if ep_size is not None else torch.distributed.get_world_size()
+        dev = device if device is not None else cls._select_device()


nit: maybe give ep_size and device a default value: None
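A sketch of what the suggested defaults could look like, assuming the rest of the method stays as in the diff above. The class name `ElasticEPStateSketch` and the body of `_select_device` are hypothetical stand-ins, since neither is shown in this thread.

```python
from typing import Optional

import torch


class ElasticEPStateSketch:  # hypothetical stand-in for the class in elastic_ep.py
    @classmethod
    def _select_device(cls) -> torch.device:
        # Assumption: prefer CUDA when present, otherwise fall back to CPU.
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")

    @classmethod
    def healthy_rank_state(
        cls,
        *,
        ep_size: Optional[int] = None,
        device: Optional[torch.device] = None,
    ) -> torch.Tensor:
        # Fall back to the world size only when no EP size is given.
        size = ep_size if ep_size is not None else torch.distributed.get_world_size()
        dev = device if device is not None else cls._select_device()
        # All ranks start out healthy, encoded as 1.
        return torch.ones(size, dtype=torch.int32, device=dev)


# With defaults in place, callers may omit either keyword argument.
state = ElasticEPStateSketch.healthy_rank_state(ep_size=4)
```

Omitting `ep_size` still requires an initialized process group, because `torch.distributed.get_world_size()` is then called.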

ShangmingCai

LGTM. Changes are clean.

fzyzcjy

LGTM only a nit

fzyzcjy · 2025-10-20T01:36:55Z

python/sglang/srt/elastic_ep/elastic_ep.py

+
+    @classmethod
+    def init(cls, server_args: ServerArgs):
+        with cls._lock:


nit: shall we init it in a single thread and only once? Then the code here can be simplified.

Got it, I will remove this lock. This part just follows the customary singleton style; it is single-threaded to begin with.
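A lock-free variant of such a singleton might look like the sketch below. It assumes, as the reviewer suggests, that `init` is called once from a single thread at startup. The class name and the `ep_size` argument are hypothetical; the PR's `init` takes a `ServerArgs` object instead.

```python
from typing import Optional


class ElasticEPStateManagerSketch:  # hypothetical name, not the PR's class
    _instance: Optional["ElasticEPStateManagerSketch"] = None

    def __init__(self, ep_size: int):
        self.ep_size = ep_size

    @classmethod
    def init(cls, ep_size: int) -> "ElasticEPStateManagerSketch":
        # No lock: assumes init() is called once, from one thread, at startup.
        if cls._instance is None:
            cls._instance = cls(ep_size)
        return cls._instance

    @classmethod
    def instance(cls) -> "ElasticEPStateManagerSketch":
        # Fail loudly if something reads the state before startup finished.
        if cls._instance is None:
            raise RuntimeError("init() must be called before instance()")
        return cls._instance


manager = ElasticEPStateManagerSketch.init(ep_size=8)
```

Dropping the lock is only safe under the single-threaded startup assumption; concurrent callers of `init` could otherwise create two instances.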

…ith faulty rank (sgl-project#10606) Co-authored-by: Xun Sun <UNIDY2002@outlook.com> Co-authored-by: Shangming Cai <csmthu@gmail.com>

HanHan009527 requested review from BBuf, Edwardf0t1, HaiShaw, Ying1123, ch-wan, fzyzcjy, hnyls2002, ispobock, kushanam, merrymercy, yizhang2077 and zhyncs as code owners September 18, 2025 10:07

sglang-bot added the run-ci label Sep 18, 2025

HanHan009527 marked this pull request as draft September 18, 2025 10:07

gemini-code-assist bot reviewed Sep 18, 2025

HanHan009527 removed the run-ci label Sep 18, 2025

gemini-code-assist bot reviewed Sep 18, 2025

HanHan009527 changed the title ~~[2/N] Added the core structure of elastic EP and the eplb algorithm with rank loss~~ [2/N] Added the core structure of elastic EP and the eplb algorithm with faulty rank Sep 18, 2025

HanHan009527 force-pushed the mooncake-pr-eplb branch 2 times, most recently from 23ad09f to a52e2ec Compare October 2, 2025 07:44

HanHan009527 force-pushed the mooncake-pr-eplb branch 2 times, most recently from 3b0c5fd to f671015 Compare October 9, 2025 12:30

HanHan009527 force-pushed the mooncake-pr-eplb branch from f671015 to 28b950a Compare October 15, 2025 03:13

HanHan009527 marked this pull request as ready for review October 15, 2025 14:03

HanHan009527 added the run-ci label Oct 15, 2025

HanHan009527 force-pushed the mooncake-pr-eplb branch from b47fff6 to 7271b89 Compare October 15, 2025 16:57

HanHan009527 and others added 3 commits October 16, 2025 01:00

pr2 eplb

8ae4347

fix fix fix fix fix fix fix ut ut ut fix fit

Let token_dispatcher/mooncake.py use the `global_elastic_ep_metadat…

f01ba58

…a` (#13)

fix

cd65b69

fi fi fix fix fix fix fix fix fix fix fix fit fix

HanHan009527 added 6 commits October 16, 2025 01:15

fix

9808c8d

fix

cb54875

test

7b1bd4e

t

2606322

ut

9a99351

Merge branch 'main' into mooncake-pr-eplb

06563c0

ShangmingCai reviewed Oct 16, 2025

Merge branch 'main' into mooncake-pr-eplb

fd8cc23

ShangmingCai mentioned this pull request Oct 17, 2025

[Roadmap] Distributed Serving Enhancement on 2025 H2 #8210

Open

22 tasks

HanHan009527 added 2 commits October 17, 2025 10:42

Merge branch 'main' into mooncake-pr-eplb

7b06878

Merge branch 'main' into mooncake-pr-eplb

4da41cd

ShangmingCai reviewed Oct 19, 2025

ShangmingCai approved these changes Oct 19, 2025

HanHan009527 mentioned this pull request Oct 19, 2025

[4/N]Elastic EP support deepep backend #11837

Draft

4 tasks

fzyzcjy reviewed Oct 20, 2025

root and others added 3 commits October 20, 2025 06:42

review

687ba59

Merge branch 'main' into mooncake-pr-eplb

d66c884

fix lint

fbc874e

ShangmingCai added the ready-to-merge The PR is ready to merge after the CI is green. label Oct 20, 2025

HanHan009527 and others added 6 commits October 21, 2025 11:08

Merge branch 'main' into mooncake-pr-eplb

47cb4ad

Merge branch 'main' into mooncake-pr-eplb

9168004

Merge branch 'main' into mooncake-pr-eplb

8fdffde

Merge branch 'main' into mooncake-pr-eplb

a133b2d

Merge branch 'main' into mooncake-pr-eplb

5b10b90

Merge branch 'main' into mooncake-pr-eplb

2cd15a9

ch-wan merged commit 904655c into sgl-project:main Oct 22, 2025
139 of 143 checks passed

This was referenced Oct 24, 2025

[5/N] (Elastic EP) Use GPU P2P to exchange expert weights during EPLB as much as possible #12068

Merged

Elastic EP Support (Milestone 1 & 2) #8961

Closed

6 participants