[Feature] Enable return routed experts by ocss884 · Pull Request #12162 · sgl-project/sglang

ocss884 · 2025-10-26T17:27:58Z

Motivation

As requested by the RL community, this PR enables sglang to return the routed experts (top-k) during the forward pass for later use in the training phase. Thanks to the MiMo team for proposing R3 to help stabilize MoE reinforcement learning. This method has been used in MiMo-V2-Flash and DeepSeek-V3.2. We also notice that this method effectively reduces the logprob diff between training and inference outputs.

TODO LIST

This PR is not yet compatible with:

deepep usage (will be in next pr)
mtp (will be in next pr)
PP (no plan so far)

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

gemini-code-assist · 2025-10-26T17:28:16Z

Summary of Changes

Hello @ocss884, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a new feature that allows the system to return the IDs of the experts chosen by the router in Mixture-of-Experts (MoE) models. This functionality is particularly valuable for the reinforcement learning community, as it provides crucial insights into expert utilization during the training phase. The changes involve creating a dedicated component for capturing these expert IDs, integrating this capture into the model's forward pass, and extending the request and response data structures to propagate this information back to the user.

Highlights

New Feature: Return Routed Experts: Introduced the capability to return the IDs of routed experts for Mixture-of-Experts (MoE) models, a feature specifically requested by the RL community for training purposes.
Routed Experts Capturer: Added a new RoutedExpertsCapturer abstraction and its concrete implementation to manage the capture and storage of routed expert IDs during model inference.
Configuration and API Extension: A new server argument --enable-return-routed-experts and a corresponding enable_return_routed_experts field in ServerArgs have been added to control this feature. Request objects (GenerateReqInput, TokenizedGenerateReqInput) and output structures (BatchTokenIDOutput, BatchStrOutput) are extended to support this new data.
Integration into Inference Pipeline: The capturing mechanism is integrated into the MoE layer's forward pass, and the captured experts are processed and returned through the scheduler and output processing managers.
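The capture flow summarized above can be sketched in a few lines. This is a toy illustration only, not the PR's implementation: `ToyRoutedExpertsCapturer`, `topk_expert_ids`, and the `[num_tokens][num_layers][top_k]` layout are hypothetical names and assumptions chosen for the example, and plain Python lists stand in for device and host tensors.

```python
from typing import Dict, List


def topk_expert_ids(router_logits: List[float], k: int) -> List[int]:
    """Return the indices of the k largest router logits, highest first."""
    order = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    return order[:k]


class ToyRoutedExpertsCapturer:
    """Hypothetical stand-in: stores expert IDs per token position and layer."""

    def __init__(self, num_layers: int, top_k: int) -> None:
        self.num_layers = num_layers
        self.top_k = top_k
        # token position -> one list of expert IDs per layer
        self._records: Dict[int, List[List[int]]] = {}

    def capture(self, layer_id: int, token_pos: int, expert_ids: List[int]) -> None:
        """Record the experts chosen for one token in one MoE layer."""
        if len(expert_ids) != self.top_k:
            raise ValueError("expected exactly top_k expert IDs")
        per_layer = self._records.setdefault(
            token_pos, [[] for _ in range(self.num_layers)]
        )
        per_layer[layer_id] = list(expert_ids)

    def collect(self, token_positions: List[int]) -> List[List[List[int]]]:
        """Return records shaped [num_tokens][num_layers][top_k]."""
        return [self._records[pos] for pos in token_positions]

    def clear(self) -> None:
        """Drop all records so the next batch starts from an empty state."""
        self._records.clear()


# One token passing through two MoE layers with four experts each.
capturer = ToyRoutedExpertsCapturer(num_layers=2, top_k=2)
logits_by_layer = [[0.1, 2.0, 0.3, 1.5], [0.9, 0.2, 3.0, 0.1]]
for layer_id, logits in enumerate(logits_by_layer):
    capturer.capture(layer_id, token_pos=0, expert_ids=topk_expert_ids(logits, k=2))
routed = capturer.collect([0])
```

In this sketch, `routed` holds the per-layer expert IDs for the single token, which is the kind of data an RL trainer could replay. Calling `clear()` between batches keeps stale records from leaking into later requests.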

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

ocss884 · 2025-10-26T17:29:02Z

@zhaochenyang20 @zhuzilin @yizhang2077 @fzyzcjy

gemini-code-assist

Code Review

This pull request introduces a feature to enable returning routed experts for Mixture-of-Experts models, which is useful for training and analysis. The implementation adds a RoutedExpertsCapturer and integrates it throughout the request lifecycle. While the overall approach is sound, there are several critical and high-severity issues that need to be addressed. These include a method signature mismatch that will cause a TypeError, and incorrect logic for clearing the expert data buffer which could lead to data corruption between batches. Additionally, there are opportunities to improve the code quality by removing dead code, a debug print statement, and refactoring the use of global state for better maintainability.

zhaochenyang20 · 2025-10-26T17:45:19Z

🐂🍺

fzyzcjy · 2025-10-27T01:30:27Z

seems related: #9499

Given this TODO list, I wonder whether reusing the EPLB distribution recorder, as in #9499, would be a good option, since CUDA graph and EP are already supported there.

(maybe you can collaborate w/ people in 9499)

fzyzcjy · 2025-10-27T12:42:45Z

offline synced w/ @ocss884, notes

lizipao · 2025-10-29T13:11:01Z

Hi, I tested your PR and found that it returns only the output routed_experts, not the input routed_experts. I believe RL training also needs the input routed_experts. Additionally, implementing this may require taking sglang's radix_cache into account.

lizipao · 2025-10-29T14:15:49Z

The _experts_capturer_host_buffer is too small. Could it be changed to grow dynamically, or could its size be made configurable via a parameter?
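A minimal sketch of the dynamic-expansion option suggested here, assuming a flat integer buffer with a fixed number of expert IDs per token. `GrowableExpertBuffer` and its methods are hypothetical names for illustration and do not reflect how the PR's host buffer is actually laid out.

```python
from array import array
from typing import List, Sequence


class GrowableExpertBuffer:
    """Hypothetical host-side buffer of expert IDs that doubles when full."""

    def __init__(self, initial_tokens: int, ids_per_token: int) -> None:
        self.ids_per_token = ids_per_token
        self.capacity = initial_tokens  # capacity measured in tokens
        self.size = 0  # number of tokens written so far
        self._data = array("i", [0] * (initial_tokens * ids_per_token))

    def _grow(self, min_tokens: int) -> None:
        """Double the capacity until at least min_tokens fit."""
        new_capacity = max(self.capacity, 1)
        while new_capacity < min_tokens:
            new_capacity *= 2
        extra_tokens = new_capacity - self.capacity
        self._data.extend([0] * (extra_tokens * self.ids_per_token))
        self.capacity = new_capacity

    def append(self, expert_ids: Sequence[int]) -> int:
        """Store one token's expert IDs and return the slot index used."""
        if len(expert_ids) != self.ids_per_token:
            raise ValueError("wrong number of expert IDs for one token")
        if self.size + 1 > self.capacity:
            self._grow(self.size + 1)
        start = self.size * self.ids_per_token
        self._data[start:start + self.ids_per_token] = array("i", expert_ids)
        self.size += 1
        return self.size - 1

    def get(self, slot: int) -> List[int]:
        """Read back the expert IDs stored in a slot."""
        start = slot * self.ids_per_token
        return list(self._data[start:start + self.ids_per_token])


# Start with room for two tokens, then write five.
buf = GrowableExpertBuffer(initial_tokens=2, ids_per_token=2)
slots = [buf.append([t, t + 1]) for t in range(5)]
```

Doubling keeps the amortized cost of appends constant, at the price of occasional reallocation. A size parameter, the other option raised above, avoids reallocation but requires the user to estimate the peak token count up front.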

yizhang2077 · 2025-12-20T11:54:37Z

/rerun-failed-ci try again

zhaochenyang20 · 2025-12-21T05:09:34Z

/rerun-failed-ci

hnyls2002 · 2025-12-21T07:14:07Z

https://github.com/sgl-project/sglang/actions/runs/20393337803?pr=12162

All CUDA CIs passed.

zhaochenyang20 · 2025-12-21T08:10:10Z

GREAT JOB

merrymercy · 2025-12-25T09:16:07Z

+        if recv_obj.output_routed_experts is not None:
+            output_routed_experts = [
+                (
+                    pybase64.b64encode(output_routed_experts.numpy().tobytes()).decode(


Why do we need to use base64 to encode bytes into string? Can you just use bytes?
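One possible reason for the base64 step, assuming the value ends up in a JSON response body: JSON has no bytes type, so raw bytes need a text-safe encoding. The sketch below uses the standard-library `base64` and `array` modules in place of `pybase64` and NumPy, and the integer layout (2 tokens x 2 layers x top-2 experts) is an assumption for illustration.

```python
import base64
import json
from array import array

# Assumed layout: 2 tokens x 2 layers x top-2 experts, flattened as C ints.
expert_ids = array("i", [1, 3, 2, 0, 0, 2, 3, 1])
raw = expert_ids.tobytes()

# json.dumps rejects bytes objects, so raw bytes cannot go into JSON directly.
try:
    json.dumps({"routed_experts": raw})
    bytes_ok = True
except TypeError:
    bytes_ok = False

# Server side: encode the bytes as an ASCII string before serializing.
payload = json.dumps({"routed_experts": base64.b64encode(raw).decode("utf-8")})

# Client side: reverse the encoding and reinterpret the bytes as C ints.
decoded = array("i")
decoded.frombytes(base64.b64decode(json.loads(payload)["routed_experts"]))
```

If the transport between processes can carry bytes natively, passing bytes through and encoding only at the HTTP boundary would avoid the roughly one-third size overhead of base64 on the internal hop.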

Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>

ocss884 and others added 3 commits October 26, 2025 17:06

init

cf8d0fd

Merge branch 'sgl-project:main' into return_routed_expert

a720fae

more

542d490

ocss884 requested review from BBuf, Edwardf0t1, HaiShaw, Ying1123, ch-wan, hnyls2002, ispobock, kushanam, merrymercy, xiezhq-hermann and zhyncs as code owners October 26, 2025 17:27

gemini-code-assist Bot reviewed Oct 26, 2025

ocss884 added 2 commits October 26, 2025 17:32

small fix

cfc330b

small fix

1b9e9fa

fzyzcjy reviewed Oct 27, 2025

Comment thread python/sglang/srt/layers/moe/routed_experts_capturer.py Outdated

HugoZHL reviewed Oct 28, 2025

Comment thread python/sglang/srt/layers/moe/fused_moe_triton/layer.py Outdated

HugoZHL reviewed Oct 28, 2025

Comment thread python/sglang/srt/layers/moe/routed_experts_capturer.py Outdated

ocss884 added 3 commits October 28, 2025 16:54

refactor

5a70eb7

add layer_id to select_experts

6924725

more

1f40d43

ocss884 requested a review from Kangyan-Zhou as a code owner December 20, 2025 10:44

ocss884 force-pushed the return_routed_expert branch from a896e82 to ca4b3a4 Compare December 20, 2025 10:49

ocss884 and others added 2 commits December 20, 2025 18:50

Merge branch 'main' into return_routed_expert

efee10b

lint

67210c2

hnyls2002 merged commit bed301a into sgl-project:main Dec 21, 2025
400 of 430 checks passed

merrymercy reviewed Dec 25, 2025

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

[Feature] Enable return routed experts (sgl-project#12162)

85fdb80

Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>

ocss884 mentioned this pull request Jan 4, 2026

[Feature] [RL] Rollout routing replay (R3) Roadmap #16379

Closed

12 tasks

PrinsYin mentioned this pull request Jan 4, 2026

Mitigate content from sglang.patch to sglang THUDM/slime#1316

Closed

12 tasks

This was referenced Jan 10, 2026

[Feature] DSv32: Enable return DSA sparse token selection #16856

Closed

[DSv32] Add returning DSA topk indices #16881

Closed

ocss884 mentioned this pull request Jan 12, 2026

Reorganize topk logic to clean up code and expose logical experts #16945

Merged

5 tasks

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

[Feature] Enable return routed experts (sgl-project#12162)

64fb191

Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>

TianyiZhao1437 mentioned this pull request Jan 23, 2026

[Feature]: Routed Experts GradientHQ/parallax#395

Closed

JD-ETH mentioned this pull request Jan 26, 2026

[Feature] Support Router replay on Gateway API #17782

Closed

2 tasks

14 participants
