[Feature] Enable return routed experts #12162
Conversation
Summary of Changes
Hello @ocss884, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request implements a new feature that allows the system to return the IDs of the experts chosen by the router in Mixture-of-Experts (MoE) models. This functionality is particularly valuable for the reinforcement learning community, as it provides crucial insight into expert utilization during the training phase. The changes involve creating a dedicated component for capturing these expert IDs, integrating this capture into the model's forward pass, and extending the request and response data structures to propagate this information back to the user.
Code Review
This pull request introduces a feature that returns the routed experts for Mixture-of-Experts models, which is useful for training and analysis. The implementation adds a RoutedExpertsCapturer and integrates it throughout the request lifecycle. While the overall approach is sound, there are several critical and high-severity issues that need to be addressed. These include a method signature mismatch that will cause a TypeError, and incorrect logic for clearing the expert data buffer that could lead to data corruption between batches. Additionally, the code quality could be improved by removing dead code and a debug print statement, and by refactoring the use of global state for better maintainability.
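To make the pattern concrete, here is a minimal sketch of what a forward-pass expert capturer can look like. The class and method names below are illustrative only, not the PR's actual RoutedExpertsCapturer API; the explicit clearing in drain() shows why the review flags stale data between batches.

```python
# Illustrative sketch only -- the real RoutedExpertsCapturer in this PR may differ.
# Idea: each MoE layer reports the top-k expert IDs it selected per token, and the
# capturer accumulates them so they can be attached to the response after the batch.
from typing import Dict, Optional

import torch


class ExpertCaptureSketch:
    def __init__(self, num_layers: int):
        self.num_layers = num_layers
        # One entry per layer: tensor of shape [num_tokens, top_k] holding expert IDs.
        self._per_layer: Dict[int, torch.Tensor] = {}

    def on_router_output(self, layer_id: int, topk_ids: torch.Tensor) -> None:
        # Called from the MoE forward pass right after top-k selection.
        # Copy off the GPU so the routing buffers can be reused by the next layer.
        self._per_layer[layer_id] = topk_ids.detach().to("cpu")

    def drain(self) -> Optional[torch.Tensor]:
        # Stack into [num_layers, num_tokens, top_k] and clear state so stale
        # expert IDs never leak into the next batch.
        if not self._per_layer:
            return None
        out = torch.stack([self._per_layer[i] for i in sorted(self._per_layer)])
        self._per_layer.clear()
        return out
```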
🐂🍺
Seems related: #9499. Given this TODO list, I wonder whether reusing the EPLB distribution recorder as in #9499 would be a good fit, since cudagraph / EP support is already in place there. (Maybe you can collaborate w/ the people on #9499.)
offline synced w/ @ocss884, notes
Hi, I tested your PR and found that only output routed_experts are returned; there are no input routed_experts. I believe input routed_experts would also be needed when doing RL. Additionally, if that functionality is to be implemented, we might need to take sglang's radix_cache into account.
A fixed-size _experts_capturer_host_buffer is insufficient. It would be better to either grow it dynamically or allow its size to be configured via a parameter.
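For illustration only, a dynamically growing host-side buffer along the lines of this suggestion might look like the sketch below; the class name and the capacity-doubling policy are assumptions, not code from the PR.

```python
# Hypothetical sketch of the "dynamic expansion" suggestion above.
import torch


class GrowableHostBuffer:
    def __init__(self, initial_tokens: int, top_k: int, dtype: torch.dtype = torch.int32):
        self._buf = torch.empty((initial_tokens, top_k), dtype=dtype)
        self._len = 0

    def append(self, topk_ids_cpu: torch.Tensor) -> None:
        needed = self._len + topk_ids_cpu.shape[0]
        if needed > self._buf.shape[0]:
            # Double capacity until the new rows fit, rather than failing when a
            # batch exceeds the preallocated size.
            new_cap = max(needed, 2 * self._buf.shape[0])
            new_buf = torch.empty((new_cap, self._buf.shape[1]), dtype=self._buf.dtype)
            new_buf[: self._len] = self._buf[: self._len]
            self._buf = new_buf
        self._buf[self._len : needed] = topk_ids_cpu
        self._len = needed

    def view(self) -> torch.Tensor:
        # Valid rows only; callers should copy before the buffer is reused.
        return self._buf[: self._len]
```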
Force-pushed from a896e82 to ca4b3a4
/rerun-failed-ci try again

/rerun-failed-ci

GREAT JOB
if recv_obj.output_routed_experts is not None:
    output_routed_experts = [
        (
            pybase64.b64encode(output_routed_experts.numpy().tobytes()).decode(
Why do we need to use base64 to encode bytes into string? Can you just use bytes?
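For context, here is a hedged sketch of the round trip implied by the snippet above, assuming a plain int32 tensor and a JSON response body; the dtype/shape handling is an assumption, not necessarily the PR's exact wire format.

```python
# Sketch of a base64 round trip for routed expert IDs; the standard-library base64
# module is used here, while the snippet above uses pybase64 (a faster drop-in).
import base64

import numpy as np
import torch

topk_ids = torch.tensor([[1, 5], [3, 7]], dtype=torch.int32)  # [num_tokens, top_k]

# Server side: a JSON body cannot carry raw bytes, so serialize to a base64 string.
payload = base64.b64encode(topk_ids.numpy().tobytes()).decode("ascii")

# Client side: decode back, supplying dtype and top_k out of band.
restored = np.frombuffer(base64.b64decode(payload), dtype=np.int32).reshape(-1, 2)
assert np.array_equal(restored, topk_ids.numpy())
```

If the response is JSON, raw bytes cannot be embedded directly, which is the usual reason for base64 (at roughly 33% size overhead); a binary transport could return the bytes unencoded.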
Co-authored-by: yizhang2077 <1109276519@qq.com> Co-authored-by: Liangsheng Yin <lsyincs@gmail.com>




Motivation
As per a request from the RL community, this PR enables sglang to return the routed experts (top-k expert IDs) selected during the forward pass, for later use in the training phase. Thanks to the MiMo team for proposing R3 to help stabilize MoE reinforcement learning. This method has been used in MiMo-V2-Flash and DeepSeek-V3.2. We also observe that it effectively reduces the logprob difference between training and inference outputs.
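As a rough illustration of how the returned top-k expert IDs could be replayed on the training side (the general idea described above), consider the sketch below. The function name and signature are hypothetical and are not taken from this PR or from any R3 implementation.

```python
# Hypothetical sketch: replay captured routing during the training forward pass
# instead of re-selecting experts, so dispatch matches what inference actually did.
from typing import Optional, Tuple

import torch
import torch.nn.functional as F


def route_with_replay(
    router_logits: torch.Tensor,               # [num_tokens, num_experts]
    replay_topk_ids: Optional[torch.Tensor],   # [num_tokens, top_k] captured IDs, or None
    top_k: int,
) -> Tuple[torch.Tensor, torch.Tensor]:
    probs = F.softmax(router_logits, dim=-1)
    if replay_topk_ids is None:
        # Normal path: select experts from the current router.
        topk_weights, topk_ids = torch.topk(probs, k=top_k, dim=-1)
    else:
        # Replay path: keep the experts chosen at inference time, but take their
        # weights from the current router so gradients still flow through it.
        topk_ids = replay_topk_ids.long()
        topk_weights = torch.gather(probs, dim=-1, index=topk_ids)
    return topk_ids, topk_weights
```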
TODO LIST
This PR is not compatible with:
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist