[NVIDIA] Fix local_num_experts for EP #8779
Conversation
Summary of Changes
Hello @wenscarl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I'm addressing a bug in the handling of expert parallelism (EP) within the SGLang framework. Specifically, the create_weights function was not correctly receiving or utilizing the num_local_experts parameter, leading to issues when initializing weights for MoE layers. This PR ensures that both the total number of experts and the number of local experts are properly passed and assigned during weight creation.
Highlights
- **Bug Fix for Expert Parallelism**: I've resolved an issue where `num_local_experts` was incorrectly handled during weight creation for Mixture-of-Experts (MoE) layers in expert parallelism (EP) setups.
- **Updated `create_weights` Call**: The `create_weights` method in `fused_moe_triton/layer.py` now explicitly passes both `num_experts` and `num_local_experts` to the quantization method.
- **Modified `create_weights` Signature**: The `create_weights` method within `modelopt_quant.py` has been updated to accept the new `num_local_experts` parameter.
- **Correct `num_local_experts` Assignment**: Inside `modelopt_quant.py`, the `layer.num_local_experts` attribute is now correctly assigned the value of the `num_local_experts` parameter, rather than `num_experts`.
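The corrected flow can be sketched as follows. This is a minimal, illustrative Python sketch of the pattern the fix restores, not the actual SGLang code: the class and helper names (`MoELayerSketch`, `create_weights` arguments beyond those named in the PR) are hypothetical, and weight shapes are shown as tuples rather than real tensors to keep the example self-contained.

```python
# Illustrative sketch (NOT the actual SGLang implementation): under expert
# parallelism (EP), each rank hosts only num_local_experts of the num_experts
# total experts. Weight buffers must be sized by the *local* count, while the
# global count is still needed for routing metadata.

class MoELayerSketch:
    """Hypothetical stand-in for an MoE layer module."""
    pass

def create_weights(layer, num_experts, num_local_experts,
                   hidden_size, intermediate_size):
    layer.num_experts = num_experts
    # The bug being fixed: this attribute was previously assigned num_experts.
    layer.num_local_experts = num_local_experts
    # Per-expert weight shapes are sized by the local expert count.
    layer.w13_shape = (num_local_experts, 2 * intermediate_size, hidden_size)
    layer.w2_shape = (num_local_experts, hidden_size, intermediate_size)

layer = MoELayerSketch()
create_weights(layer, num_experts=64, num_local_experts=8,
               hidden_size=4096, intermediate_size=14336)
```

With 64 experts split over 8 EP ranks, each rank allocates weights for only 8 experts; before the fix, `layer.num_local_experts` would have reported 64.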
Code Review
This pull request addresses a bug in expert parallelism by correctly passing both num_experts and num_local_experts during weight creation. The previous implementation incorrectly used num_local_experts for the total number of experts. The logical changes appear correct and align with the intended fix. My review includes a couple of minor suggestions to fix indentation for better code consistency.
The indentation for this line appears to be using a tab character, which is inconsistent with the surrounding code that uses spaces. To maintain a consistent code style, please replace the tab with spaces.
num_local_experts=self.num_local_experts,
Force-pushed 490660b to 97a8052
@@ -752,6 +752,7 @@ def create_weights(
    self,
    layer: torch.nn.Module,
    num_experts: int,
Why not simply change `num_experts` to `num_local_experts`? I think we just need the actual number of experts when allocating weights, right?
Motivation
This PR fixes a bug introduced in #8552. In `create_weights`, both `num_experts` and `num_local_experts` should be passed in for the EP case.
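Both counts are needed because routing operates on global expert ids while storage is per-rank. A hypothetical sketch (the `global_to_local` helper below is illustrative, assuming experts are evenly partitioned across EP ranks; it is not SGLang API):

```python
# Hypothetical helper: routing produces *global* expert ids in
# [0, num_experts), but each EP rank only stores weights for its
# num_local_experts experts, so a global id must be mapped to a
# (rank, local_index) pair before indexing into the weight tensors.
def global_to_local(expert_id, num_experts, ep_size):
    num_local_experts = num_experts // ep_size  # assumes an even split
    rank = expert_id // num_local_experts
    local_id = expert_id % num_local_experts
    return rank, local_id
```

For example, with 64 experts across 8 EP ranks, global expert 21 lives on rank 2 at local slot 5; sizing weights with the global count (the pre-fix behavior) would allocate 8x more storage per rank than needed.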
@kaixih @kushanam
Modifications
Accuracy Test
Benchmark & Profiling
Checklist