
[NVIDIA] Fix local_num_experts for EP #8779

Merged
zhyncs merged 1 commit into sgl-project:main from wenscarl:fix_trtllm_moe_typo
Aug 5, 2025

Conversation

@wenscarl
Collaborator

@wenscarl wenscarl commented Aug 5, 2025

Motivation

This PR fixes a bug introduced in #8552. At create_weights, both num_experts and num_local_experts should be passed in for the EP case.

cc @kaixih @kushanam
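For context, a minimal sketch of the expert-parallel (EP) layout this fix concerns. This is illustrative only, not the actual SGLang code: with EP, each rank hosts only a slice of the experts, so weight creation needs the per-rank count (num_local_experts) in addition to the global count (num_experts) used for routing.

```python
def split_experts(num_experts: int, ep_size: int) -> int:
    """Number of experts hosted on each EP rank (assuming an even, contiguous partition)."""
    assert num_experts % ep_size == 0, "experts must divide evenly across ranks"
    return num_experts // ep_size

num_experts = 64        # global expert count: drives routing
ep_size = 8             # expert-parallel world size
num_local_experts = split_experts(num_experts, ep_size)
print(num_local_experts)  # -> 8
```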

Modifications

Accuracy Test

Benchmark & Profiling

Checklist

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @wenscarl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I'm addressing a bug in the handling of expert parallelism (EP) within the SGLang framework. Specifically, the create_weights function was not correctly receiving or utilizing the num_local_experts parameter, leading to issues when initializing weights for MoE layers. This PR ensures that both the total number of experts and the number of local experts are properly passed and assigned during weight creation.

Highlights

  • Bug Fix for Expert Parallelism: I've resolved an issue where the num_local_experts was incorrectly handled during weight creation for Mixture-of-Experts (MoE) layers in expert parallelism (EP) setups.
  • Updated create_weights Call: The create_weights method in fused_moe_triton/layer.py now explicitly passes both num_experts and num_local_experts to the quantization method.
  • Modified create_weights Signature: The create_weights method within modelopt_quant.py has been updated to accept the new num_local_experts parameter.
  • Correct num_local_experts Assignment: Inside modelopt_quant.py, the layer.num_local_experts attribute is now correctly assigned the value of the num_local_experts parameter, rather than num_experts.
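The highlights above can be sketched as follows. The class and attribute bodies here are simplified stand-ins, not the real fused_moe_triton/layer.py or modelopt_quant.py implementations; only the parameter names (num_experts, num_local_experts) follow the PR summary.

```python
class FakeMoELayer:
    """Stand-in for an MoE layer module (illustration only)."""
    pass

class FakeQuantMethod:
    """Simplified stand-in for a quantization method's create_weights."""

    def create_weights(self, layer, num_experts, num_local_experts, hidden_size):
        # Before the fix, num_local_experts was not passed through and
        # layer.num_local_experts was assigned from num_experts, sizing the
        # weights for *all* experts on every EP rank.
        layer.num_experts = num_experts
        layer.num_local_experts = num_local_experts
        # Allocate only the local slice of expert weights (zeros as a stand-in
        # for real parameter tensors).
        layer.w13_weight = [[0.0] * hidden_size for _ in range(num_local_experts)]

layer = FakeMoELayer()
FakeQuantMethod().create_weights(layer, num_experts=64, num_local_experts=8, hidden_size=4)
print(len(layer.w13_weight))  # -> 8, not 64
```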

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in expert parallelism by correctly passing both num_experts and num_local_experts during weight creation. The previous implementation incorrectly used num_local_experts for the total number of experts. The logical changes appear correct and align with the intended fix. My review includes a couple of minor suggestions to fix indentation for better code consistency.

Contributor


medium

The indentation for this line appears to be using a tab character, which is inconsistent with the surrounding code that uses spaces. To maintain a consistent code style, please replace the tab with spaces.

Suggested change (replace the tab indentation with spaces):
num_local_experts=self.num_local_experts,

Contributor


medium

This line seems to be indented with a tab character, while the rest of the function arguments use spaces. For consistency, please use spaces for indentation.

Suggested change (replace the tab indentation with spaces):
num_local_experts: int,

@wenscarl wenscarl force-pushed the fix_trtllm_moe_typo branch from 490660b to 97a8052 Compare August 5, 2025 00:44
@zhyncs zhyncs added bug Something isn't working high priority labels Aug 5, 2025
@zhyncs zhyncs merged commit b01eeb8 into sgl-project:main Aug 5, 2025
108 of 122 checks passed

@zhyncs zhyncs mentioned this pull request Aug 5, 2025
6 tasks
zhyncs added a commit that referenced this pull request Aug 5, 2025
@zhyncs
Collaborator

zhyncs commented Aug 5, 2025

Hi @wenscarl, this PR breaks unit-test-deepep-4, so I reverted it in #8797.

@@ -752,6 +752,7 @@ def create_weights(
self,
layer: torch.nn.Module,
num_experts: int,
Collaborator


Why not simply change num_experts to num_local_experts? I think we just need the actual number of experts when allocating weights, right?
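One reason both counts can matter, sketched under an assumed contiguous EP partition (this is a hypothetical illustration, not SGLang's actual routing code): the router emits *global* expert ids, so mapping a token to this rank's weight slice needs the global count, while weight allocation only needs the local count.

```python
def global_to_local(expert_id: int, num_experts: int, ep_size: int):
    """Map a global expert id to (owning EP rank, local index), assuming a
    contiguous partition of num_experts across ep_size ranks."""
    num_local = num_experts // ep_size
    rank = expert_id // num_local     # which EP rank owns this expert
    local_id = expert_id % num_local  # index into that rank's weight tensor
    return rank, local_id

print(global_to_local(13, num_experts=64, ep_size=8))  # -> (1, 5)
```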

narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025
