[EPLB] Support ernie4.5-moe #22100
Conversation
Signed-off-by: Haisheng Chen <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; they only run a limited subset of checks. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request enables Expert Parallelism Load Balancing (EPLB) for the ERNIE-4.5-MoE model. The changes correctly integrate the EPLB configuration and adapt the model to the MixtureOfExperts interface. I've identified a few issues that could lead to runtime errors: a potential TypeError from an incorrect default value for moe_num_shared_experts, an UnboundLocalError from using a variable before it is guaranteed to be assigned, and two potential AttributeErrors from using attributes before they are initialized. I've provided suggestions to fix all of these; once they are addressed, the changes look solid.
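A minimal sketch of the two defensive patterns the review is asking for, using an illustrative config object; this is not the PR's actual code, and `num_shared_experts`/`shared_output` are hypothetical names used only for this example:

```python
# Minimal sketch, not the PR's code: a safe default for a possibly missing or
# None config field, and assigning a variable before the branch that may skip it.
from types import SimpleNamespace


def num_shared_experts(config) -> int:
    # moe_num_shared_experts may be absent or None on some ERNIE configs;
    # falling back to 0 avoids a TypeError in later arithmetic.
    return getattr(config, "moe_num_shared_experts", 0) or 0


config = SimpleNamespace(moe_num_shared_experts=None)
n_shared = num_shared_experts(config)

# Initialize up front so a later read cannot raise UnboundLocalError when the
# branch is skipped.
shared_output = None
if n_shared > 0:
    shared_output = "run shared experts here"

print(n_shared, shared_output)
```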
@abmfy This PR enables EPLB on Ernie4.5-moe.
Thanks for the contribution!
Overall LGTM, except for the removal of e_score_correction_bias, which could break the weight loading process.
Please fix this, and I’ll verify the accuracy afterward.
| if "e_score_correction_bias" in name: | ||
| name = name.replace("moe_statics", "gate") | ||
| loaded_weight = loaded_weight.squeeze(0) | ||
|  | 
Why was e_score_correction_bias removed from this file? The ERNIE model might contain this parameter. Note that it is also removed inside the Ernie4_5_MoeMoE class.
Thanks for catching this! I checked the commit history and saw that e_score_correction_bias was introduced at roughly the same time I began this PR.
My PR from last week failed the DCO check, and rebasing it onto main would have produced some conflicts and extra commits.
To keep the history clean, I instead created a fresh branch off main and merged last week’s work into it.
That merge temporarily dropped e_score_correction_bias here. I’ll add it back in the next commit.
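For reference, a rough, hypothetical sketch of the usual vLLM pattern for this parameter (as seen in other MoE models such as deepseek_v2.py): the gate module owns an e_score_correction_bias tensor, the checkpoint's moe_statics.e_score_correction_bias is loaded into it after the rename shown above, and the fused MoE layer receives it for expert selection. This is illustrative only and not the Ernie4_5_MoeMoE code:

```python
# Illustrative sketch only; not the Ernie4_5_MoeMoE implementation.
import torch
from torch import nn


class Gate(nn.Module):
    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_experts, hidden_size))
        # Filled from the checkpoint's moe_statics.e_score_correction_bias
        # once the weight name has been remapped onto the gate.
        self.e_score_correction_bias = nn.Parameter(torch.zeros(num_experts))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Router logits; the correction bias itself is applied during expert
        # selection, typically inside the fused MoE layer that the bias is
        # handed to.
        return nn.functional.linear(hidden_states, self.weight)


gate = Gate(hidden_size=8, num_experts=4)
print(gate.e_score_correction_bias.shape)  # torch.Size([4])
```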
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
LGTM.
Accuracy tests:
w/o EPLB:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.8203 | ± | 0.0106 |
| | | strict-match | 5 | exact_match | ↑ | 0.7885 | ± | 0.0112 |
w/ EPLB:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.8173 | ± | 0.0106 |
| | | strict-match | 5 | exact_match | ↑ | 0.7885 | ± | 0.0112 |
Thanks for the contribution!
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Haisheng Chen <[email protected]>
@simon-mo Could you please take a look and approve? I've already reviewed it and confirmed the accuracy. Thank you!
@abmfy Could you take a look when you have time? I'm not sure what is wrong with this PR.
define example_moe first
Signed-off-by: Haisheng Chen <[email protected]>
Force-pushed from d893d79 to 0028a75.
Fix the conflict and fix the tests, and then it should be fine to merge.
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: 1994 <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: Dhruvil Bhatt <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: bbartels <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: 0xrushi <[email protected]>
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
Purpose
#20468 Enable EPLB on ERNIE-4.5-MoE
Test Plan
Running the following test script
python test_eplb.py --mode eplb
python test_eplb.py --mode normal
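The test script itself is not included in the PR description. A hypothetical sketch of what a minimal test_eplb.py along these lines could look like is below; the model checkpoint, parallel sizes, and the enable_eplb engine argument are assumptions rather than details taken from this PR:

```python
# Hypothetical sketch of a test_eplb.py; not the script used in this PR.
import argparse

from vllm import LLM, SamplingParams


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["eplb", "normal"], default="normal")
    args = parser.parse_args()

    llm = LLM(
        model="baidu/ERNIE-4.5-21B-A3B-PT",  # assumed checkpoint
        tensor_parallel_size=4,              # assumed parallel layout
        enable_expert_parallel=True,
        enable_eplb=(args.mode == "eplb"),   # engine flag name assumed
    )
    outputs = llm.generate(
        ["The quick brown fox"],
        SamplingParams(temperature=0.0, max_tokens=64),
    )
    for output in outputs:
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```

Comparing the greedy outputs of the two modes token by token is then a quick sanity check that EPLB changes nothing beyond numerical noise.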
Test Result
The implementation looks correct. The EPLB output begins to diverge from the original output in the middle of generation.
(Optional) Documentation Update