Add ernie4.py for ERNIE-4.5 #7657
Conversation
Summary of Changes
Hello @solrex, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces the Ernie4.5 model implementation, including both standard and Mixture of Experts (MoE) variants. It provides the necessary modules and weight loading mechanisms to support inference with ERNIE-4.5 models. The author has marked this PR as a preview and states that it needs more testing before merging.
Highlights
- Model Implementation: Added `ernie4.py`, which implements the Ernie4.5 model architecture, including the `Ernie4_5_ForCausalLM` and `Ernie4_5_MoeForCausalLM` classes.
- Attention Layer: Implemented the `Ernie4Attention` module with support for rotary embeddings and radix attention.
- MLP Layer: Implemented the `Ernie4MLP` module with support for `SiluAndMul` activation.
- MoE Layer: Implemented the `Ernie4Moe` module for Mixture of Experts, including a custom gate and shared experts.
- Decoder Layer: Implemented `Ernie4DecoderLayer`, integrating attention, MLP/MoE, and normalization.
- Model Class: Implemented `Ernie4Model`, integrating embedding, decoder layers, and final normalization.
- Weight Loading: Added custom weight loading logic for both causal LM and MoE variants, including handling of stacked parameters and expert parameters.
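For readers skimming the highlights, the decoder-layer control flow they describe (pre-norm attention, then a dense MLP or MoE block, each with a residual connection) can be sketched roughly as follows. This is a hypothetical, heavily simplified numpy illustration; the real sglang modules operate on torch tensors and take many more arguments.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm without a learned scale, for illustration only
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def decoder_layer(hidden, self_attn, mlp_or_moe):
    # Pre-norm self-attention with a residual connection
    residual = hidden
    hidden = residual + self_attn(rms_norm(hidden))
    # Pre-norm feed-forward (dense MLP or MoE) with a residual connection
    residual = hidden
    hidden = residual + mlp_or_moe(rms_norm(hidden))
    return hidden

# Toy "sublayers": identity attention, doubling MLP
out = decoder_layer(np.ones((2, 4)), lambda x: x, lambda x: 2 * x)
print(out.shape)  # (2, 4)
```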
Code Review
This pull request adds support for ERNIE-4.5 models by introducing a new model file. The implementation is comprehensive, covering both standard and Mixture-of-Experts (MoE) variants.
I've identified a critical issue in the MoE implementation that could lead to an AttributeError when a model has no shared experts. Additionally, there are a few high-severity issues in the weight loading logic that could cause KeyError exceptions or incorrect behavior. I've provided specific suggestions to fix these problems. Once these are addressed, the code should be much more robust.
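The "no shared experts" failure mode the review flags can be illustrated with a toy MoE forward pass. The names (`experts`, `gate`, `shared_expert`) mirror the modules described in the PR summary, but the logic below is an illustrative numpy sketch of the defensive pattern, not the actual implementation.

```python
import numpy as np

class ToyMoe:
    def __init__(self, experts, gate_w, shared_expert=None):
        self.experts = experts              # list of callables
        self.gate_w = gate_w                # (hidden, n_experts) routing weights
        self.shared_expert = shared_expert  # legitimately None for some configs

    def forward(self, x):
        logits = x @ self.gate_w
        top = int(np.argmax(logits, axis=-1)[0])  # top-1 routing for brevity
        out = self.experts[top](x)
        # Guard: calling self.shared_expert(x) unconditionally would fail
        # on models configured without shared experts.
        if self.shared_expert is not None:
            out = out + self.shared_expert(x)
        return out

# Model with no shared experts still runs
moe = ToyMoe([lambda x: x, lambda x: -x], np.eye(2), shared_expert=None)
print(moe.forward(np.array([[1.0, 0.0]])))
```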
GSM8K (200) Benchmark (using `sglang/benchmark/gsm8k/bench_sglang.py`)

MMLU Benchmark (using `sglang/benchmark/mmlu/bench_sglang.py`)
```
# ERNIE-4.5-21B-A3B-PT
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.540
subject: anatomy, #q:135, acc: 0.741
subject: astronomy, #q:152, acc: 0.875
subject: business_ethics, #q:100, acc: 0.730
subject: clinical_knowledge, #q:265, acc: 0.842
subject: college_biology, #q:144, acc: 0.931
subject: college_chemistry, #q:100, acc: 0.410
subject: college_computer_science, #q:100, acc: 0.710
subject: college_mathematics, #q:100, acc: 0.540
subject: college_medicine, #q:173, acc: 0.757
Total latency: 47.252
Average accuracy: 0.741

# ERNIE-4.5-21B-A3B-PT (with MTP)
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.520
subject: anatomy, #q:135, acc: 0.726
subject: astronomy, #q:152, acc: 0.868
subject: business_ethics, #q:100, acc: 0.730
subject: clinical_knowledge, #q:265, acc: 0.838
subject: college_biology, #q:144, acc: 0.931
subject: college_chemistry, #q:100, acc: 0.420
subject: college_computer_science, #q:100, acc: 0.710
subject: college_mathematics, #q:100, acc: 0.530
subject: college_medicine, #q:173, acc: 0.775
Total latency: 40.049
Average accuracy: 0.738

# ERNIE-4.5-0.3B-PT
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.000
subject: anatomy, #q:135, acc: 0.000
subject: astronomy, #q:152, acc: 0.000
subject: business_ethics, #q:100, acc: 0.000
subject: clinical_knowledge, #q:265, acc: 0.000
subject: college_biology, #q:144, acc: 0.000
subject: college_chemistry, #q:100, acc: 0.000
subject: college_computer_science, #q:100, acc: 0.000
subject: college_mathematics, #q:100, acc: 0.000
subject: college_medicine, #q:173, acc: 0.000
Total latency: 16.330
Average accuracy: 0.000
```
After the above changes, the MTP acceptance rate improved to 50-90%. The accuracy of A3B also saw a minor uplift, which aligns with ERNIE's claimed accuracy.
As of today (July 3, 2025), the released checkpoint stores the `model.mtp_linear_proj` weight in a transposed shape. Hope Baidu will fix this weight bug in future updates. Until then, the loader needs a workaround:

```diff
diff --git a/python/sglang/srt/models/ernie4_mtp.py b/python/sglang/srt/models/ernie4_mtp.py
index 7fdbb51e..a2cbc121 100644
--- a/python/sglang/srt/models/ernie4_mtp.py
+++ b/python/sglang/srt/models/ernie4_mtp.py
@@ -175,6 +175,8 @@ class Ernie4_5_MoeForCausalLMMTP(nn.Module):
                 weight_loader = getattr(
                     param, "weight_loader", default_weight_loader
                 )
+                if name.startswith("model.mtp_linear_proj"):
+                    loaded_weight = loaded_weight.transpose(0, 1)
                 weight_loader(param, loaded_weight)
             else:
                 raise KeyError(f"Parameter '{name}' not found in MTP model.")
```
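The shape mismatch behind this transpose can be illustrated with a small numpy check; the dimensions below are made up for the demo, not taken from the actual checkpoint.

```python
import numpy as np

param = np.empty((8, 16))               # runtime parameter: (out_features, in_features)
loaded_weight = np.random.rand(16, 8)   # checkpoint stores: (in_features, out_features)

# Mirrors the loaded_weight.transpose(0, 1) workaround in the diff above
if loaded_weight.shape != param.shape:
    loaded_weight = loaded_weight.T

print(loaded_weight.shape)  # (8, 16)
```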
Hi @Hanrui-Wang, would you mind taking a look at this when you have some time? Thanks in advance!
Hi, I will review this PR.
Are there any problems with the 0.3B model 🤔? It looks like the accuracy is 0.
@yinfan98 No. I tested it with FastDeploy and vLLM, and they also report 0 with the 0.3B model. The MMLU benchmark requires the response to be a single token among A, B, C, or D. If the test script's instructions are not followed properly (for example, if the answer is not given in the first token), the results will be very poor. Therefore, the results are highly dependent on the instructions in the test script.
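To make the scoring sensitivity concrete, here is a hedged sketch of the kind of first-token matching a harness like this uses. It is an illustration of the scheme described above, not the actual `bench_sglang.py` code.

```python
def score(response: str, gold: str) -> bool:
    """Mark correct only if the first emitted character is the gold choice letter."""
    first = response.strip()[:1].upper()
    return first in "ABCD" and first == gold

# A compliant response is scored correct...
print(score("B", "B"))                 # True
# ...but a verbose answer that buries the letter is scored wrong,
# which is how a small model can end up at 0.000 accuracy.
print(score("The answer is B.", "B"))  # False
```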
@yinfan98 Could you please let me know if there's anything else you'd like me to address?
Great to see this PR. Is support for the ERNIE 4.5 VL models also planned? The model performs very well, so it would be useful. |
Sorry, this PR does not plan to support the ERNIE 4.5 VL model.
@zhyncs Hi, just wondering if it would be possible to reassign a reviewer, as there hasn't been any feedback for a while. Thanks!
Thank you for bringing ernie4 to SGLang. I left some quick comments.