Conversation

Contributor

@solrex solrex commented Jun 30, 2025

Motivation

Modifications

  • Add ernie4.py for ERNIE-4.5 (Ernie4_5_ForCausalLM and Ernie4_5_MoeForCausalLM)

Benchmark

GSM8K (200) Benchmark (using sglang/benchmark/gsm8k/bench_sglang.py)

| Model | GSM8K (200) |
| --- | --- |
| ERNIE-4.5-21B-A3B-PT | 92.5% |
| ERNIE-4.5-21B-A3B-PT (with MTP) | 93.5% |
| ERNIE-4.5-0.3B-PT | 10.5% |

Checklist

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @solrex, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the Ernie4.5 model implementation, including both standard and Mixture of Experts (MoE) variants. It provides the necessary modules and weight loading mechanisms to support inference with ERNIE-4.5 models. The author has marked this PR as a preview and states that it needs more testing before merging.

Highlights

  • Model Implementation: Added ernie4.py which implements the Ernie4.5 model architecture, including Ernie4_5_ForCausalLM and Ernie4_5_MoeForCausalLM classes.
  • Attention Layer: Implemented Ernie4Attention module with support for rotary embeddings and radix attention.
  • MLP Layer: Implemented Ernie4MLP module with support for SiluAndMul activation.
  • MoE Layer: Implemented Ernie4Moe module for Mixture of Experts, including a custom gate and shared experts.
  • Decoder Layer: Implemented Ernie4DecoderLayer integrating attention, MLP/MoE, and normalization.
  • Model Class: Implemented Ernie4Model integrating embedding, decoder layers, and final normalization.
  • Weight Loading: Added custom weight loading logic for both causal LM and MoE variants, including handling of stacked parameters and expert parameters (a general sketch of this pattern follows the list).
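For readers unfamiliar with this loading convention, the sketch below illustrates the general stacked-parameter pattern that SGLang-style `load_weights` implementations follow: per-projection checkpoint tensors (q/k/v, gate/up) are routed into the fused runtime parameters. It is a simplified illustration with assumed names, not the exact code in ernie4.py.

```python
import torch
from torch import nn

def default_weight_loader(param: nn.Parameter, loaded_weight: torch.Tensor) -> None:
    """Fallback loader (assumed helper): copy the checkpoint tensor as-is."""
    param.data.copy_(loaded_weight)

# Illustrative mapping: (fused param substring, checkpoint substring, shard id).
STACKED_PARAMS_MAPPING = [
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
    ("gate_up_proj", "gate_proj", 0),
    ("gate_up_proj", "up_proj", 1),
]

def load_weights(model: nn.Module, weights) -> None:
    """Route per-projection checkpoint tensors into fused runtime parameters."""
    params = dict(model.named_parameters())
    for name, loaded_weight in weights:
        for fused, shard, shard_id in STACKED_PARAMS_MAPPING:
            if shard in name:
                # A fused parameter provides a weight_loader that places the shard.
                param = params[name.replace(shard, fused)]
                param.weight_loader(param, loaded_weight, shard_id)
                break
        else:
            param = params[name]
            weight_loader = getattr(param, "weight_loader", default_weight_loader)
            weight_loader(param, loaded_weight)
```

The MoE variant presumably extends the same idea to per-expert tensors, dispatching each expert's weights through the fused experts module's loader.
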
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for ERNIE-4.5 models by introducing a new model file. The implementation is comprehensive, covering both standard and Mixture-of-Experts (MoE) variants.

I've identified a critical issue in the MoE implementation that could lead to an AttributeError when a model has no shared experts. Additionally, there are a few high-severity issues in the weight loading logic that could cause KeyError exceptions or incorrect behavior. I've provided specific suggestions to fix these problems. Once these are addressed, the code should be much more robust.
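
To make the shared-experts concern concrete, a guard along the following lines avoids the AttributeError when a config has zero shared experts. The attribute and module names are assumptions based on similar MoE implementations, not the exact code in this PR.

```python
import torch

def moe_forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
    """Hypothetical MoE forward pass with an optional shared-experts branch."""
    shared_output = None
    # Only call the shared experts if the module was actually constructed
    # (e.g. skip when the config sets the number of shared experts to 0).
    if getattr(self, "shared_experts", None) is not None:
        shared_output = self.shared_experts(hidden_states)

    router_logits = self.gate(hidden_states)
    expert_output = self.experts(hidden_states, router_logits)

    if shared_output is not None:
        expert_output = expert_output + shared_output
    return expert_output
```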

Contributor Author

solrex commented Jul 1, 2025

GSM8K (200) Benchmark (using sglang/benchmark/gsm8k/bench_sglang.py)

| Model | GSM8K (200) |
| --- | --- |
| ERNIE-4.5-21B-A3B-PT | 92.5% |
| ERNIE-4.5-21B-A3B-PT (with MTP) | 93.5% |
| ERNIE-4.5-0.3B-PT | 10.5% |

Contributor Author

solrex commented Jul 1, 2025

MMLU Benchmark (using sglang/benchmark/mmlu/bench_sglang.py)

| Model | MMLU |
| --- | --- |
| ERNIE-4.5-21B-A3B-PT | 74.1% |
| ERNIE-4.5-21B-A3B-PT (with MTP) | 73.8% |
| ERNIE-4.5-0.3B-PT | 0% |

```
# ERNIE-4.5-21B-A3B-PT
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.540
subject: anatomy, #q:135, acc: 0.741
subject: astronomy, #q:152, acc: 0.875
subject: business_ethics, #q:100, acc: 0.730
subject: clinical_knowledge, #q:265, acc: 0.842
subject: college_biology, #q:144, acc: 0.931
subject: college_chemistry, #q:100, acc: 0.410
subject: college_computer_science, #q:100, acc: 0.710
subject: college_mathematics, #q:100, acc: 0.540
subject: college_medicine, #q:173, acc: 0.757
Total latency: 47.252
Average accuracy: 0.741

# ERNIE-4.5-21B-A3B-PT (with MTP)
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.520
subject: anatomy, #q:135, acc: 0.726
subject: astronomy, #q:152, acc: 0.868
subject: business_ethics, #q:100, acc: 0.730
subject: clinical_knowledge, #q:265, acc: 0.838
subject: college_biology, #q:144, acc: 0.931
subject: college_chemistry, #q:100, acc: 0.420
subject: college_computer_science, #q:100, acc: 0.710
subject: college_mathematics, #q:100, acc: 0.530
subject: college_medicine, #q:173, acc: 0.775
Total latency: 40.049
Average accuracy: 0.738

# ERNIE-4.5-0.3B-PT 
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.000
subject: anatomy, #q:135, acc: 0.000
subject: astronomy, #q:152, acc: 0.000
subject: business_ethics, #q:100, acc: 0.000
subject: clinical_knowledge, #q:265, acc: 0.000
subject: college_biology, #q:144, acc: 0.000
subject: college_chemistry, #q:100, acc: 0.000
subject: college_computer_science, #q:100, acc: 0.000
subject: college_mathematics, #q:100, acc: 0.000
subject: college_medicine, #q:173, acc: 0.000
Total latency: 16.330
Average accuracy: 0.000
```

Contributor Author

solrex commented Jul 2, 2025

After the above changes, the MTP acceptance rate improved to 50-90%. The accuracy of A3B also saw a minor uplift, which aligns with ERNIE's claimed accuracy.

| Model | MMLU |
| --- | --- |
| ERNIE-4.5-21B-A3B-PT | 76.7% |
| ERNIE-4.5-21B-A3B-PT (with MTP) | 76.8% |

Contributor Author

solrex commented Jul 3, 2025

As of today (July 3, 2025), the weight shape of model.mtp_linear_proj.0.weight in the baidu/ERNIE-4.5-300B-A47B-PT model is incorrect. Therefore, when loading the A47B MTP layer, you should transpose it with the patch below; otherwise, MTP loading will fail. (This issue did not occur with the A3B model.)

Hopefully Baidu will fix this weight bug in a future update.

```diff
diff --git a/python/sglang/srt/models/ernie4_mtp.py b/python/sglang/srt/models/ernie4_mtp.py
index 7fdbb51e..a2cbc121 100644
--- a/python/sglang/srt/models/ernie4_mtp.py
+++ b/python/sglang/srt/models/ernie4_mtp.py
@@ -175,6 +175,8 @@ class Ernie4_5_MoeForCausalLMMTP(nn.Module):
                     weight_loader = getattr(
                         param, "weight_loader", default_weight_loader
                     )
+                    if name.startswith("model.mtp_linear_proj"):
+                        loaded_weight = loaded_weight.transpose(0, 1)
                     weight_loader(param, loaded_weight)
                 else:
                     raise KeyError(f"Parameter '{name}' not found in MTP model.")
```

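A more defensive variant of this workaround (a hypothetical alternative, not part of the patch above) would key the transpose on the tensor's shape rather than on the parameter name alone, so that a future fix to the released checkpoint does not get double-transposed:

```python
# Hypothetical shape-based guard inside the same weight-loading loop;
# `name`, `param`, and `loaded_weight` come from the surrounding code.
if name.startswith("model.mtp_linear_proj") and loaded_weight.shape != param.shape:
    # The released A47B checkpoint stores this projection transposed.
    loaded_weight = loaded_weight.transpose(0, 1)
```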
Contributor Author

solrex commented Jul 4, 2025

Hi @Hanrui-Wang, would you mind taking a look at this when you have some time? Thanks in advance!

Collaborator

FlamingoPg commented Jul 4, 2025

> Hi @Hanrui-Wang, would you mind taking a look at this when you have some time? Thanks in advance!

Hi, I will review this PR.

@FlamingoPg
Collaborator

> MMLU Benchmark (using sglang/benchmark/mmlu/bench_sglang.py): ERNIE-4.5-21B-A3B-PT 74.1%, ERNIE-4.5-21B-A3B-PT (with MTP) 73.8%, ERNIE-4.5-0.3B-PT 0% (full per-subject results quoted above)

Is there any problem with the 0.3B model 🤔? It looks like the accuracy is 0.

Contributor Author

solrex commented Jul 5, 2025

> Is there any problem with the 0.3B model 🤔? It looks like the accuracy is 0.

@yinfan98 No. I also tested it with FastDeploy and vLLM, and they likewise report 0 for the 0.3B model. The MMLU benchmark requires the response to be a single token among A, B, C, or D. If the model does not follow the test script's instructions properly (for example, if the answer is not provided in the first token), the results will be very poor, so the score is highly dependent on the instructions in the test script.
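
For context, the first-token issue can be illustrated with a toy scoring rule (this is only an illustration of the failure mode, not the actual bench_sglang.py code):

```python
def score_mmlu_response(response: str, gold: str) -> bool:
    """Toy scoring rule assuming the harness only inspects the first emitted
    token: the response must begin with the gold choice letter (A/B/C/D)."""
    first = response.strip()[:1]
    return first == gold

# A model that leads with the letter is scored correctly...
assert score_mmlu_response("B) Paris", "B")
# ...while a verbose answer scores 0 even if it is substantively right.
assert not score_mmlu_response("The correct choice is B.", "B")
```

Under a rule like this, a small model that answers in full sentences can score near 0% even when its underlying knowledge is not actually zero.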

Contributor Author

solrex commented Jul 10, 2025

@yinfan98 Could you please let me know if there’s anything else you’d like me to address?

Contributor

finetunej commented Jul 21, 2025

Great to see this PR. Is support for the ERNIE 4.5 VL models also planned? The model performs very well, so it would be useful.

Contributor Author

solrex commented Jul 21, 2025

> Great to see this PR. Is support for the ERNIE 4.5 VL models also planned? The model performs very well, so it would be useful.

Sorry, this PR does not plan to support the ERNIE 4.5 VL model.

Contributor Author

solrex commented Jul 29, 2025

@zhyncs Hi, just wondering if it would be possible to reassign a reviewer, as there hasn't been any feedback for a while. Thanks!

Collaborator

@ch-wan ch-wan left a comment


Thank you for bringing ernie4 to SGLang. I left some quick comments.

@ch-wan ch-wan merged commit 1132547 into sgl-project:main Aug 8, 2025
4 of 56 checks passed
narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
@solrex solrex deleted the ERNIE-4.5 branch August 21, 2025 01:11
MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025

Development

Successfully merging this pull request may close these issues.

[Feature] Ernie4.5 and Ernie4.5MoE Model Support