Add ernie4.py for ERNIE-4.5 #7657
Conversation
Summary of Changes
Hello @solrex, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces the Ernie4.5 model implementation, including both standard and Mixture of Experts (MoE) variants. It provides the necessary modules and weight loading mechanisms to support inference with ERNIE-4.5 models. The author has marked this PR as a preview and states that it needs more testing before merging.
Highlights
- Model Implementation: Added `ernie4.py`, which implements the Ernie4.5 model architecture, including the `Ernie4_5_ForCausalLM` and `Ernie4_5_MoeForCausalLM` classes.
- Attention Layer: Implemented the `Ernie4Attention` module with support for rotary embeddings and radix attention.
- MLP Layer: Implemented the `Ernie4MLP` module with support for `SiluAndMul` activation.
- MoE Layer: Implemented the `Ernie4Moe` module for Mixture of Experts, including a custom gate and shared experts.
- Decoder Layer: Implemented `Ernie4DecoderLayer`, integrating attention, MLP/MoE, and normalization.
- Model Class: Implemented `Ernie4Model`, integrating embedding, decoder layers, and final normalization.
- Weight Loading: Added custom weight loading logic for both causal LM and MoE variants, including handling of stacked parameters and expert parameters.
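For readers skimming the highlights, the decoder-layer control flow they describe (pre-norm attention, then a dense MLP or MoE block, each with a residual connection) can be sketched roughly as follows. This is a hypothetical, heavily simplified numpy illustration; the real sglang modules operate on torch tensors and take many more arguments.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMSNorm without a learned scale, for illustration only
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def decoder_layer(hidden, self_attn, mlp_or_moe):
    # Pre-norm self-attention with a residual connection
    residual = hidden
    hidden = residual + self_attn(rms_norm(hidden))
    # Pre-norm feed-forward (dense MLP or MoE) with a residual connection
    residual = hidden
    hidden = residual + mlp_or_moe(rms_norm(hidden))
    return hidden

# Toy "sublayers": identity attention, doubling MLP
out = decoder_layer(np.ones((2, 4)), lambda x: x, lambda x: 2 * x)
print(out.shape)  # (2, 4)
```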
Code Review
This pull request adds support for ERNIE-4.5 models by introducing a new model file. The implementation is comprehensive, covering both standard and Mixture-of-Experts (MoE) variants.
I've identified a critical issue in the MoE implementation that could lead to an AttributeError when a model has no shared experts. Additionally, there are a few high-severity issues in the weight loading logic that could cause KeyError exceptions or incorrect behavior. I've provided specific suggestions to fix these problems. Once these are addressed, the code should be much more robust.
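The "no shared experts" failure mode the review flags can be illustrated with a toy MoE forward pass. The names (`experts`, `gate`, `shared_expert`) mirror the modules described in the PR summary, but the logic below is an illustrative numpy sketch of the defensive pattern, not the actual implementation.

```python
import numpy as np

class ToyMoe:
    def __init__(self, experts, gate_w, shared_expert=None):
        self.experts = experts              # list of callables
        self.gate_w = gate_w                # (hidden, n_experts) routing weights
        self.shared_expert = shared_expert  # legitimately None for some configs

    def forward(self, x):
        logits = x @ self.gate_w
        top = int(np.argmax(logits, axis=-1)[0])  # top-1 routing for brevity
        out = self.experts[top](x)
        # Guard: calling self.shared_expert(x) unconditionally would fail
        # on models configured without shared experts.
        if self.shared_expert is not None:
            out = out + self.shared_expert(x)
        return out

# Model with no shared experts still runs
moe = ToyMoe([lambda x: x, lambda x: -x], np.eye(2), shared_expert=None)
print(moe.forward(np.array([[1.0, 0.0]])))
```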
GSM8K (200) Benchmark (using `sglang/benchmark/gsm8k/bench_sglang.py`)

MMLU Benchmark (using `sglang/benchmark/mmlu/bench_sglang.py`)
```
# ERNIE-4.5-21B-A3B-PT
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.540
subject: anatomy, #q:135, acc: 0.741
subject: astronomy, #q:152, acc: 0.875
subject: business_ethics, #q:100, acc: 0.730
subject: clinical_knowledge, #q:265, acc: 0.842
subject: college_biology, #q:144, acc: 0.931
subject: college_chemistry, #q:100, acc: 0.410
subject: college_computer_science, #q:100, acc: 0.710
subject: college_mathematics, #q:100, acc: 0.540
subject: college_medicine, #q:173, acc: 0.757
Total latency: 47.252
Average accuracy: 0.741

# ERNIE-4.5-21B-A3B-PT (with MTP)
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.520
subject: anatomy, #q:135, acc: 0.726
subject: astronomy, #q:152, acc: 0.868
subject: business_ethics, #q:100, acc: 0.730
subject: clinical_knowledge, #q:265, acc: 0.838
subject: college_biology, #q:144, acc: 0.931
subject: college_chemistry, #q:100, acc: 0.420
subject: college_computer_science, #q:100, acc: 0.710
subject: college_mathematics, #q:100, acc: 0.530
subject: college_medicine, #q:173, acc: 0.775
Total latency: 40.049
Average accuracy: 0.738

# ERNIE-4.5-0.3B-PT
$ python3 bench_sglang.py --nsub 10
subject: abstract_algebra, #q:100, acc: 0.000
subject: anatomy, #q:135, acc: 0.000
subject: astronomy, #q:152, acc: 0.000
subject: business_ethics, #q:100, acc: 0.000
subject: clinical_knowledge, #q:265, acc: 0.000
subject: college_biology, #q:144, acc: 0.000
subject: college_chemistry, #q:100, acc: 0.000
subject: college_computer_science, #q:100, acc: 0.000
subject: college_mathematics, #q:100, acc: 0.000
subject: college_medicine, #q:173, acc: 0.000
Total latency: 16.330
Average accuracy: 0.000
```
After the above changes, the MTP acceptance rate improved to 50-90%. The accuracy of A3B also saw a minor uplift, which aligns with ERNIE's claimed accuracy.
As of today (July 3, 2025), the released checkpoint stores the `model.mtp_linear_proj` weight in a transposed shape. Hope Baidu will fix this weight bug in future updates. Until then, the loader needs a workaround:

```diff
diff --git a/python/sglang/srt/models/ernie4_mtp.py b/python/sglang/srt/models/ernie4_mtp.py
index 7fdbb51e..a2cbc121 100644
--- a/python/sglang/srt/models/ernie4_mtp.py
+++ b/python/sglang/srt/models/ernie4_mtp.py
@@ -175,6 +175,8 @@ class Ernie4_5_MoeForCausalLMMTP(nn.Module):
                 weight_loader = getattr(
                     param, "weight_loader", default_weight_loader
                 )
+                if name.startswith("model.mtp_linear_proj"):
+                    loaded_weight = loaded_weight.transpose(0, 1)
                 weight_loader(param, loaded_weight)
             else:
                 raise KeyError(f"Parameter '{name}' not found in MTP model.")
```
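The shape mismatch behind this transpose can be illustrated with a small numpy check; the dimensions below are made up for the demo, not taken from the actual checkpoint.

```python
import numpy as np

param = np.empty((8, 16))               # runtime parameter: (out_features, in_features)
loaded_weight = np.random.rand(16, 8)   # checkpoint stores: (in_features, out_features)

# Mirrors the loaded_weight.transpose(0, 1) workaround in the diff above
if loaded_weight.shape != param.shape:
    loaded_weight = loaded_weight.T

print(loaded_weight.shape)  # (8, 16)
```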
Hi @Hanrui-Wang, would you mind taking a look at this when you have some time? Thanks in advance!
Hi, I will review this PR.
Are there any problems with the 0.3B model 🤔? It looks like the accuracy is 0.
@yinfan98 No. I tested it with FastDeploy and vLLM, and they also report 0 with the 0.3B model. The MMLU benchmark requires the response to be a single token among A, B, C, or D. If the test script's instructions are not followed properly (for example, if the answer is not given in the first token), the results will be very poor. Therefore, the results are highly dependent on the instructions in the test script.
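To make the scoring sensitivity concrete, here is a hedged sketch of the kind of first-token matching a harness like this uses. It is an illustration of the scheme described above, not the actual `bench_sglang.py` code.

```python
def score(response: str, gold: str) -> bool:
    """Mark correct only if the first emitted character is the gold choice letter."""
    first = response.strip()[:1].upper()
    return first in "ABCD" and first == gold

# A compliant response is scored correct...
print(score("B", "B"))                 # True
# ...but a verbose answer that buries the letter is scored wrong,
# which is how a small model can end up at 0.000 accuracy.
print(score("The answer is B.", "B"))  # False
```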
@yinfan98 Could you please let me know if there's anything else you'd like me to address?
Great to see this PR. Is support for the ERNIE 4.5 VL models also planned? The model performs very well, so it would be useful. |
Sorry, this PR does not plan to support the ERNIE 4.5 VL model.
@zhyncs Hi, just wondering if it would be possible to reassign a reviewer, as there hasn't been any feedback for a while. Thanks!
Thank you for bringing ernie4 to SGLang. I left some quick comments.