[EPLB] Support ernie4.5-moe #22100
Conversation
Signed-off-by: Haisheng Chen <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; they only run a limited subset of checks. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request enables Expert Parallelism Load Balancing (EPLB) for the ERNIE-4.5-MoE model. The changes correctly integrate the EPLB configuration and adapt the model to the MixtureOfExperts interface. I've identified a few issues that could lead to runtime errors: a potential TypeError from an incorrect default value for moe_num_shared_experts, an UnboundLocalError from using a variable before it is guaranteed to be assigned, and two potential AttributeErrors from using attributes before they are initialized. I've provided suggestions to fix all of these; once they are addressed, the changes look solid.
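A minimal sketch of the two defensive patterns the review is asking for, using an illustrative config object; this is not the PR's actual code, and `num_shared_experts`/`shared_output` are hypothetical names used only for this example:

```python
# Minimal sketch, not the PR's code: a safe default for a possibly missing or
# None config field, and assigning a variable before the branch that may skip it.
from types import SimpleNamespace


def num_shared_experts(config) -> int:
    # moe_num_shared_experts may be absent or None on some ERNIE configs;
    # falling back to 0 avoids a TypeError in later arithmetic.
    return getattr(config, "moe_num_shared_experts", 0) or 0


config = SimpleNamespace(moe_num_shared_experts=None)
n_shared = num_shared_experts(config)

# Initialize up front so a later read cannot raise UnboundLocalError when the
# branch is skipped.
shared_output = None
if n_shared > 0:
    shared_output = "run shared experts here"

print(n_shared, shared_output)
```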
@abmfy This PR enables EPLB on Ernie4.5-moe.
Thanks for the contribution!
Overall LGTM, except for the removal of e_score_correction_bias, which could break the weight loading process.
Please fix this, and I’ll verify the accuracy afterward.
| if "e_score_correction_bias" in name: | ||
| name = name.replace("moe_statics", "gate") | ||
| loaded_weight = loaded_weight.squeeze(0) | ||
|  | 
Why was e_score_correction_bias removed from this file? The ERNIE model might contain this parameter. Note that it is also removed inside the Ernie4_5_MoeMoE class.
Thanks for catching this! I checked the commit history and saw that e_score_correction_bias was introduced at roughly the same time I began this PR.
My PR from last week failed the DCO check, and rebasing it onto main would have produced some conflicts and extra commits.
To keep the history clean, I instead created a fresh branch off main and merged last week’s work into it.
That merge temporarily dropped e_score_correction_bias here. I’ll add it back in the next commit.
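For reference, a rough, hypothetical sketch of the usual vLLM pattern for this parameter (as seen in other MoE models such as deepseek_v2.py): the gate module owns an e_score_correction_bias tensor, the checkpoint's moe_statics.e_score_correction_bias is loaded into it after the rename shown above, and the fused MoE layer receives it for expert selection. This is illustrative only and not the Ernie4_5_MoeMoE code:

```python
# Illustrative sketch only; not the Ernie4_5_MoeMoE implementation.
import torch
from torch import nn


class Gate(nn.Module):
    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_experts, hidden_size))
        # Filled from the checkpoint's moe_statics.e_score_correction_bias
        # once the weight name has been remapped onto the gate.
        self.e_score_correction_bias = nn.Parameter(torch.zeros(num_experts))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Router logits; the correction bias itself is applied during expert
        # selection, typically inside the fused MoE layer that the bias is
        # handed to.
        return nn.functional.linear(hidden_states, self.weight)


gate = Gate(hidden_size=8, num_experts=4)
print(gate.e_score_correction_bias.shape)  # torch.Size([4])
```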
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
LGTM.
Accuracy tests:
w/o EPLB:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.8203 | ± | 0.0106 |
| | | strict-match | 5 | exact_match | ↑ | 0.7885 | ± | 0.0112 |
w/ EPLB:
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | ↑ | 0.8173 | ± | 0.0106 |
| | | strict-match | 5 | exact_match | ↑ | 0.7885 | ± | 0.0112 |
Thanks for the contribution!
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Haisheng Chen <[email protected]>
@simon-mo Could you please take a look and approve? I've already reviewed it and confirmed the accuracy. Thank you!
@abmfy Could you take a look when you have time? I'm not sure what is wrong with this PR.
define example_moe first
Signed-off-by: Haisheng Chen <[email protected]>
Force-pushed from d893d79 to 0028a75.
Fix the conflict and fix the tests, and then it should be fine to merge.
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: 1994 <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: Dhruvil Bhatt <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: bbartels <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Signed-off-by: Haisheng Chen <[email protected]> Co-authored-by: Haisheng Chen <[email protected]> Signed-off-by: 0xrushi <[email protected]>
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
Purpose
#20468 Enable EPLB on ERNIE-4.5-MoE
Test Plan
Running the following test script
python test_eplb.py --mode eplb
python test_eplb.py --mode normal
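The test script itself is not included in the PR description. A hypothetical sketch of what a minimal test_eplb.py along these lines could look like is below; the model checkpoint, parallel sizes, and the enable_eplb engine argument are assumptions rather than details taken from this PR:

```python
# Hypothetical sketch of a test_eplb.py; not the script used in this PR.
import argparse

from vllm import LLM, SamplingParams


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["eplb", "normal"], default="normal")
    args = parser.parse_args()

    llm = LLM(
        model="baidu/ERNIE-4.5-21B-A3B-PT",  # assumed checkpoint
        tensor_parallel_size=4,              # assumed parallel layout
        enable_expert_parallel=True,
        enable_eplb=(args.mode == "eplb"),   # engine flag name assumed
    )
    outputs = llm.generate(
        ["The quick brown fox"],
        SamplingParams(temperature=0.0, max_tokens=64),
    )
    for output in outputs:
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```

Comparing the greedy outputs of the two modes token by token is then a quick sanity check that EPLB changes nothing beyond numerical noise.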
Test Result
The implementation looks correct. The EPLB output begins to diverge from the original output in the middle of generation.
(Optional) Documentation Update