@HsChen-sys
Contributor

@HsChen-sys HsChen-sys commented Aug 1, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

#20468 Enable EPLB on ERNIE-4.5-MoE

Test Plan

Run the test script below in both modes:
python test_eplb.py --mode eplb
python test_eplb.py --mode normal

import json
import os
import argparse
from vllm import LLM, SamplingParams

prompt = "Explain the theory of relativity in simple terms."

RESULT_FILE = "eplb_test_output.json"

sampling_params = SamplingParams(
    temperature=0.0,
    top_p=1.0,
    top_k=1,
    max_tokens=4096
)

def run_inference(model_path: str, enable_eplb: bool, num_redundant_experts: int = 0):
    print(f"Running inference with EPLB={enable_eplb}, redundant experts={num_redundant_experts}")
    
    llm = LLM(
        model=model_path,
        tensor_parallel_size=2,
        enable_expert_parallel=True,
        enable_eplb=enable_eplb,
        num_redundant_experts=num_redundant_experts if enable_eplb else 0,
        eplb_window_size=1000,
        eplb_step_interval=100,
        enforce_eager=True,
        trust_remote_code=True
    )
    
    result = llm.generate([prompt], sampling_params)
    output_text = result[0].outputs[0].text.strip()
    
    print("Output:")
    print(output_text)
    print("-" * 50)

    return output_text

def save_result(key: str, value: str):
    if os.path.exists(RESULT_FILE):
        with open(RESULT_FILE, "r") as f:
            results = json.load(f)
    else:
        results = {}

    results[key] = value

    with open(RESULT_FILE, "w") as f:
        json.dump(results, f, indent=2)

    print(f"Output saved to {RESULT_FILE}")

def load_results():
    if os.path.exists(RESULT_FILE):
        with open(RESULT_FILE, "r") as f:
            return json.load(f)
    return {}

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", type=str, choices=["eplb", "normal", "compare"], required=True)
    args = parser.parse_args()

    MODEL_PATH = "baidu/ERNIE-4.5-21B-A3B-PT"

    if args.mode == "eplb":
        outputs = run_inference(MODEL_PATH, enable_eplb=True, num_redundant_experts=32)
        save_result("eplb", outputs)
    elif args.mode == "normal":
        outputs = run_inference(MODEL_PATH, enable_eplb=False)
        save_result("normal", outputs)

Test Result

The implementation looks correct. The EPLB output matches the non-EPLB output at first and begins to diverge from it in the middle of generation.

{
  "eplb": "The **theory of relativity**, developed by Albert Einstein in the early 20th century, has two main parts: **special relativity** and **general relativity**. Here's a simple breakdown:\n\n### **1. Special Relativity (1905)**\n- **Key Idea**: Time and space are not absolute\u2014they depend on how you're moving.\n- **Key Concepts**:\n  - **Time Dilation**: Moving clocks run slower. For example, if you're on a fast train, your watch might tick slightly slower than someone standing still.\n  - **Length Contraction**: Objects appear shorter when moving. A ruler on a train would look shorter to someone watching from the platform.\n  - **Mass-Energy Equivalence**: \"E=mc\u00b2\" means mass and energy are the same thing. A tiny amount of mass can release a huge amount of energy (like in nuclear reactions).\n- **Why It Matters**: It explains why the speed of light (300,000 km/s) is the same for everyone, no matter how fast they're moving.\n\n### **2. General Relativity (1915)**\n- **Key Idea**: Gravity isn't a force\u2014it's the curvature of spacetime caused by mass and energy.\n- **Key Concepts**:\n  - **Spacetime Curvature**: Massive objects like planets and stars warp the fabric of space and time. This curvature tells objects how to move.\n  - **Gravity as Geometry**: Instead of being pulled by a force, objects follow straight paths (geodesics) in curved spacetime. For example, Earth orbits the Sun because spacetime is curved by the Sun's mass.\n- **Why It Matters**: It explains why light bends around massive objects (like stars during eclipses) and predicts black holes.\n\n### **Simple Analogy**\nImagine spacetime as a trampoline:\n- **Special Relativity**: If you jump on the trampoline (move fast), the surface curves more, and things seem to move differently.\n- **General Relativity**: A heavy ball (massive object) on the trampoline creates a deep dent. Smaller balls (objects) roll toward it, just like gravity pulls objects toward massive bodies.\n\n### **Why It Matters Today**\n- **GPS Satellites**: Must account for time dilation (special relativity) because they move fast relative to Earth.\n- **Black Holes & Cosmology**: General relativity predicts phenomena like black holes and the Big Bang.\n- **Quantum Mechanics**: Relativity and quantum physics are still being unified, but they've already revolutionized our understanding of the universe.\n\nIn short, Einstein showed that time, space, and gravity are interconnected, and our perception of reality depends on how we're moving or where we are in the universe.",
  "normal": "The **theory of relativity**, developed by Albert Einstein in the early 20th century, has two main parts: **special relativity** and **general relativity**. Here's a simple breakdown:\n\n### **1. Special Relativity (1905)**\n- **Key Idea**: Time and space are not absolute\u2014they depend on how you're moving.\n- **Key Concepts**:\n  - **Time Dilation**: Moving clocks run slower. For example, if you're on a fast train, your watch might tick slightly slower than someone standing still.\n  - **Length Contraction**: Objects appear shorter when moving. A ruler on a train would look shorter to someone watching from the platform.\n  - **Mass-Energy Equivalence**: \"E=mc\u00b2\" means mass and energy are the same thing. A tiny amount of mass can release a huge amount of energy (like in nuclear reactions).\n- **Why It Matters**: It explains why the speed of light (300,000 km/s) is the same for everyone, no matter how fast they're moving.\n\n### **2. General Relativity (1915)**\n- **Key Idea**: Gravity isn't a force\u2014it's the curvature of spacetime caused by mass and energy.\n- **Key Concepts**:\n  - **Spacetime Curvature**: Massive objects like planets and stars warp the fabric of space and time. This curvature tells objects how to move.\n  - **Gravity as Geometry**: Instead of being pulled by a force, objects follow straight paths (geodesics) in curved spacetime. For example, Earth orbits the Sun because spacetime is curved by the Sun's mass.\n- **Why It Matters**: It explains why light bends around massive objects (like stars during eclipses) and predicts black holes.\n\n### **Simple Analogy**\nImagine spacetime as a trampoline:\n- **Special Relativity**: If you jump on the trampoline (move fast), the fabric curves more, and nearby objects (like a ball) move along the curves.\n- **General Relativity**: A heavy ball (like the Sun) sinks deeply into the trampoline, curving the fabric so that a smaller ball (like Earth) rolls around it in a straight line (but on the curved surface).\n\n### **Why It's Revolutionary**\n- It unified space and time (special relativity) and showed gravity is geometry (general relativity).\n- It predicted phenomena like black holes, gravitational waves, and the expansion of the universe.\n\nIn short, relativity reshaped our understanding of reality, showing that time, space, and gravity are interconnected in ways that defy everyday intuition. \ud83c\udf0c\ud83d\ude80"
}
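The script lists a `compare` mode but the comparison step itself is not shown. As a minimal sketch, assuming both runs were saved under the `eplb` and `normal` keys of `eplb_test_output.json` as above, the divergence point could be located like this (`first_divergence` and `compare_results` are hypothetical helper names, not part of this PR):

```python
import json
import os


def first_divergence(a: str, b: str) -> int:
    """Return the length of the longest common prefix of a and b."""
    # os.path.commonprefix compares character by character, so it works on any strings.
    return len(os.path.commonprefix([a, b]))


def compare_results(path: str = "eplb_test_output.json", context: int = 60) -> None:
    """Print where the saved EPLB and non-EPLB outputs stop matching."""
    with open(path, "r") as f:
        results = json.load(f)
    eplb, normal = results["eplb"], results["normal"]

    idx = first_divergence(eplb, normal)
    if idx == len(eplb) == len(normal):
        print("Outputs are identical.")
        return

    print(f"Outputs diverge at character {idx}")
    print("eplb  :", repr(eplb[idx:idx + context]))
    print("normal:", repr(normal[idx:idx + context]))
```

For the outputs above, the two texts agree up to "If you jump on the trampoline (move fast), the " and then split into "surface" (EPLB) and "fabric" (normal). Under greedy decoding, one differing token changes everything generated after it.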

(Optional) Documentation Update

Signed-off-by: Haisheng Chen <[email protected]>
@github-actions

github-actions bot commented Aug 1, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enables Expert Parallelism Load Balancing (EPLB) for the ERNIE-4.5-MoE model. The changes correctly integrate EPLB configurations and adapt the model to the MixtureOfExperts interface. I've identified a few issues that could lead to runtime errors: one is a potential TypeError from an incorrect default value for moe_num_shared_experts, another is an UnboundLocalError from using a variable before it's guaranteed to be assigned, and two more are potential AttributeError from using attributes before they're properly initialized. I've provided suggestions to fix all issues. Once these are addressed, the changes look solid.
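The diff is not reproduced in this conversation, so the following is only a generic illustration of the first issue class named in the review, not the PR's actual code. It assumes, hypothetically, that a config attribute such as `moe_num_shared_experts` may be missing or explicitly `None`, which is one way a `TypeError` can arise when the value is later used in arithmetic:

```python
from types import SimpleNamespace


def shared_expert_count(config) -> int:
    """Read moe_num_shared_experts, treating a missing or None value as 0."""
    # getattr's default covers a missing attribute; `or 0` covers an explicit None.
    return getattr(config, "moe_num_shared_experts", 0) or 0


# Hypothetical configs, for illustration only.
with_shared = SimpleNamespace(moe_num_shared_experts=2)
explicit_none = SimpleNamespace(moe_num_shared_experts=None)
missing = SimpleNamespace()

for cfg in (with_shared, explicit_none, missing):
    # Multiplying None by an int would raise TypeError; the helper avoids that.
    print(shared_expert_count(cfg) * 4)
```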

@HsChen-sys
Contributor Author

@abmfy This PR enables EPLB on Ernie4.5-moe.

Member

@abmfy abmfy left a comment

Thanks for the contribution!

Overall LGTM, except for the removal of e_score_correction_bias, which could break the weight loading process.

Please fix this, and I’ll verify the accuracy afterward.

Comment on lines 468 to 493
if "e_score_correction_bias" in name:
    name = name.replace("moe_statics", "gate")
    loaded_weight = loaded_weight.squeeze(0)

Member

Why was e_score_correction_bias removed from this file? The ERNIE model might contain this parameter. Note that it was also removed inside the Ernie4_5_MoeMoE class.
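To illustrate why dropping this branch could break weight loading, here is a minimal sketch that applies the same rename as the quoted snippet and reports checkpoint entries with no matching model parameter. The helper names and the example parameter paths are hypothetical; only the `moe_statics` to `gate` rename comes from the snippet above.

```python
def remap_checkpoint_name(name: str) -> str:
    """Apply the rename from the quoted snippet to a checkpoint weight name."""
    if "e_score_correction_bias" in name:
        return name.replace("moe_statics", "gate")
    return name


def unmapped_weights(checkpoint_names, model_param_names):
    """Return checkpoint names that have no matching model parameter after remapping."""
    params = set(model_param_names)
    return [n for n in checkpoint_names if remap_checkpoint_name(n) not in params]


# Made-up names, for illustration only.
checkpoint = [
    "model.layers.1.mlp.moe_statics.e_score_correction_bias",
    "model.layers.1.mlp.gate.weight",
]
model_with_bias = [
    "model.layers.1.mlp.gate.e_score_correction_bias",
    "model.layers.1.mlp.gate.weight",
]
model_without_bias = ["model.layers.1.mlp.gate.weight"]

print(unmapped_weights(checkpoint, model_with_bias))     # nothing left over
print(unmapped_weights(checkpoint, model_without_bias))  # the bias has no target
```

If the parameter is removed from the model class while the checkpoint still contains it, the second call shows the kind of leftover entry a loader would have to skip or fail on.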

Contributor Author

Thanks for catching this! I’ve checked the commit history and saw that e_score_correction_bias was introduced at roughly the same time I began this PR.
My PR from last week failed the DCO check, and rebasing it onto main would have produced some conflicts and extra commits.
To keep the history clean, I instead created a fresh branch off main and merged last week’s work into it.
That merge temporarily dropped e_score_correction_bias here. I’ll add it back in the next commit.

Haisheng Chen added 2 commits August 1, 2025 19:06
@HsChen-sys HsChen-sys requested a review from abmfy August 2, 2025 07:01
Member

@abmfy abmfy left a comment

LGTM.

Accuracy tests:
w/o EPLB:

|Tasks|Version|Filter|n-shot|Metric|Value|Stderr|
|---|---|---|---|---|---|---|
|gsm8k|3|flexible-extract|5|exact_match|0.8203|± 0.0106|
| | |strict-match|5|exact_match|0.7885|± 0.0112|

w/ EPLB:

|Tasks|Version|Filter|n-shot|Metric|Value|Stderr|
|---|---|---|---|---|---|---|
|gsm8k|3|flexible-extract|5|exact_match|0.8173|± 0.0106|
| | |strict-match|5|exact_match|0.7885|± 0.0112|

Thanks for the contribution!
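As a rough check on the flexible-extract numbers above, the gap between the two runs can be compared against the reported standard errors. This sketch treats the two runs as independent, which is a simplifying assumption since both use the same gsm8k test set; `z_score` is a hypothetical helper name.

```python
import math


def z_score(v1: float, se1: float, v2: float, se2: float) -> float:
    """Difference between two scores in units of their combined standard error."""
    return (v1 - v2) / math.sqrt(se1 ** 2 + se2 ** 2)


# flexible-extract exact_match: without EPLB vs. with EPLB
z = z_score(0.8203, 0.0106, 0.8173, 0.0106)
print(f"z = {z:.2f}")  # z = 0.20
```

A difference of about 0.2 combined standard errors is well inside run-to-run noise, and the strict-match scores are identical.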

@robertgshaw2-redhat robertgshaw2-redhat changed the title from "Enable EPLB on ernie4.5-moe" to "[EPLB] Support ernie4.5-moe" Sep 16, 2025
@mergify

mergify bot commented Sep 16, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @HsChen-sys.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 16, 2025
@mergify mergify bot removed the needs-rebase label Sep 16, 2025
@abmfy
Member

abmfy commented Sep 16, 2025

@simon-mo Could you please take a look and approve? I’ve already reviewed it and confirmed the accuracy. Thank you!

@HsChen-sys HsChen-sys requested a review from abmfy September 30, 2025 22:16
@HsChen-sys
Contributor Author

@abmfy Could you take a look when you have time? I'm not sure what is wrong with this PR.

@abmfy
Member

abmfy commented Oct 1, 2025

@simon-mo

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 1, 2025 03:03
@github-actions github-actions bot added the ready label Oct 1, 2025
auto-merge was automatically disabled October 1, 2025 04:30

Head branch was pushed to by a user without write access

define example_moe first

Signed-off-by: Haisheng Chen <[email protected]>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 1, 2025 05:07
@youkaichao
Member

Fix the conflicts and the failing tests, and then it should be fine to merge.

auto-merge was automatically disabled October 7, 2025 04:53

Head branch was pushed to by a user without write access

Haisheng Chen and others added 4 commits October 6, 2025 22:08
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
Signed-off-by: Haisheng Chen <[email protected]>
@DarkLight1337 DarkLight1337 merged commit c5c8f5e into vllm-project:main Oct 12, 2025
53 checks passed

Labels

eplb, ready

5 participants