[MoE Refactor] EPLB refactoring for FusedMoE by bnellnm · Pull Request #41055 · vllm-project/vllm

bnellnm · 2026-04-27T21:53:14Z

Purpose

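In FusedMoE and the router classes, replace the `enable_eplb` flag + `eplb_state` pair with a single `eplb_state: EplbLayerState | None`. EPLB is enabled exactly when a state object is provided.
Add a set method to EplbLayerState.
Update tests.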
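The pattern described above, where the presence of a state object replaces a separate boolean flag, can be sketched in plain Python. This is a minimal illustration, not vLLM code: `EplbLayerState` here is a hypothetical stand-in whose fields and in-place-copy `set` semantics are assumptions, and `Router` and `record_load` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EplbLayerState:
    """Hypothetical stand-in for per-layer EPLB bookkeeping (fields are assumptions)."""

    expert_load_view: list[int] = field(default_factory=list)
    logical_to_physical_map: list[list[int]] = field(default_factory=list)

    def set(self, other: EplbLayerState) -> None:
        # Assumed semantics: copy another state's contents in place, so every
        # object already holding a reference to this state sees the update.
        self.expert_load_view = list(other.expert_load_view)
        self.logical_to_physical_map = [list(row) for row in other.logical_to_physical_map]


class Router:
    """Hypothetical router: EPLB is on iff a state object was passed in."""

    def __init__(self, eplb_state: EplbLayerState | None = None) -> None:
        self.eplb_state = eplb_state  # None means EPLB is disabled

    @property
    def eplb_enabled(self) -> bool:
        return self.eplb_state is not None

    def record_load(self, expert_ids: list[int]) -> None:
        if self.eplb_state is None:
            return  # EPLB disabled: nothing to track
        for expert_id in expert_ids:
            self.eplb_state.expert_load_view[expert_id] += 1
```

With a single optional field there is no way to construct an inconsistent router (flag on, state missing, or the reverse), which is the kind of mismatch the reviewers flagged in call sites that still read `enable_eplb`.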

Test Plan

CI

Test Result

cc @yzong-rh

Signed-off-by: Bill Nell <bnell@redhat.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request introduces the EplbManager class to centralize Expert Parallelism Load Balancing (EPLB) logic, refactoring MoE layers and routers to delegate state management and weight collection. Feedback recommends using p.detach() instead of p.data for safer tensor access and replacing assertions with explicit RuntimeErrors for critical validation to prevent issues in optimized Python environments.
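The reason for preferring explicit exceptions over `assert` can be shown in plain Python: assert statements are compiled out when the interpreter runs with `-O`, while an explicit `raise` always executes. `check_expert_count` is a hypothetical validation helper used only for illustration; it is not taken from the PR.

```python
import subprocess
import sys

# A one-line program whose only check is an assert statement.
ASSERT_PROGRAM = "assert False, 'validation failed'"


def assert_fires(*interpreter_flags: str) -> bool:
    """Return True if the assert in ASSERT_PROGRAM raised (non-zero exit code)."""
    proc = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", ASSERT_PROGRAM],
        capture_output=True,
        text=True,
    )
    return proc.returncode != 0


def check_expert_count(weights: list, expected: int) -> None:
    """Hypothetical validation that still runs under `python -O`."""
    if len(weights) != expected:
        raise RuntimeError(
            f"expected {expected} expert weight tensors, got {len(weights)}"
        )
```

Under a default interpreter `assert_fires()` returns True, while `assert_fires("-O")` returns False because the assert was removed; `check_expert_count` raises in both modes.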

yzong-rh · 2026-05-06T01:21:09Z

https://github.com/neuralmagic/vllm/blob/2bc4adcc0e4d5a58cf2b69cab9d6126ba8882641/vllm/distributed/eplb/eplb_state.py#L642-L647
Still expects layer.eplb_state.

https://github.com/neuralmagic/vllm/blob/2bc4adcc0e4d5a58cf2b69cab9d6126ba8882641/vllm/model_executor/layers/quantization/modelopt.py#L1934-L1937
Still expects layer.enable_eplb.

AI also found:

- tests/model_executor/test_routed_experts_capture.py uses router.enable_eplb and router.eplb_state.*
- tests/kernels/moe/test_routing.py calls create_fused_moe_router(..., enable_eplb=..., eplb_state=...)
- tests/distributed/test_eplb_fused_moe_layer_dep_nvfp4.py sets fml.enable_eplb = True

Not caused by this refactor, but likely a bug:
https://github.com/neuralmagic/vllm/blob/2bc4adcc0e4d5a58cf2b69cab9d6126ba8882641/vllm/model_executor/models/sarvam.py#L664-L668
This calls set_eplb_state with an incorrect signature.

yzong-rh · 2026-05-06T02:11:32Z

Instead of creating an EplbManager wrapper, what if we flesh out EplbLayerState with set_state and get_expert_weights?
This would move the EPLB handling logic out of FusedMoE without introducing a manager class.

bnellnm · 2026-05-06T18:26:24Z

> Instead of creating an EplbManager wrapper, what if we flesh out EplbLayerState with set_state and get_expert_weights? This would move the EPLB handling logic out of FusedMoE without introducing a manager class.

Good point. I've redone the PR so that there's still only EplbLayerState. It's now mostly removing the flag and using the presence of the state as an indicator of whether or not EPLB is enabled. I ended up moving the static methods on the defunct manager to other places in a later PR anyway.

yzong-rh

LGTM, thanks!

robertgshaw2-redhat · 2026-05-11T15:45:47Z

            )

-        if self.enable_eplb and not self.quant_method.supports_eplb:
+        if enable_eplb and not self.quant_method.supports_eplb:


Note to self: it seems weird that the quant method would be what determines this.

Shouldn't it be the kernel?
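The capability check in the diff above can be sketched in isolation. This assumes, as the PR description states, that EPLB is enabled exactly when a state object is supplied. `QuantMethod`, `MoELayer`, and the choice of `NotImplementedError` are hypothetical, and the sketch takes no position on the reviewer's question of whether the capability belongs to the quant method or the kernel.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class QuantMethod:
    """Hypothetical quantization method exposing a capability flag."""

    name: str
    supports_eplb: bool


class MoELayer:
    """Hypothetical MoE layer; EPLB is requested by passing a state object."""

    def __init__(self, quant_method: QuantMethod, eplb_state: object | None = None) -> None:
        # Assumption: the flag is derived locally from the presence of the state.
        enable_eplb = eplb_state is not None
        if enable_eplb and not quant_method.supports_eplb:
            raise NotImplementedError(
                f"EPLB is not supported for quant method {quant_method.name!r}"
            )
        self.quant_method = quant_method
        self.eplb_state = eplb_state
```

Failing at construction time surfaces an unsupported combination immediately, instead of at the first forward pass.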

mergify · 2026-05-11T15:57:59Z

Hi @bnellnm, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

ilmarkov

LGTM. Just small nits.

mergify · 2026-05-12T13:52:23Z

Hi @bnellnm, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

…oject#41055 API

PR vllm-project#41055 ([MoE Refactor] EPLB refactoring for FusedMoE) removed the `enable_eplb` parameter from `BaseRouter.__init__`; the new API uses `eplb_state=None` (disabled) vs. a populated `eplb_state` (enabled).

Reverting vllm-project#39917 restored the pre-vllm-project#39917 test file, which still passed `enable_eplb=False`, causing a TypeError on import/instantiation.

Align the test helper with the current API: `_make_router` now takes an optional `eplb_state` (defaults to None), and the EPLB-enabled test builds a fully populated state and passes it in.

Signed-off-by: Ao Shen <aoshen@inferact.ai>
Signed-off-by: aoshen02 <aoshen@inferact.ai>

### What this PR does / why we need it?

1. Fix vllm-project/vllm#33322: override `gpu_modelrunner.sync_and_gather_intermediate_tensors` so that, in the `pp+sp+tp` scenario, the residual is not scattered on Ascend.
2. vllm-project/vllm#35520: adapt to the `ModelRunner v2` changes for hybrid attention at the interface level. Todo: add Mamba support to ModelRunner on Ascend; any pull request is welcome.
3. vllm-project/vllm#40711
4. vllm-project/vllm#42121
5. vllm-project/vllm#41706
6. vllm-project/vllm#39917: disable `async_schedule` when `enable_return_routed_experts=True`.
7. vllm-project/vllm#41046
8. vllm-project/vllm#41055
9. vllm-project/vllm#41035
10. vllm-project/vllm#42434

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.20.1
- vLLM main: vllm-project/vllm@c7aa186

---------

Signed-off-by: wangli <wangli858794774@gmail.com>

### What this PR does / why we need it?

1. Fix vllm-project/vllm#33322: override `gpu_modelrunner.sync_and_gather_intermediate_tensors` so that, in the `pp+sp+tp` scenario, the residual is not scattered on Ascend.
2. vllm-project/vllm#35520: adapt to the `ModelRunner v2` changes for hybrid attention at the interface level. Todo: add Mamba support to ModelRunner on Ascend; any pull request is welcome.
3. vllm-project/vllm#40711
4. vllm-project/vllm#42121
5. vllm-project/vllm#41706
6. vllm-project/vllm#39917: disable `async_schedule` when `enable_return_routed_experts=True`.
7. vllm-project/vllm#41046
8. vllm-project/vllm#41055
9. vllm-project/vllm#41035
10. vllm-project/vllm#42434

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.20.1
- vLLM main: vllm-project/vllm@c7aa186

---------

Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>

…ltiModelEngineClient, Qwen3.5 compilation, and EPLB refactoring (#1436)

Fix upstream regressions affecting hourly CI:

1. **MultiModelEngineClient**: Added the missing `notify_kv_transfer_request_rejected` abstract method (upstream PR vllm-project/vllm#41269).
2. **Qwen3.5 test harness**: Updated `test_common.py` to read `enforce_eager` from the model card config (with an env var override), enabling per-model compilation control.
3. **EPLB refactoring**: Removed the `EMPTY_EPLB_STATE` import and the `enable_eplb` parameter from `patched_create_fused_moe_router` after the upstream MoE refactor (upstream PR vllm-project/vllm#41055).

Note: The `enforce_eager: true` workaround for Qwen3.5 compilation has been removed; the root cause (mamba_type str-vs-Enum comparison in hybrid cache allocation) is properly fixed by #1449, which should merge first.

Verified on HPU: unit tests pass on Gaudi 3 (MoE, FP8, compressed tensors).

---------

Signed-off-by: Paweł Olejniczak <pawelx.olejniczak@intel.com>
Signed-off-by: Pawel Olejniczak <pawelx.olejniczak@intel.com>
Co-authored-by: Iryna Boiko <iryna.boiko@intel.com>

Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

bnellnm added 2 commits April 27, 2026 20:54

eplb manager

9fe1392

eplb manager

9331477

bnellnm requested review from mgoin, pavanimajety, robertgshaw2-redhat, tlrmchlsmth and yewentao256 as code owners April 27, 2026 21:53

claude Bot reviewed Apr 27, 2026

gemini-code-assist Bot reviewed Apr 27, 2026

Comment thread vllm/model_executor/layers/fused_moe/eplb_manager.py Outdated

Comment thread vllm/model_executor/layers/fused_moe/eplb_manager.py Outdated

fix

3498820

bnellnm requested a review from WoosukKwon as a code owner April 28, 2026 19:52

fix

692912e

bnellnm mentioned this pull request Apr 29, 2026

[MoE Refactor] FusedMoE/MoERunner inversion refactor #41184

Open

4 tasks

bnellnm added 2 commits May 5, 2026 15:12

move mapping fn back to FusedMoE

90c74a8

Merge remote-tracking branch 'origin/main' into eplb-manager

2bc4adc

review comments + redo stuff

0780907

bnellnm changed the title from "[MoE Refactor] Add EplbManager class to handle EPLB functionality" to "[MoE Refactor] EPLB refactoring for FusedMoE" on May 6, 2026

Merge remote-tracking branch 'origin/main' into eplb-manager

227c0b7

yzong-rh approved these changes May 10, 2026

Comment thread vllm/model_executor/layers/fused_moe/layer.py Outdated

Comment thread vllm/model_executor/layers/fused_moe/layer.py Outdated

Comment thread vllm/model_executor/layers/fused_moe/layer.py Outdated

yzong-rh reviewed May 10, 2026

Comment thread ...rs/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_w4a8_fp8.py

Comment thread ...s/quantization/compressed_tensors/compressed_tensors_moe/compressed_tensors_moe_w4a8_int8.py

robertgshaw2-redhat reviewed May 11, 2026

robertgshaw2-redhat added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on May 11, 2026

Merge branch 'main' into eplb-manager

72eb09e

robertgshaw2-redhat requested a review from zyongye as a code owner May 11, 2026 15:48

bnellnm added 2 commits May 11, 2026 16:35

Merge remote-tracking branch 'origin/main' into eplb-manager

9b669b3

review comments + fix merge

33db853

Merge remote-tracking branch 'nm-vllm/eplb-manager' into eplb-manager

199a105

bnellnm requested a review from tjtanaa as a code owner May 11, 2026 16:39

ilmarkov reviewed May 11, 2026

Comment thread vllm/model_executor/layers/fused_moe/router/base_router.py Outdated

Comment thread vllm/model_executor/layers/fused_moe/router/aiter_shared_routed_fused_moe_router.py Outdated

review comments

0e79f80

bnellnm requested review from ilmarkov and robertgshaw2-redhat May 11, 2026 17:26

ilmarkov approved these changes May 11, 2026

bnellnm added 2 commits May 12, 2026 13:45

Merge remote-tracking branch 'origin/main' into eplb-manager

cddd11e

fix merge

6f2a267

fix merge issue

be9d8ef

robertgshaw2-redhat approved these changes May 12, 2026

robertgshaw2-redhat merged commit d9b4990 into vllm-project:main May 12, 2026
92 checks passed

pawel-olejniczak mentioned this pull request May 13, 2026

[FIX_FOR_VLLM_CUSTOM=dcacdf9a8860a86401127d1c8f93ebf3cfbfd026] Fix MultiModelEngineClient, Qwen3.5 compilation, and EPLB refactoring vllm-project/vllm-gaudi#1436

Merged

This was referenced May 13, 2026

[CI] Upgrade vllm commit to 0512 vllm-project/vllm-ascend#9054

Closed

[CI] Main2main 0513 vllm-project/vllm-ascend#9137

Closed

aoshen02 mentioned this pull request May 14, 2026

Revert "[Core] Replace routing replay with device cache and async D2H pipeline" (#39917) #42434

Merged

Potabk mentioned this pull request May 14, 2026

[CI] Main2main 0514 vllm-project/vllm-ascend#9155

Merged

4 participants
