[MoE Refactor] Move expert map related code into ExpertMapManager class by bnellnm · Pull Request #41046 · vllm-project/vllm

bnellnm · 2026-04-27T20:13:53Z

Purpose

Create ExpertMapManager class to handle all expert map related functionality in the FusedMoE layer.

Test Plan

CI

Test Result

cc @yzong-rh

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Signed-off-by: Bill Nell <bnell@redhat.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request introduces the ExpertMapManager class to centralize and manage expert ID mappings, placement strategies, and routing tables for MoE layers, refactoring this logic out of layer.py. The feedback identifies critical state management issues in the update and update_expert_map methods, specifically regarding the synchronization of local expert counts when shared experts are present and the need to keep configuration objects in sync during dynamic updates. Additionally, the round-robin routing table logic requires correction to include shared experts to prevent potential out-of-bounds errors in kernels.
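
To make the placement strategies mentioned in this review concrete, here is a minimal sketch in plain Python of how linear and round-robin placement could assign global expert IDs to expert-parallel ranks. The helper name `local_expert_ids` and its parameters are hypothetical and are not taken from the PR. The actual vLLM logic may differ, particularly in how shared experts are counted.

```python
def local_expert_ids(
    global_num_experts: int,
    ep_size: int,
    ep_rank: int,
    strategy: str = "linear",
) -> list[int]:
    """Return the global expert IDs owned by `ep_rank` (hypothetical helper)."""
    if not 0 <= ep_rank < ep_size:
        raise ValueError("ep_rank must be in [0, ep_size)")
    if strategy == "round_robin":
        # Rank r owns experts r, r + ep_size, r + 2 * ep_size, ...
        return list(range(ep_rank, global_num_experts, ep_size))
    if strategy == "linear":
        # Contiguous blocks; the first `rem` ranks take one extra expert.
        base, rem = divmod(global_num_experts, ep_size)
        start = ep_rank * base + min(ep_rank, rem)
        count = base + (1 if ep_rank < rem else 0)
        return list(range(start, start + count))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With 8 experts on 4 ranks, rank 1 would own experts 1 and 5 under round-robin and experts 2 and 3 under linear placement. Any routing table derived from such a list has to cover every expert a kernel may index, which is the kind of bound the review asks to double-check for shared experts.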

Signed-off-by: Bill Nell <bnell@redhat.com>

…-map-manager

Signed-off-by: Bill Nell <bnell@redhat.com>

…-map-manager

mergify · 2026-05-06T19:54:13Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bnellnm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Bill Nell <bnell@redhat.com>

yzong-rh

Three remaining concerns with ExpertMapManager

self.device might raise an error if the expert maps are not initialized (for example, when ep_size == 1).
The explicit use of self.device in _ensure_routing_tables_initialized is redundant given the enclosing "with self.device:" block.
To stay idempotent, _ensure_routing_tables_initialized never overrides routing_tables that are already set. This could be problematic in update.
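
The first and third concerns can be illustrated with a toy model. The sketch below is not the class from the PR: `ToyExpertMapManager`, `FakeTensor`, and `fallback_device` are hypothetical names, and the routing table is reduced to a simple local-to-global list. It only shows one possible way to (a) avoid failing when no expert map exists and (b) let `update` rebuild a lazily initialized table. Whether the PR resolved the concerns this way is not stated in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTensor:
    """Stand-in for a tensor: just values plus a device string."""
    values: list[int]
    device: str


class ToyExpertMapManager:
    """Toy illustration only; not vLLM's ExpertMapManager."""

    def __init__(self, expert_map: Optional[FakeTensor],
                 fallback_device: str = "cpu") -> None:
        self.expert_map = expert_map
        self.fallback_device = fallback_device
        self.routing_table: Optional[list[int]] = None

    @property
    def device(self) -> str:
        # Concern 1: fall back instead of dereferencing a missing map.
        if self.expert_map is None:
            return self.fallback_device
        return self.expert_map.device

    def _ensure_routing_table_initialized(self) -> None:
        # Idempotent: an existing table is never overwritten here.
        if self.routing_table is not None or self.expert_map is None:
            return
        pairs = [(local, glob)
                 for glob, local in enumerate(self.expert_map.values)
                 if local >= 0]
        # Entry i holds the global ID of local expert i.
        self.routing_table = [glob for _, glob in sorted(pairs)]

    def update(self, new_map: FakeTensor) -> None:
        # Concern 3: invalidate first, otherwise the guard above
        # would keep the stale table.
        self.expert_map = new_map
        self.routing_table = None
        self._ensure_routing_table_initialized()
```

Resetting the cached table in `update` keeps the lazy initializer idempotent while still allowing the map to change at runtime.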

Signed-off-by: Bill Nell <bnell@redhat.com>

yzong-rh

Other than that LGTM. Thanks!

Signed-off-by: Bill Nell <bnell@redhat.com>

robertgshaw2-redhat · 2026-05-11T15:17:24Z

+    return expert_placement_strategy
+
+
+class ExpertMapManager:


I think what is confusing about ExpertMap is that it is used **sometimes**.

It would help a lot for this class to explain when it is and is not used.

@varun-sundar-rabindranath might have the best view of this

IIUC it's used for EP to map from global<->local expert ids. Not all backends/experts require it though. Some all2alls handle it internally and pass that along to the experts.
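
As a rough illustration of the global-to-local mapping described here, the following sketch builds an expert map for one rank and uses it to translate routed expert IDs. It assumes the common convention of marking experts that are not on this rank with -1. The function names `build_expert_map` and `to_local_ids` are hypothetical, and real backends may perform this translation inside the all2all or the kernel instead.

```python
def build_expert_map(global_num_experts: int, owned: list[int]) -> list[int]:
    """Map each global expert ID to its local slot, or -1 if not on this rank."""
    expert_map = [-1] * global_num_experts
    for local_id, global_id in enumerate(owned):
        expert_map[global_id] = local_id
    return expert_map


def to_local_ids(topk_ids: list[list[int]], expert_map: list[int]) -> list[list[int]]:
    """Translate per-token global expert IDs into local IDs (-1 = skip)."""
    return [[expert_map[e] for e in row] for row in topk_ids]


# Example: this rank owns global experts 1 and 5 out of 8.
expert_map = build_expert_map(8, [1, 5])
local_ids = to_local_ids([[1, 2], [5, 0]], expert_map)
```

In this example `expert_map` is `[-1, 0, -1, -1, -1, 1, -1, -1]`, so a token routed to global experts 1 and 2 only does work for local expert 0 on this rank.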

Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

mergify · 2026-05-11T15:23:59Z

Hi @bnellnm, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

mergify · 2026-05-11T15:34:12Z

Hi @bnellnm, the pre-commit checks have failed again. The instructions are the same as in the previous comment.

Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>

Signed-off-by: Bill Nell <bnell@redhat.com>

…-map-manager

Signed-off-by: Bill Nell <bnell@redhat.com>

…ss (vllm-project#41046) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>

### What this PR does / why we need it?
1. Fix for vllm-project/vllm#33322: override `gpu_modelrunner.sync_and_gather_intermediate_tensors`; for the `pp+sp+tp` scenario, skip scattering the residual on Ascend.
2. vllm-project/vllm#35520: adapt to the `ModelRunner v2` changes for hybrid attention at the interface level. Todo: add Mamba support to ModelRunner on Ascend; pull requests are welcome.
3. vllm-project/vllm#40711
4. vllm-project/vllm#42121
5. vllm-project/vllm#41706
6. vllm-project/vllm#39917: disable `async_schedule` when `enable_return_routed_experts=True`.
7. vllm-project/vllm#41046
8. vllm-project/vllm#41055
9. vllm-project/vllm#41035
10. vllm-project/vllm#42434
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
- vLLM version: v0.20.1
- vLLM main: vllm-project/vllm@c7aa186
---------
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>

bnellnm added 4 commits April 27, 2026 20:01

expert map manager

ced1799

Signed-off-by: Bill Nell <bnell@redhat.com>

wip

ba52a86

Signed-off-by: Bill Nell <bnell@redhat.com>

update

64cf4ac

Signed-off-by: Bill Nell <bnell@redhat.com>

merge

42c7fc4

Signed-off-by: Bill Nell <bnell@redhat.com>

bnellnm requested review from WoosukKwon, mgoin, pavanimajety, tlrmchlsmth and yewentao256 as code owners April 27, 2026 20:13

claude Bot reviewed Apr 27, 2026

gemini-code-assist Bot reviewed Apr 27, 2026

Comment thread vllm/model_executor/layers/fused_moe/expert_map_manager.py Outdated

Comment thread vllm/model_executor/layers/fused_moe/expert_map_manager.py

Comment thread vllm/model_executor/layers/fused_moe/layer.py

fix num_local_expert update

2332fd7

Signed-off-by: Bill Nell <bnell@redhat.com>

bnellnm requested a review from robertgshaw2-redhat April 27, 2026 22:41

bnellnm and others added 3 commits April 28, 2026 16:21

Merge branch 'main' into expert-map-manager

afb772c

fix

aa210d3

Signed-off-by: Bill Nell <bnell@redhat.com>

Merge remote-tracking branch 'nm-vllm/expert-map-manager' into expert…

914f53c

…-map-manager

bnellnm mentioned this pull request Apr 29, 2026

[MoE Refactor] FusedMoE/MoERunner inversion refactor #41184

Open

4 tasks

bnellnm and others added 2 commits April 29, 2026 19:53

try to fix doc

c74f285

Signed-off-by: Bill Nell <bnell@redhat.com>

Merge branch 'main' into expert-map-manager

1efaddb

yzong-rh reviewed May 5, 2026

Comment thread vllm/model_executor/layers/fused_moe/expert_map_manager.py Outdated

Comment thread vllm/model_executor/layers/fused_moe/layer.py

yzong-rh reviewed May 5, 2026

Comment thread vllm/model_executor/layers/fused_moe/layer.py Outdated

Comment thread vllm/model_executor/layers/fused_moe/layer.py

Comment thread vllm/model_executor/layers/fused_moe/expert_map_manager.py Outdated

bnellnm added 3 commits May 6, 2026 19:01

review comments

c1a332c

Signed-off-by: Bill Nell <bnell@redhat.com>

cleanup routing table initialization and updating

4a3d996

Signed-off-by: Bill Nell <bnell@redhat.com>

Merge remote-tracking branch 'nm-vllm/expert-map-manager' into expert…

f96b5cb

…-map-manager

bnellnm requested a review from tjtanaa as a code owner May 6, 2026 19:53

mergify Bot added the needs-rebase label May 6, 2026

bnellnm added 2 commits May 6, 2026 19:54

Merge remote-tracking branch 'origin/main' into expert-map-manager

c038c7e

Signed-off-by: Bill Nell <bnell@redhat.com>

fix local_num_experts

778c141

Signed-off-by: Bill Nell <bnell@redhat.com>

mergify Bot removed the needs-rebase label May 6, 2026

yzong-rh reviewed May 7, 2026

review comments

ba6118e

Signed-off-by: Bill Nell <bnell@redhat.com>

yzong-rh approved these changes May 8, 2026

Comment thread vllm/model_executor/layers/fused_moe/expert_map_manager.py Outdated

changing exception type/message for clarity

ef6bdce

Signed-off-by: Bill Nell <bnell@redhat.com>

robertgshaw2-redhat reviewed May 11, 2026

Comment thread vllm/model_executor/layers/fused_moe/layer.py Outdated

robertgshaw2-redhat reviewed May 11, 2026

Merge branch 'main' into expert-map-manager

0ad5271

Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

robertgshaw2-redhat requested a review from zyongye as a code owner May 11, 2026 15:20

Add import for ExpertMapManager in layer.py

587e895

robertgshaw2-redhat added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 11, 2026

robertgshaw2-redhat and others added 6 commits May 11, 2026 11:40

pre-commit fix

b66852f

Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>

fixes

d30dde0

Signed-off-by: Bill Nell <bnell@redhat.com>

Merge remote-tracking branch 'origin/main' into expert-map-manager

70cfd20

Signed-off-by: Bill Nell <bnell@redhat.com>

fix import path

f189b62

Signed-off-by: Bill Nell <bnell@redhat.com>

Merge remote-tracking branch 'nm-vllm/expert-map-manager' into expert…

92841ef

…-map-manager

add some comments

b926c66

Signed-off-by: Bill Nell <bnell@redhat.com>

bnellnm requested a review from robertgshaw2-redhat May 11, 2026 17:27

robertgshaw2-redhat merged commit 206eaed into vllm-project:main May 12, 2026
84 of 85 checks passed

This was referenced May 13, 2026

[CI] Upgrade vllm commit to 0512 vllm-project/vllm-ascend#9054

Closed

[CI] Main2main 0513 vllm-project/vllm-ascend#9137

Closed

[CI] Main2main 0514 vllm-project/vllm-ascend#9155

Merged
