[MoE Refactor] Move expert map related code into ExpertMapManager class#41046
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the ExpertMapManager class to centralize and manage expert ID mappings, placement strategies, and routing tables for MoE layers, refactoring this logic out of layer.py. The feedback identifies critical state management issues in the update and update_expert_map methods, specifically regarding the synchronization of local expert counts when shared experts are present and the need to keep configuration objects in sync during dynamic updates. Additionally, the round-robin routing table logic requires correction to include shared experts to prevent potential out-of-bounds errors in kernels.
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
yzong-rh
left a comment
There was a problem hiding this comment.
Three remaining concerns with ExpertMapManager
self.devicemight throw an error ifexpert_mapsare not initialized (whenep_size == 1for example).- Usage of
self.devicein_ensure_routing_tables_initializedis redundant thanks towith self.device:. - In order to ensure idempotency
_ensure_routing_tables_initializednever override already setrouting_tables. This could be problematic inupdate.
Signed-off-by: Bill Nell <bnell@redhat.com>
yzong-rh
left a comment
There was a problem hiding this comment.
Other than that LGTM. Thanks!
Signed-off-by: Bill Nell <bnell@redhat.com>
| return expert_placement_strategy | ||
|
|
||
|
|
||
| class ExpertMapManager: |
There was a problem hiding this comment.
I think what is confusing about ExpertMap is that it is used ** sometimes **
It would help a lot for this class to explain when and when it is not used
@varun-sundar-rabindranath might have the best view of this
There was a problem hiding this comment.
IIUC it's used for EP to map from global<->local expert ids. Not all backends/experts require it though. Some all2alls handle it internally and pass that along to the experts.
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
Hi @bnellnm, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
|
Hi @bnellnm, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
206eaed
into
vllm-project:main
…ss (vllm-project#41046) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
### What this PR does / why we need it? 1. fix vllm-project/vllm#33322 overwrite `gpu_modelrunner.sync_and_gather_intermediate_tensors`, for the sceniro `pp+sp+tp`, skip scatter the residual for ascend 2. vllm-project/vllm#35520 Adapted to the modifications of `ModelRunner v2` for hybrid attn in interface level, . Todo: Added support for Mamba in ModelRunner in Ascend. any pull_request is welcome 3. vllm-project/vllm#40711 4. vllm-project/vllm#42121 5. vllm-project/vllm#41706 6. vllm-project/vllm#39917 Disable `async_schedule` when `enable_return_routed_experts=True` 7. vllm-project/vllm#41046 8. vllm-project/vllm#41055 9. vllm-project/vllm#41035 10. vllm-project/vllm#42434 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.20.1 - vLLM main: vllm-project/vllm@c7aa186 --------- Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it? 1. fix vllm-project/vllm#33322 overwrite `gpu_modelrunner.sync_and_gather_intermediate_tensors`, for the sceniro `pp+sp+tp`, skip scatter the residual for ascend 2. vllm-project/vllm#35520 Adapted to the modifications of `ModelRunner v2` for hybrid attn in interface level, . Todo: Added support for Mamba in ModelRunner in Ascend. any pull_request is welcome 3. vllm-project/vllm#40711 4. vllm-project/vllm#42121 5. vllm-project/vllm#41706 6. vllm-project/vllm#39917 Disable `async_schedule` when `enable_return_routed_experts=True` 7. vllm-project/vllm#41046 8. vllm-project/vllm#41055 9. vllm-project/vllm#41035 10. vllm-project/vllm#42434 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.20.1 - vLLM main: vllm-project/vllm@c7aa186 --------- Signed-off-by: wangli <wangli858794774@gmail.com>
### What this PR does / why we need it? 1. fix vllm-project/vllm#33322 overwrite `gpu_modelrunner.sync_and_gather_intermediate_tensors`, for the sceniro `pp+sp+tp`, skip scatter the residual for ascend 2. vllm-project/vllm#35520 Adapted to the modifications of `ModelRunner v2` for hybrid attn in interface level, . Todo: Added support for Mamba in ModelRunner in Ascend. any pull_request is welcome 3. vllm-project/vllm#40711 4. vllm-project/vllm#42121 5. vllm-project/vllm#41706 6. vllm-project/vllm#39917 Disable `async_schedule` when `enable_return_routed_experts=True` 7. vllm-project/vllm#41046 8. vllm-project/vllm#41055 9. vllm-project/vllm#41035 10. vllm-project/vllm#42434 ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? - vLLM version: v0.20.1 - vLLM main: vllm-project/vllm@c7aa186 --------- Signed-off-by: wangli <wangli858794774@gmail.com> Signed-off-by: 李少鹏 <lishaopeng21@huawei.com>
…ss (vllm-project#41046) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
…ss (vllm-project#41046) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
…ss (vllm-project#41046) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
Purpose
Create
ExpertMapManagerclass to handle all expert map related functionality in theFusedMoElayer.Test Plan
CI
Test Result
cc @yzong-rh
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.