Bugfix: Align expert map shapes with redundant experts in EPLB adjustment#5285
Bugfix: Align expert map shapes with redundant experts in EPLB adjustment#5285wangxiyuan merged 6 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses a shape mismatch bug concerning redundant experts in MoE layers. The modifications correctly adjust the shapes of expert_placement_map and log2phy_expert_map, and ensure the correct number of experts is passed to the underlying operators. The test updates are consistent with these fixes. My primary feedback focuses on enhancing the implementation within token_dispatcher.py. I suggest refactoring the code to avoid passing state via an instance attribute, which will improve the code's robustness and maintainability.
| quant_mode = 0 | ||
| moe_expert_num = len(expert_map) | ||
| quant_mode = 2 if self.with_quant else 0 | ||
| self.moe_expert_num = len(expert_map) + global_redundant_expert_num |
There was a problem hiding this comment.
Storing moe_expert_num as an instance attribute self.moe_expert_num creates an implicit dependency between token_dispatch (via get_dispatch_mc2_kwargs) and token_combine (via get_combine_mc_kwargs). This makes the code fragile and harder to reason about, as token_combine now relies on token_dispatch having been called first to set this attribute. A more robust approach is to pass state explicitly using the context_metadata dictionary.
I recommend the following refactoring:
-
In
get_dispatch_mc2_kwargs: Calculatemoe_expert_numas a local variable and do not assign it toself.# In get_dispatch_mc2_kwargs moe_expert_num = len(expert_map) + global_redundant_expert_num kwargs_mc2 = { # ... "moe_expert_num": moe_expert_num, # ... }
-
In
token_dispatch: Passglobal_redundant_expert_numthroughcontext_metadata.# In token_dispatch context_metadata["global_redundant_expert_num"] = global_redundant_expert_num
-
In
get_combine_mc_kwargs: Recalculatemoe_expert_numusing the value fromcontext_metadatainstead of reading fromself.# In get_combine_mc_kwargs global_redundant_expert_num = context_metadata["global_redundant_expert_num"] moe_expert_num = len(expert_map) + global_redundant_expert_num kwargs_mc2 = { # ... "moe_expert_num": moe_expert_num, # ... }
This change will make the data flow explicit and improve the component's maintainability.
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
Validation ResultWe conducted comparative tests on the Qwen model under two configurations to verify the fix for expert map shape mismatch:
ConclusionThe test results confirm that:
|
75b4896 to
701f2f7
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
7f1c513 to
99ed3fd
Compare
99ed3fd to
493ed35
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
ce6efb4 to
0b6a8ec
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: shenchuxiaofugui <1311027364@qq.com> Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: Che Ruan <cr623@ic.ac.uk> rebase
d525d59 to
163cdd1
Compare
…ment (vllm-project#5285) #### Overview This PR fixes a shape mismatch bug between `expert_placement_map` and `log2phy_expert_map` when **redundant experts** are enabled in the vLLM-Ascend platform. The issue occurred during the initialization of expert maps and their updates via EPLB (Expert Load Balancer) adjustment, leading to potential tensor shape errors and incorrect expert routing in distributed MoE deployments. #### Key Changes 1. **Unify expert map shape calculation logic** - Ensure the shape of `expert_placement_map` and `log2phy_expert_map` strictly aligns with the total number of experts (including redundant experts) during initialization. - Update the shape adjustment logic in EPLB dynamic update process to match the initial expert map dimensions. 2. **Add shape consistency checks** - Add assertion statements to verify the shape consistency of the two maps after initialization and EPLB adjustment, preventing silent shape mismatches in subsequent operations. #### Impact - Resolves tensor shape errors when using redundant experts with EPLB on Ascend platform. - Ensures correct expert routing and load balancing for MoE models with redundant expert configurations. - No breaking changes to existing functionality; compatible with non-redundant expert deployments. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: shenchuxiaofugui <1311027364@qq.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: shenchuxiaofugui <1311027364@qq.com>
…ment (vllm-project#5285) #### Overview This PR fixes a shape mismatch bug between `expert_placement_map` and `log2phy_expert_map` when **redundant experts** are enabled in the vLLM-Ascend platform. The issue occurred during the initialization of expert maps and their updates via EPLB (Expert Load Balancer) adjustment, leading to potential tensor shape errors and incorrect expert routing in distributed MoE deployments. #### Key Changes 1. **Unify expert map shape calculation logic** - Ensure the shape of `expert_placement_map` and `log2phy_expert_map` strictly aligns with the total number of experts (including redundant experts) during initialization. - Update the shape adjustment logic in EPLB dynamic update process to match the initial expert map dimensions. 2. **Add shape consistency checks** - Add assertion statements to verify the shape consistency of the two maps after initialization and EPLB adjustment, preventing silent shape mismatches in subsequent operations. #### Impact - Resolves tensor shape errors when using redundant experts with EPLB on Ascend platform. - Ensures correct expert routing and load balancing for MoE models with redundant expert configurations. - No breaking changes to existing functionality; compatible with non-redundant expert deployments. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: shenchuxiaofugui <1311027364@qq.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: shenchuxiaofugui <1311027364@qq.com>
…ment (vllm-project#5285) #### Overview This PR fixes a shape mismatch bug between `expert_placement_map` and `log2phy_expert_map` when **redundant experts** are enabled in the vLLM-Ascend platform. The issue occurred during the initialization of expert maps and their updates via EPLB (Expert Load Balancer) adjustment, leading to potential tensor shape errors and incorrect expert routing in distributed MoE deployments. #### Key Changes 1. **Unify expert map shape calculation logic** - Ensure the shape of `expert_placement_map` and `log2phy_expert_map` strictly aligns with the total number of experts (including redundant experts) during initialization. - Update the shape adjustment logic in EPLB dynamic update process to match the initial expert map dimensions. 2. **Add shape consistency checks** - Add assertion statements to verify the shape consistency of the two maps after initialization and EPLB adjustment, preventing silent shape mismatches in subsequent operations. #### Impact - Resolves tensor shape errors when using redundant experts with EPLB on Ascend platform. - Ensures correct expert routing and load balancing for MoE models with redundant expert configurations. - No breaking changes to existing functionality; compatible with non-redundant expert deployments. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: shenchuxiaofugui <1311027364@qq.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: shenchuxiaofugui <1311027364@qq.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…ment (vllm-project#5285) #### Overview This PR fixes a shape mismatch bug between `expert_placement_map` and `log2phy_expert_map` when **redundant experts** are enabled in the vLLM-Ascend platform. The issue occurred during the initialization of expert maps and their updates via EPLB (Expert Load Balancer) adjustment, leading to potential tensor shape errors and incorrect expert routing in distributed MoE deployments. #### Key Changes 1. **Unify expert map shape calculation logic** - Ensure the shape of `expert_placement_map` and `log2phy_expert_map` strictly aligns with the total number of experts (including redundant experts) during initialization. - Update the shape adjustment logic in EPLB dynamic update process to match the initial expert map dimensions. 2. **Add shape consistency checks** - Add assertion statements to verify the shape consistency of the two maps after initialization and EPLB adjustment, preventing silent shape mismatches in subsequent operations. #### Impact - Resolves tensor shape errors when using redundant experts with EPLB on Ascend platform. - Ensures correct expert routing and load balancing for MoE models with redundant expert configurations. - No breaking changes to existing functionality; compatible with non-redundant expert deployments. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: shenchuxiaofugui <1311027364@qq.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: shenchuxiaofugui <1311027364@qq.com>
…ment (vllm-project#5285) #### Overview This PR fixes a shape mismatch bug between `expert_placement_map` and `log2phy_expert_map` when **redundant experts** are enabled in the vLLM-Ascend platform. The issue occurred during the initialization of expert maps and their updates via EPLB (Expert Load Balancer) adjustment, leading to potential tensor shape errors and incorrect expert routing in distributed MoE deployments. #### Key Changes 1. **Unify expert map shape calculation logic** - Ensure the shape of `expert_placement_map` and `log2phy_expert_map` strictly aligns with the total number of experts (including redundant experts) during initialization. - Update the shape adjustment logic in EPLB dynamic update process to match the initial expert map dimensions. 2. **Add shape consistency checks** - Add assertion statements to verify the shape consistency of the two maps after initialization and EPLB adjustment, preventing silent shape mismatches in subsequent operations. #### Impact - Resolves tensor shape errors when using redundant experts with EPLB on Ascend platform. - Ensures correct expert routing and load balancing for MoE models with redundant expert configurations. - No breaking changes to existing functionality; compatible with non-redundant expert deployments. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: shenchuxiaofugui <1311027364@qq.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: shenchuxiaofugui <1311027364@qq.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
…ment (vllm-project#5285) #### Overview This PR fixes a shape mismatch bug between `expert_placement_map` and `log2phy_expert_map` when **redundant experts** are enabled in the vLLM-Ascend platform. The issue occurred during the initialization of expert maps and their updates via EPLB (Expert Load Balancer) adjustment, leading to potential tensor shape errors and incorrect expert routing in distributed MoE deployments. #### Key Changes 1. **Unify expert map shape calculation logic** - Ensure the shape of `expert_placement_map` and `log2phy_expert_map` strictly aligns with the total number of experts (including redundant experts) during initialization. - Update the shape adjustment logic in EPLB dynamic update process to match the initial expert map dimensions. 2. **Add shape consistency checks** - Add assertion statements to verify the shape consistency of the two maps after initialization and EPLB adjustment, preventing silent shape mismatches in subsequent operations. #### Impact - Resolves tensor shape errors when using redundant experts with EPLB on Ascend platform. - Ensures correct expert routing and load balancing for MoE models with redundant expert configurations. - No breaking changes to existing functionality; compatible with non-redundant expert deployments. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: Che Ruan <cr623@ic.ac.uk> Signed-off-by: shenchuxiaofugui <1311027364@qq.com> Co-authored-by: Che Ruan <cr623@ic.ac.uk> Co-authored-by: shenchuxiaofugui <1311027364@qq.com>
Overview
This PR fixes a shape mismatch bug between
expert_placement_mapandlog2phy_expert_mapwhen redundant experts are enabled in the vLLM-Ascend platform. The issue occurred during the initialization of expert maps and their updates via EPLB (Expert Load Balancer) adjustment, leading to potential tensor shape errors and incorrect expert routing in distributed MoE deployments.Key Changes
Unify expert map shape calculation logic
expert_placement_mapandlog2phy_expert_mapstrictly aligns with the total number of experts (including redundant experts) during initialization.Add shape consistency checks
Impact
Resolves tensor shape errors when using redundant experts with EPLB on Ascend platform.
Ensures correct expert routing and load balancing for MoE models with redundant expert configurations.
No breaking changes to existing functionality; compatible with non-redundant expert deployments.
vLLM version: release/v0.13.0
vLLM main: vllm-project/vllm@ad32e3e