[EPLB][refactor] Modification of the initialization logic for expert_map and log2phy(depend on pr5285)#5311
Conversation
Code Review
This pull request refactors the initialization logic for expert_map and log2phy to fix bugs related to redundant experts. The refactoring is a good step towards simplifying the logic. However, I've identified several critical issues in the new implementation that could lead to runtime errors. These issues are primarily related to incorrect tensor creation from lists of varying lengths and improper attribute handling, which need to be addressed.
log2phy_map = torch.scatter(torch.zeros(len(log2phy_map.keys()), dtype=torch.int32),
                            0,
                            torch.tensor(list(log2phy_map.keys()), dtype=torch.int64),
                            torch.tensor(list(log2phy_map.values()), dtype=torch.int32)
                            )
The torch.zeros tensor is initialized with a size of len(log2phy_map.keys()). If the expert indices in log2phy_map.keys() are not contiguous from 0, the maximum index could be greater than len(log2phy_map.keys()) - 1. This will cause an out-of-bounds error when torch.scatter is called. The tensor should be initialized with the total number of experts, which can be obtained from len(expert_list[0]).
Suggested change:

- log2phy_map = torch.scatter(torch.zeros(len(log2phy_map.keys()), dtype=torch.int32),
+ log2phy_map = torch.scatter(torch.zeros(len(expert_list[0]), dtype=torch.int32),
                              0,
                              torch.tensor(list(log2phy_map.keys()), dtype=torch.int64),
                              torch.tensor(list(log2phy_map.values()), dtype=torch.int32)
                              )
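The sizing issue can be seen with hypothetical values (the map contents and `num_experts` below are illustrative, not from the PR): when the logical ids in `log2phy_map.keys()` are not contiguous from 0, `len(log2phy_map.keys())` is smaller than the largest key, so scatter would index out of bounds. Sizing the output by the total number of experts, as the suggestion's `len(expert_list[0])` does, avoids this:

```python
import torch

# Hypothetical map: logical ids 0 and 3 are present, ids 1 and 2 are not.
log2phy_map = {0: 5, 3: 7}
num_experts = 4  # stands in for len(expert_list[0])

# Sizing the zeros tensor by len(log2phy_map.keys()) (== 2 here) would make
# index 3 out of bounds; sizing it by num_experts is safe.
out = torch.scatter(
    torch.zeros(num_experts, dtype=torch.int32),
    0,
    torch.tensor(list(log2phy_map.keys()), dtype=torch.int64),
    torch.tensor(list(log2phy_map.values()), dtype=torch.int32),
)
print(out.tolist())  # [5, 0, 0, 7]
```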
self.moe_config.dp_group = get_dp_group()
self.moe_config.ep_group = get_ep_group()
self.moe_config.mc2_group = get_mc2_group()
self.supports_eplb = self.quant_method.supports_eplb
The supports_eplb attribute is being set on self (the AscendFusedMoE instance), but it is later accessed from moe_config inside init_eplb_config. This will lead to an AttributeError because moe_config (an instance of FusedMoEConfig) does not have this attribute.
Suggested change:

- self.supports_eplb = self.quant_method.supports_eplb
+ self.moe_config.supports_eplb = self.quant_method.supports_eplb
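The mismatch can be reproduced with a minimal plain-Python sketch (the class names below are hypothetical stand-ins for AscendFusedMoE and FusedMoEConfig, not the real classes):

```python
class FakeConfig:
    """Stand-in for FusedMoEConfig: has no supports_eplb attribute by default."""
    pass

class FakeLayer:
    """Stand-in for AscendFusedMoE."""
    def __init__(self):
        self.moe_config = FakeConfig()
        # Buggy version: the flag lands on the layer, not on the config.
        self.supports_eplb = True

def read_supports_eplb(moe_config):
    # Mirrors a reader (like init_eplb_config) that expects the flag on the config.
    return getattr(moe_config, "supports_eplb", None)

layer = FakeLayer()
assert read_supports_eplb(layer.moe_config) is None  # flag never reached the config

# Fixed version: write the flag onto the config object instead.
layer.moe_config.supports_eplb = True
assert read_supports_eplb(layer.moe_config) is True
```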
def __init__(self):
    self.transpose_weight = True
    self.supports_eplb = True
The supports_eplb attribute is being added to AscendW8A8DynamicLinearMethod, which is for general linear layers. However, EPLB (Expert Placement and Load Balancing) is a feature specific to Mixture-of-Experts (MoE) layers. This attribute should be set on AscendW8A8DynamicFusedMoEMethod instead. Please remove self.supports_eplb = True from this __init__ method and add it to the __init__ method of AscendW8A8DynamicFusedMoEMethod.
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Depends on #5285 (unmerged).
… expert_map and log2phy(depend on pr5285) (vllm-project#5311)" This reverts commit f81cf69.


What this PR does / why we need it?
Unify the loading logic for expert_map and log2phy.
1. The map generated when the redundant-experts feature is enabled was incorrect. The community map-generation function only accepts the number of global experts; when we pass in the number of logical experts plus redundant experts, the local expert IDs on the last card index expert IDs that do not exist. We now ensure that every index points to a real, existing expert ID and that every expert is reachable. When redundant experts are disabled, the output of our function remains identical to the community function's.
2. The map we generate was based on the number of physical experts, but in reality we only need the number of logical experts. Since the map is padded later anyway, we can simply generate it at logical-expert length.
3. Unify the initialization logic across different scenarios and simplify the code for fused_moe.
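A hedged reconstruction of the failure mode in point 1 (the helper below is illustrative, not the PR's code): if a naive generator divides (logical + redundant) experts evenly across ranks and assigns contiguous IDs, the last rank's slice runs past the real expert IDs.

```python
def naive_last_rank_slice(ep_size, num_experts, num_redundant):
    # Naive scheme: slice (num_experts + num_redundant) contiguous ids
    # across ranks, ignoring that only ids [0, num_experts) exist.
    per_rank = (num_experts + num_redundant) // ep_size
    start = (ep_size - 1) * per_rank
    return list(range(start, start + per_rank))

# 16 ranks, 256 logical experts, 16 redundant -> 17 slots per rank.
slice_ = naive_last_rank_slice(16, 256, 16)
print(max(slice_))  # 271, but valid expert ids end at 255
```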
Before refactoring
map path is not None:
expert map: get_rank_placement_map from 'expert_load_balancer.py', maintains the map for all ranks and all layers.
log2phy: get_rank_log2phy_map from 'expert_load_balancer.py', maintains the map for all ranks and all layers.
map path is None:
expert map: determine_expert_map from 'vllm.layer'. The function does not support the redundant experts of vllm-ascend.
log2phy: determine_default_log2phy_map from 'eplb_utils.py'. The function does not support the redundant experts of vllm-ascend.
Refactoring
eplb_utils.py
init_eplb_config
generate placement
generate expert map
generate log2phy
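The placement-then-map flow above can be sketched as follows. This is a hedged illustration of the desired invariants (every slot points at a real expert ID, every expert is reachable, and no-redundancy output matches a plain contiguous split), not the actual eplb_utils.py code; the wrap-around placement scheme is an assumption.

```python
def default_placement(ep_size, num_experts, num_redundant):
    # One contiguous slice of physical slots per rank; the modulo keeps
    # redundant slots wrapped inside [0, num_experts) so every slot
    # refers to a real expert id.
    total = num_experts + num_redundant
    per_rank = total // ep_size
    return [[(rank * per_rank + i) % num_experts for i in range(per_rank)]
            for rank in range(ep_size)]

def expert_map_for_rank(placement, rank, num_experts):
    # logical expert id -> local slot id on this rank, -1 if absent.
    emap = [-1] * num_experts
    for local_id, logical_id in enumerate(placement[rank]):
        emap[logical_id] = local_id
    return emap

placement = default_placement(16, 256, 16)
# Every slot is a real expert id, and every expert appears somewhere.
assert all(0 <= e < 256 for row in placement for e in row)
assert {e for row in placement for e in row} == set(range(256))

emap = expert_map_for_rank(placement, 15, 256)
assert emap.count(-1) == 256 - 17  # rank 15 hosts 17 distinct experts

# Without redundancy the split degenerates to the plain contiguous one.
no_red = default_placement(16, 256, 0)
assert no_red[15] == list(range(240, 256))
```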
Does this PR introduce any user-facing change?
No
How was this patch tested?
Expert Mapping Test Generation:
ep size: 16, num of experts: 256, num of redundant experts: 16
+++++++++++++++++++++++++++++++++++++++++
Expert Mapping (entries other than -1 indicate the experts this rank is responsible for) for Rank 15:
vllm map:
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16]
+++++++++++++++++++++++++++++++++++++++++
Improved map:
[16 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Expert Mapping Test Generation:
ep size: 16, num of experts: 256, num of redundant experts: 0
+++++++++++++++++++++++++++++++++++++++++
Expert Mapping (entries other than -1 indicate the experts this rank is responsible for) for Rank 15:
vllm map:
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
+++++++++++++++++++++++++++++++++++++++
Improved map:
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
dsr1 baseline:

| dataset | version | metric | mode | vllm-api-general-chat |
|-----|-----|-----|-----|-----|
| gsm8k-lite | 7cd45e | accuracy | gen | 100.00 |

dsr1 eplb:

| dataset | version | metric | mode | vllm-api-general-chat |
|-----|-----|-----|-----|-----|
| gsm8k-lite | 7cd45e | accuracy | gen | 100.00 |

- vLLM version: release/v0.13.0
- vLLM main: vllm-project/vllm@5fbfa8d