[EPLB][Nightly][Bugfix] Get expert from moe layer only by shenchuxiaofugui · Pull Request #5908 · vllm-project/vllm-ascend

shenchuxiaofugui · 2026-01-15T01:50:59Z

What this PR does / why we need it?

1. If the model has dense layers, the current code tries to fetch routing experts from those dense layers, which raises an error. The fix is to skip the dense layers when fetching routing experts.
2. The global_expert_map that the function outputs directly affects the performance of dsv3.2.
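
To make the first point concrete, here is a minimal sketch of collecting expert maps from MoE layers only. It is not the code in this PR: `DenseMLP`, `MoEBlock`, `expert_map` as an attribute, and `collect_expert_maps` are hypothetical stand-ins, and the example assumes a dense block simply has no expert map to read.

```python
from dataclasses import dataclass, field


@dataclass
class DenseMLP:
    """Hypothetical stand-in for a dense feed-forward block (no experts)."""


@dataclass
class MoEBlock:
    """Hypothetical stand-in for an MoE block holding a per-rank expert map."""
    expert_map: list[int] = field(default_factory=list)


def collect_expert_maps(mlps):
    """Return {layer_id: expert_map} for MoE layers, skipping dense layers."""
    maps = {}
    for layer_id, mlp in enumerate(mlps):
        if not hasattr(mlp, "expert_map"):
            continue  # dense layer: there are no routing experts to read
        maps[layer_id] = mlp.expert_map
    return maps


# Toy stack: 3 dense layers followed by 2 MoE layers (illustrative values).
mlps = [
    DenseMLP(),
    DenseMLP(),
    DenseMLP(),
    MoEBlock([0, 1, -1, -1]),
    MoEBlock([-1, -1, 0, 1]),
]
maps = collect_expert_maps(mlps)
```

Reading `expert_map` from every layer without the check would fail on the first dense layer with an `AttributeError`, which is the kind of failure the description refers to.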

Does this PR introduce any user-facing change?

How was this patch tested?

DeepSeek V3.1 conversation is normal.
del":"dsr1","choices":[{"index":0,"message":{"role":"assistant","content":"Of course. Here is a clear and comprehensive explanation of deep learning.\n\n### What is Deep Learning? (In a Nutshell)\n\nDeep learning is a subset of machine learning that uses artificial neural networks with many layers ("deep" networks) to learn and make intelligent decisions from vast amounts of data.\n\nThink of it as teaching a computer to recognize patterns by example, much like how a human child learns to identify a cat by seeing many pictures of cats.\n\n---\n\n### The Core Idea: Mimicking the Brain (Very Loosely)\n\nAt the heart of deep learning are Artificial Neural Networks (ANNs), which are inspired by the structure of the human brain.\n\n* Neurons: The building blocks are artificial neurons (or nodes), which are simple computational units.\n* Layers: These neurons are arranged in layers:\n * Input Layer: Receives the raw data (e.g., pixels of an image, words of a sentence).\n * Hidden Layers: These are the "deep" part. A network can have dozens or even hundreds of these layers. Each layer processes the input from the previous layer, extracts increasingly complex features, and passes it on.\n * Output Layer: Produces the final result (e.g., the label "cat," the translated sentence, a probability score).\n\n\nA simple visualization of a deep neural network with multiple hidden layers.\n\n### How Does it Learn?\n\nThe "learning" happens through a process","refus

aime precision test (dsv3.1)

baseline without eplb

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 66.67 |

eplb

| dataset | version | metric | mode | vllm-api-general-chat |
|----- | ----- | ----- | ----- | -----|
| aime2024 | 604a78 | accuracy | gen | 70.00 |

dsv3.2 performance

vLLM version: v0.13.0
vLLM main: vllm-project/vllm@11b6af5

github-actions · 2026-01-15T01:51:13Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request correctly fixes a bug in get_global_expert_map where an incorrect layer index was used to access MoE layers. The change ensures that the correct offset (num_dense_layers) is applied, making the layer access consistent with other parts of the code. I have added a suggestion to further improve the code's readability and maintainability by refactoring the loop to avoid code duplication, which will help prevent similar bugs in the future.
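
The offset described in this review can be sketched as follows. This is an illustration under stated assumptions, not the code in `vllm_adaptor.py`: it assumes the dense layers sit at the front of the layer stack (which the `num_dense_layers` offset implies), and `moe_layer_index`, `build_global_expert_map`, and the dict-based layers are hypothetical names chosen for the example.

```python
def moe_layer_index(moe_idx: int, num_dense_layers: int) -> int:
    """Map the i-th MoE layer to its absolute position in the layer stack."""
    if moe_idx < 0:
        raise ValueError("moe_idx must be non-negative")
    return num_dense_layers + moe_idx


def build_global_expert_map(layers, num_dense_layers: int, num_moe_layers: int):
    """Collect one expert map per MoE layer, always going through the offset."""
    return [
        layers[moe_layer_index(i, num_dense_layers)]["expert_map"]
        for i in range(num_moe_layers)
    ]


# Toy stack: 3 dense layers, then 2 MoE layers (illustrative values only).
layers = [{"kind": "dense"}] * 3 + [
    {"kind": "moe", "expert_map": [0, 1]},
    {"kind": "moe", "expert_map": [2, 3]},
]
global_map = build_global_expert_map(layers, num_dense_layers=3, num_moe_layers=2)
```

Keeping the offset in one helper means every caller indexes MoE layers the same way. Indexing `layers[0]["expert_map"]` without the offset hits a dense layer and raises `KeyError` in this toy setup.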

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>



shenchuxiaofugui requested a review from wangxiyuan as a code owner January 15, 2026 01:51

gemini-code-assist bot reviewed Jan 15, 2026


Comment thread vllm_ascend/eplb/adaptor/vllm_adaptor.py

shenchuxiaofugui mentioned this pull request Jan 15, 2026

[RFC]: Expert Parallelism Load Balancer #5633

Open

18 tasks

shenchuxiaofugui force-pushed the 0115 branch from 3ccd31d to a868cad Compare January 15, 2026 07:21

shenchuxiaofugui requested review from realliujiaxu and zzzzwwjj as code owners January 15, 2026 07:21

shenchuxiaofugui force-pushed the 0115 branch from a868cad to ce65f4a Compare January 15, 2026 08:14

wangxiyuan approved these changes Jan 15, 2026


wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels Jan 15, 2026

wangxiyuan enabled auto-merge (squash) January 15, 2026 08:51

auto-merge was automatically disabled January 15, 2026 11:43
Head branch was pushed to by a user without write access

shenchuxiaofugui force-pushed the 0115 branch 2 times, most recently from 85013ea to 712ac66 Compare January 16, 2026 01:38

[EPLB][Nightly][Bugfix] Get expert from moe layer only

d6e194f

Signed-off-by: shenchuxiaofugui <1311027364@qq.com>

shenchuxiaofugui force-pushed the 0115 branch from 712ac66 to d6e194f Compare January 18, 2026 15:21

wangxiyuan merged commit 9fed263 into vllm-project:main Jan 19, 2026
20 checks passed
