[Feature] Support fine-grained shared expert overlap #5482
jianzs merged 6 commits into vllm-project:main
Conversation
Code Review
This pull request introduces support for fine-grained shared expert overlap in the MC2 codepath, which is a significant feature for performance optimization. The changes involve refactoring the MoE communication path to use dataclasses for return types instead of dictionaries, which improves code clarity and structure. The core logic for overlapping computations using NPU streams and events seems correct. However, I've found a critical issue with duplicated dataclass definitions that will cause a runtime error and must be fixed.
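The dict-to-dataclass refactor can be illustrated with a minimal, torch-free sketch (field names follow the snippet below; `Any` stands in for `torch.Tensor` so the example runs without an NPU or torch install — this is an illustration, not the PR's actual code):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Minimal sketch: `Any` stands in for torch.Tensor so this runs anywhere.
@dataclass
class TokenDispatchResult:
    hidden_states: Any
    group_list: Any
    group_list_type: int
    dynamic_scale: Any | None = field(default=None)
    topk_scales: Any | None = field(default=None)
    context_metadata: dict = field(default_factory=dict)


# Before the refactor, callers unpacked an untyped dict, where a typo in a
# key only fails at the lookup site. With the dataclass, construction checks
# the field set up front and attribute access is self-documenting:
result = TokenDispatchResult(
    hidden_states=[[0.0]], group_list=[1], group_list_type=1
)
print(result.group_list_type)  # 1
```

Defaulted fields (`dynamic_scale`, `topk_scales`, `context_metadata`) keep call sites short while still being part of the declared contract.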
```python
@dataclass
class TokenDispatchResult:
    hidden_states: torch.Tensor
    group_list: torch.Tensor
    group_list_type: int
    dynamic_scale: torch.Tensor | None = field(default=None)
    topk_scales: torch.Tensor | None = field(default=None)
    context_metadata: dict = field(default_factory=dict)


@dataclass
class TokenCombineResult:
    routed_out: torch.Tensor
```
The dataclasses TokenDispatchResult and TokenCombineResult are defined twice. The second definition of TokenCombineResult at line 69 overwrites the first one, and it's missing the shared_out field. This will cause a TypeError at runtime when TokenDispatcherWithMC2.token_combine tries to instantiate TokenCombineResult with the shared_out argument. Please remove the duplicate definitions from lines 58 to 70.
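The failure mode is easy to reproduce in isolation: a second `class` statement with the same name silently rebinds it, so the later, narrower dataclass wins. A minimal sketch (not the PR's actual code; `str` stands in for the tensor fields):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TokenCombineResult:  # first definition, with shared_out
    routed_out: str
    shared_out: str | None = None


@dataclass
class TokenCombineResult:  # duplicate: silently rebinds the name
    routed_out: str


# The second definition wins, so the shared_out field is gone and any
# caller still passing it gets a TypeError at construction time:
try:
    TokenCombineResult(routed_out="r", shared_out="s")
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```

Because the rebinding itself raises nothing, the bug only surfaces at the first call site that uses the removed field, which is why it is worth catching in review.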
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
yiz-liu left a comment:
Looking good, can we have an RFC about this?
Please take a look at #5708
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
…to FIA_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: (110 commits)
[Performance] Remove index opetation when VLLM_ASCEND_FLASHCOMM2_PARALLEL_SIZE=1 (vllm-project#5936)
[main][bugfix] fix mooncake kv cache transfer when one P has multi nodes (vllm-project#5960)
[Feature] Adapt DispathGmmCombineDecode opertor to align with weight scale dtype of small operators. [RFC: issue 5476] (vllm-project#5755)
[Refactor] Move AttentionSpec initialization to Attention module (vllm-project#5834)
[EPLB][Bugfix] policy_swift_balancer bugfix and renaming (vllm-project#5897)
[CI]fix for lint CI (vllm-project#5982)
[Fusion] [Graph]Add Matmul Allreduce Rmsnorm fusion Pass (vllm-project#5034)
[Refactor] Migrate profiler config from env vars to explicit ProfilerConfig (vllm-project#5928)
[EPLB][Bugfix] Dispatch Allgather use log2phy if enable eplb (vllm-project#5933)
[EPLB][Nightly][Bugfix] Get expert from moe layer only (vllm-project#5908)
[Bugfix][MM] Fix multi-modal inference OOM issues by setting `expandable_segments:True` (vllm-project#5855)
[doc]Table split (vllm-project#5929)
[Doc] Upgrade outdated ut doc (vllm-project#5937)
[Lint]Style: Convert `vllm-ascend/` to ruff format(Batch vllm-project#2) (vllm-project#5977)
Eagle3 mm support, enablement on qwen3vl (vllm-project#4848)
[Doc] Remove Chinese characters from the icons in the doc. (vllm-project#5959)
[P/D]The issue of solving the force-free secondary release request, which causes the node to crash. (vllm-project#5968)
[Feature] Support fine-grained shared expert overlap (vllm-project#5482)
[Bugfix] fix cpu offload hang with tp=1 (vllm-project#5963)
[Feature]: Support 310P device run qwen2.5/3 dense and qwen2.5vl models (vllm-project#5776)
...
### What this PR does / why we need it?
Same with #5482
### Does this PR introduce _any_ user-facing change?
### How was this patch tested?
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
…to qwen3next_rebase * 'main' of https://github.com/vllm-project/vllm-ascend: (637 commits)
Fine-grained control over shared expert overlap to prevent resource contention.
- vLLM version: v0.13.0
- vLLM main: vllm-project/vllm@5326c89
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
What this PR does / why we need it?
Fine-grained control over shared expert overlap to prevent resource contention.
Depends on #5481
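As an illustration only (not the PR's implementation, which uses NPU streams and events for the overlap), running the shared-expert MLP concurrently with the routed-expert dispatch/compute/combine path can be sketched with threads standing in for streams; all functions here are hypothetical stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor
import time


def routed_expert_path(x):
    time.sleep(0.05)  # stands in for MC2 dispatch + grouped matmul + combine
    return [v * 2 for v in x]


def shared_expert(x):
    time.sleep(0.05)  # stands in for the shared-expert MLP
    return [v + 1 for v in x]


def moe_forward(x):
    # Two workers stand in for two NPU streams running concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        routed = pool.submit(routed_expert_path, x)
        shared = pool.submit(shared_expert, x)
        # .result() stands in for an event wait: both paths must finish
        # before the final elementwise add.
        return [r + s for r, s in zip(routed.result(), shared.result())]


print(moe_forward([1.0, 2.0]))  # [4.0, 7.0]
```

The fine-grained part of the feature is deciding exactly where the second stream starts and where the event wait happens, so the shared-expert work fills the communication bubble without contending for the same compute resources.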
Does this PR introduce any user-facing change?
No
How was this patch tested?