[Feat] Multi-stream for eplb heat collection and aggregation #4214
MengqingCao merged 8 commits into vllm-project:main
Conversation
Signed-off-by: daishixun <dsxsteven@sina.com>
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request introduces an asynchronous stream to handle MoE expert load (heat) collection, aiming to optimize performance by overlapping it with other computations. The changes involve creating a dedicated stream and switching to it for MoE load accumulation and gathering.
My review has identified a critical race condition and a high-severity performance issue in vllm_ascend/eplb/eplb_updator.py. The race condition is due to missing synchronization between the main computation stream and the new asynchronous stream, which could lead to incorrect load balancing. The performance issue is related to a buffer being re-allocated on every call, which should be addressed for efficiency. Please see the detailed comments for suggestions on how to fix these issues.
```python
with npu_stream_switch(moe_load_async_stream()):
    self.world_size = dist.get_world_size()
    self.device = local_load.device
    if self._gather_buffer is None:
        shape = (self.world_size, *local_load.shape)
        self._gather_buffer = torch.empty(shape,
                                          dtype=local_load.dtype,
                                          device=self.device)

    dist.all_gather_into_tensor(self._gather_buffer, local_load)

    moe_load = self._gather_buffer.permute(1, 0, 2)
    self.shared_dict["moe_load"] = moe_load.cpu()
    logger.debug(
        f"[ModelRunner] Updated shared_dict['moe_load'] shape={moe_load.shape}"
    )
```
There is a race condition here. The moe_load tensors are updated asynchronously on moe_load_async_stream in fused_moe.py. However, self.adaptor.get_rank_expert_workload() is called on the default stream (on line 152, before this block) to read these tensors without any synchronization. This can lead to reading stale or incomplete data, causing incorrect load balancing. To fix this, you must synchronize the streams before reading moe_load. For example, you could add moe_load_async_stream().synchronize() before the call to self.adaptor.get_rank_expert_workload() on line 152.
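The hazard described above can be illustrated with a stdlib-only sketch. Here a `threading.Event` stands in for the suggested `moe_load_async_stream().synchronize()` call, and `async_collect` is a hypothetical producer standing in for the load accumulation in `fused_moe.py`; without the `wait()`, the reader could observe the stale initial value.

```python
import threading

# Hypothetical stand-in for the moe_load tensors updated on the async stream.
moe_load = {"value": 0}
done = threading.Event()  # plays the role of moe_load_async_stream().synchronize()

def async_collect():
    # Producer: accumulates expert load on the "async stream".
    moe_load["value"] = 42
    done.set()  # marks the async work as finished

t = threading.Thread(target=async_collect)
t.start()

# Consumer: block until the producer is done before reading, mirroring the
# suggested synchronize() before get_rank_expert_workload().
done.wait()
observed = moe_load["value"]
t.join()
print(observed)  # 42, never a stale 0
```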
```python
if self._gather_buffer is None:
    shape = (self.world_size, *local_load.shape)
    self._gather_buffer = torch.empty(shape,
                                      dtype=local_load.dtype,
                                      device=self.device)
```
The self._gather_buffer is reset to None on every call to compute_and_set_moe_load (on line 154). This makes this condition always true, causing the buffer to be re-allocated on every invocation, which is inefficient. To avoid this performance issue, self._gather_buffer should be initialized to None in the __init__ method of the EplbUpdator class, and the line self._gather_buffer = None should be removed from this method.
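The allocate-once pattern suggested here can be sketched without torch. `GatherBuffer` is a hypothetical stand-in for `EplbUpdator`, and `bytearray` stands in for `torch.empty`; the key point is that the `None` sentinel is set only in `__init__`, so the buffer is allocated exactly once.

```python
class GatherBuffer:
    """Sketch of the suggested fix: initialize the buffer slot once in
    __init__ and allocate lazily on first use, never resetting it."""

    def __init__(self):
        self._gather_buffer = None  # moved here from compute_and_set_moe_load

    def compute_and_set_moe_load(self, nbytes: int) -> bytearray:
        # No `self._gather_buffer = None` here, so the buffer survives
        # across calls and the allocation happens only on the first one.
        if self._gather_buffer is None:
            self._gather_buffer = bytearray(nbytes)
        return self._gather_buffer

u = GatherBuffer()
first = u.compute_and_set_moe_load(1024)
second = u.compute_and_set_moe_load(1024)
print(first is second)  # True: the same buffer object is reused
```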
```python
    logger.debug(
        f"[ModelRunner] Updated shared_dict['moe_load'] shape={moe_load.shape}"
    )

with npu_stream_switch(moe_load_async_stream()):
```
Maybe it would be better to make `moe_load_async_stream` a class attribute of `EplbUpdator`.
This function has already been moved to the eplb module; since other files also need this stream, keeping it in the eplb utils is better.
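The shared-stream accessor the discussion refers to could look roughly like the lazy module-level singleton below (a stdlib sketch: `_MOE_LOAD_ASYNC_STREAM` and the `object()` placeholder are assumptions standing in for `torch_npu.npu.Stream()`). Placing it in a shared utils module lets `eplb_updator.py` and `fused_moe.py` retrieve the same stream object.

```python
_MOE_LOAD_ASYNC_STREAM = None

def moe_load_async_stream():
    """Return the shared per-process stream, creating it on first call."""
    global _MOE_LOAD_ASYNC_STREAM
    if _MOE_LOAD_ASYNC_STREAM is None:
        # Stand-in for torch_npu.npu.Stream(); allocated once per process.
        _MOE_LOAD_ASYNC_STREAM = object()
    return _MOE_LOAD_ASYNC_STREAM

s1 = moe_load_async_stream()
s2 = moe_load_async_stream()
print(s1 is s2)  # True: every caller sees the same stream
```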
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Force-pushed from 6c7c895 to e09650a
```python
    return _SHARED_EXPERTS_CALCULATION_STREAM


def moe_load_async_stream() -> torch_npu.npu.Stream:
```
Move this function to the eplb module.
Force-pushed from cf110ae to d5b59ad
…oject#4214)

### What this PR does / why we need it?
This PR optimizes multistream for eplb heat collection and aggregation

- vLLM version: v0.12.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.12.0

Signed-off-by: daishixun <dsxsteven@sina.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
…llm-project#4214)" This reverts commit 9a885d0.
…llm-project#4214)" This reverts commit 9a885d0.
Signed-off-by: Wangyibo1005 <2633333316@qq.com>
What this PR does / why we need it?
This PR optimizes multistream for eplb heat collection and aggregation
Does this PR introduce any user-facing change?
No
How was this patch tested?
Co-authored-by: Skywalker-EP <173723846@qq.com>, <walterchenchn@outlook.com>