[Disaggregated Prefill] P2P Disaggregated Prefill based on llm_datadist #694
ganyi1996ppo merged 16 commits into vllm-project:main
Conversation
wuhuikx
left a comment
Can we have a README.md in examples/disaggreated_prefill to guide users through running it? It could include:
- disaggreated_prefill_offline.sh
- disaggreated_prefill_online.sh
```python
options["ge.exec.deviceId"] = str(self.rank)
print(f"prepare datadist, options: {options}")
self.data_dist.init(options)
self.kv_transfer = self.data_dist.kv_cache_manager
```
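For context, the snippet above initializes llm_datadist and takes a handle to its kv_cache_manager. A rough pure-Python mock of the cache lifecycle discussed in this thread (allocate_cache, then deallocate_cache on the producer, with the actual release deferred until the consumer's pull_cache) might look like the following; this is an illustration of the lifecycle only, not the real llm_datadist API:

```python
class MockKVTransfer:
    """Mock of the allocate/deallocate/pull lifecycle discussed in
    this thread. Method names mirror the ones quoted in the diff, but
    the implementation is purely illustrative."""

    def __init__(self):
        self._next_id = 0
        self._caches = {}           # cache_id -> payload
        self._pending_free = set()  # freed ids awaiting consumer pull

    def allocate_cache(self, payload):
        cache_id = self._next_id
        self._next_id += 1
        self._caches[cache_id] = payload
        return cache_id

    def deallocate_cache(self, cache_id):
        # Marks the cache_id as free; the actual release is deferred
        # until the consumer performs pull_cache.
        self._pending_free.add(cache_id)

    def pull_cache(self, cache_id):
        payload = self._caches[cache_id]
        if cache_id in self._pending_free:  # deferred release happens here
            del self._caches[cache_id]
            self._pending_free.discard(cache_id)
        return payload
```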
Why not use datadist's cache_manager? cache_manager also has allocate_cache. Is there any comparison between cache_manager and kv_cache_manager in transfer performance?
In fact, I do not have a deep understanding of llm_datadist. In the current version, the functions are streamlined based on kv_cache_manager. In the future, I will consider implementing with cache_manager and comparing the performance between them.
@LCAIZJ Would you be able to share any resources detailing how cache_manager differs from kv_cache_manager?
Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
```python
# Free cache_id of buffer; the actual deallocation happens after the consumer performs pull_cache.
self.kv_transfer.deallocate_cache(buffer)
```
When the decode tensor parallelism size exceeds the prefill tensor parallelism size, the same buffer in a prefill node may receive multiple pull requests. However, there appears to be a potential issue: the buffer gets deallocated after the first pull request is processed. Could this be causing errors?
Indeed, the current version might fail in this situation. Maybe we can maintain a FIFO buffer list in simple_buffer.py.
Could you clarify your idea? For hybrid parallelism, I think we either need to manually manage buffer lifecycles or accept storage redundancy.
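One way to manage buffer lifecycles manually, as discussed above, is to count pulls per buffer and only deallocate once every expected consumer has pulled. A hedged sketch follows; the class, the `expected_pulls` parameter, and `on_pull_done` are hypothetical names for illustration, not part of llm_datadist or this PR:

```python
from collections import defaultdict

class PullCountingBuffer:
    """Hypothetical sketch: defer deallocation until all expected
    consumers have pulled. When decode TP exceeds prefill TP, one
    prefill buffer may serve expected_pulls = decode_tp // prefill_tp
    decode ranks, so freeing after the first pull is unsafe."""

    def __init__(self, kv_transfer, expected_pulls: int):
        self.kv_transfer = kv_transfer
        self.expected_pulls = expected_pulls
        self.pull_counts = defaultdict(int)

    def on_pull_done(self, buffer):
        # Called once per completed pull_cache from a decode rank.
        self.pull_counts[id(buffer)] += 1
        if self.pull_counts[id(buffer)] >= self.expected_pulls:
            # All consumers have pulled; now it is safe to free.
            self.kv_transfer.deallocate_cache(buffer)
            del self.pull_counts[id(buffer)]
```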
This PR also supports dummy runs in engine v0: when one DP rank receives a request, the other DP ranks also receive an idle request, so that every rank in the DP communication group reaches the same collective call and no rank blocks on it. However, this support is limited and is only valid in the three scenarios below:
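The dummy-run idea can be illustrated with a pure-Python sketch: if any DP rank has a real request for a step, every idle rank is padded with a dummy request so all ranks enter the same collectives. The function name and `"<dummy>"` placeholder are illustrative, not vLLM's actual API:

```python
def pad_with_dummy_requests(per_rank_batches):
    """Sketch of the dummy-run scheme for DP ranks.

    per_rank_batches: one list of requests per DP rank for this step.
    If no rank has work, all ranks skip the step; otherwise idle ranks
    get a dummy request so every rank reaches the same collective call.
    """
    if not any(per_rank_batches):
        return per_rank_batches  # no rank has work: all ranks skip
    return [batch if batch else ["<dummy>"] for batch in per_rank_batches]
```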
This disaggregated prefill feature is still in an experimental phase. Further testing and development are required; please update the tutorial and unit tests for this feature after this PR is merged @whx-sjtu.
### What this PR does / why we need it?
#### 1. Fix spec UT in vllm-ascend main and vllm main
As #694 and #749 verify, the spec UT passes with vllm-ascend main and vllm 0.8.5, but CI fails with vllm-ascend main and vllm main. I found the reason is a triton bug, triton-lang/triton#2266, but I could not figure out why the bug does not affect vllm-ascend main with vllm 0.8.5; perhaps the usage of triton changed between vllm 0.8.5 and the latest main. As the bug describes, I changed the minimum block_size in the UT from 8 to 16, and the modification was verified locally to be effective.
#### 2. Modify how some cases are skipped
I changed some commented-out cases to the skipif form, which is more standardized.
### Does this PR introduce _any_ user-facing change?
None
### How was this patch tested?
CI
Signed-off-by: mengwei805 <mengwei25@huawei.com>
What this PR does / why we need it?
This PR proposes a P2P version of Disaggregated Prefill based on llm_datadist, which manages the data transfer.
This solution reworks the previous offline single-node Disaggregated Prefill solution and now supports multi-node and online serving.
Currently this solution supports the 1P1D case of Deepseek hybrid parallelism (P: TP+EP, D: DP+EP). Note that the xPyD case is considered in the solution design and will be supported soon within the v1 engine.