
Conversation

@ShangmingCai (Contributor) commented Feb 8, 2025

This is the initial PR to implement XpYd as described in the design doc, which has gone through several rounds of discussion in the #feat-prefill-disaggregation Slack channel and reached a preliminary consensus.

This PR mainly includes the changes related to KVTransferParams. In the XpYd implementation, we will use this parameter to coordinate metadata among all vLLM instances (prefill nodes and decode nodes), the KVStore Server, and the global proxy. With this param, the global proxy can attach the disaggregated prefill metadata to the CompletionRequest or ChatCompletionRequest, and all kv-transfer-related modules can access it through model_input in model_runner.execute_model and use these params to interact with the KVStore Server at the right place under the current 1P1D implementation.
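To make the flow concrete, here is a minimal illustrative sketch of how a global proxy could attach such metadata to a request. The field names (kv_transfer_role, kv_store_key_prefix) are hypothetical and are not the exact schema introduced by this PR:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KVTransferParams:
    # Hypothetical fields, for illustration only.
    kv_transfer_role: Optional[str] = None       # "prefill" or "decode"
    kv_store_key_prefix: Optional[str] = None    # key prefix for this request's KV cache

# The global proxy would attach these fields to the request body it forwards,
# e.g. as an extra field of a CompletionRequest:
request_body = {
    "model": "some-model",
    "prompt": "Hello",
    "kv_transfer_params": {
        "kv_transfer_role": "prefill",
        "kv_store_key_prefix": "req-42",
    },
}
```

On the engine side, the same params would then be visible to the kv-transfer modules alongside model_input when execute_model runs.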

Based on this initial PR, the Mooncake team will implement the KVStoreConnector in the next PR, which utilizes KVTransferParams and supports several third-party key-value stores, such as MooncakeStore, Valkey, and LMCache, acting as the KVLookupBuffer. With KVStoreConnector, we can decouple the connections between the prefill nodes and the decode nodes, making the entire XpYd implementation simpler and more stable.
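The connector idea is sketched below under stated assumptions: the class and method names are illustrative (not the actual KVStoreConnector API), and an in-memory dict stands in for MooncakeStore/Valkey/LMCache. The point is that prefill and decode nodes only ever talk to the store, never directly to each other:

```python
class InMemoryKVStore:
    """Stands in for MooncakeStore / Valkey / LMCache in this sketch."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class KVStoreConnectorSketch:
    """Illustrative only; not the real connector interface."""
    def __init__(self, store, tp_rank: int):
        self.store = store
        self.tp_rank = tp_rank

    def send_kv(self, key_prefix: str, kv_cache, hidden_states):
        # Prefill side: publish KV cache and hidden states under per-rank keys.
        self.store.put(f"{key_prefix}_{self.tp_rank}", kv_cache)
        self.store.put(f"{key_prefix}_hidden_{self.tp_rank}", hidden_states)

    def recv_kv(self, key_prefix: str):
        # Decode side: fetch whatever the prefill node published for this request.
        kv_cache = self.store.get(f"{key_prefix}_{self.tp_rank}")
        hidden_states = self.store.get(f"{key_prefix}_hidden_{self.tp_rank}")
        return kv_cache, hidden_states
```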

The metadata required by KVTransferParams will be provided by a global proxy (or we can call it the XpYd orchestration mechanism). Therefore, a global proxy interface will be added in the next PR as well. The Mooncake team will implement a simple version based on round-robin. In production, service providers and users can also implement their own global proxy strategies according to their needs.
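For intuition, a round-robin proxy can be as small as the following sketch (illustrative only; the names and the routing payload are assumptions, not the demo shipped with this PR):

```python
from itertools import cycle

class RoundRobinProxy:
    """Picks the next prefill and decode instance for each request."""
    def __init__(self, prefill_urls, decode_urls):
        self._prefill = cycle(prefill_urls)
        self._decode = cycle(decode_urls)

    def route(self, request_id: str):
        # Tag the request so both sides agree on the KVStore key prefix.
        return {
            "prefill_url": next(self._prefill),
            "decode_url": next(self._decode),
            "kv_transfer_params": {"kv_store_key_prefix": request_id},
        }

proxy = RoundRobinProxy(["http://p0:8000", "http://p1:8000"], ["http://d0:8000"])
print(proxy.route("req-42"))
```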

Since some parties seem to be concerned about the OpenAI API changes, we have refactored this PR to support disaggregated prefill with MooncakeStore, a new distributed object store for XpYd PD disaggregation. We also provide a simple disaggregation proxy demo as an example of supporting XpYd disaggregated prefill.

Integration guide: https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/vllm-integration-v1.md

Since the communication strategy among the prefill node, decode node, global proxy, and KVStore is still under discussion, we will support layer-by-layer KV cache transfer to further optimize TTFT in the next PR, once we have figured out how to coordinate with the proxy.

CC list: @KuntaiDu, @youkaichao, @james0zan, @alogfans, @stmatengss, @zeroorhero, @pizhenwei

@github-actions bot commented Feb 8, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the frontend label Feb 8, 2025
@ShangmingCai force-pushed the add_kv_transfer_params branch from 6cdcc91 to a45736e on February 11, 2025 04:03
@ShangmingCai changed the title from [WIP] Add KVTransferParams for disaggregated prefill feature to [Feature][Frontend] Add KVTransferParams for disaggregated prefill feature on Feb 11, 2025
@ShangmingCai (Contributor, Author)

@KuntaiDu Hello, this is the initial PR for the XpYd design as we discussed before the Chinese New Year. I think it is ready for review :)

@ShangmingCai force-pushed the add_kv_transfer_params branch from 70db613 to 1f194db on February 12, 2025 11:15
@KuntaiDu KuntaiDu self-assigned this Feb 12, 2025
@Second222None commented Mar 28, 2025

Nice work. But we ran into the following problems while reproducing. Can anyone give some advice?
(screenshot of the error attached)

@ShangmingCai (Contributor, Author)

> Nice work. But we ran into the following problems while reproducing. Can anyone give some advice?

@Second222None Please raise a new issue in the Mooncake repo; we will find someone to help you.

@ShangmingCai (Contributor, Author)

@DarkLight1337 Any idea why v1-test keeps failing?

@DarkLight1337 (Member)

It's a broken test on main. I'll just force-merge this

@vllm-bot vllm-bot merged commit 6fa7cd3 into vllm-project:main Mar 29, 2025
37 of 39 checks passed
# Load the KV cache and hidden states that the prefill node stored in the
# KV store, keyed by the request's key prefix and the local TP rank.
load_kvcache_key = f"{load_key_prefix}_{self.local_tp_rank}"
remote_kv = self.kv_store.get(load_kvcache_key)
hidden_key = f"{load_key_prefix}_hidden_{self.local_tp_rank}"
hidden = self.kv_store.get(hidden_key)

Hi, can I ask a question? Do we need kv_store.remove(hidden_key) to free the corresponding memory in Mooncake?

@ShangmingCai (Contributor, Author)

Nope. We enable auto GC since prefix-based KV cache reuse has not been implemented yet. But this is a temporary state; we are working on the v1 design, which will include KV cache reuse and eviction.

Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
@dalong2hongmei
I ran into trouble with XpYd disaggregated prefill with MooncakeStore:

1. It works perfectly following this tutorial (TCP protocol, vLLM 0.8.3, 1P1D):
https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/vllm-integration-v1.md
2. Following the above, I deployed 2 pods using k8s, one pod per prefill or decode role, also with the TCP protocol, but it did not work this time.
The prefill pod could always send the KV cache successfully,
but the decode pod could not receive it. An error I always see in the decode pod's log:
client.cpp:428] Transfer failed for task0
.....
client.cpp:177]transfer_read_failed_key=2580549111205550000_0

Any clue?

@ShangmingCai (Contributor, Author)

> Any clue?

@dalong2hongmei You can provide a detailed log and open an issue in the Mooncake repo; we will help you find the root cause.

@dalong2hongmei
> Any clue?
>
> @dalong2hongmei You can provide a detailed log and open an issue in the Mooncake repo; we will help you find the root cause.

The logs are stored on the corporate intranet; it is hard to export them here...

I found an issue posting error logs exactly the same as mine:
kvcache-ai/Mooncake#180

I also want to figure out whether there are any prerequisites in the Docker container TCP scenario (or k8s), for example, whether host networking is needed between the P/D instances?

@XiaobinZhao commented Apr 18, 2025

@dalong2hongmei I followed the doc https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/vllm-integration-v1.md but got this error:

2025-04-17 19:46:42.566927 ERROR    [975] [coro_rpc_client.hpp:978] read rpc body failed, error msg:End of file. close the socket.
E0417 19:46:42.567027   481 master_client.cpp:192] Failed to mount segment due to rpc error
E0417 19:46:42.567034   481 store_py.cpp:191] Failed to mount segment: UNKNOWN_ERROR
E0417 19:46:42.567068   484 master_client.cpp:192] Failed to mount segment due to rpc error
E0417 19:46:42.567075   484 store_py.cpp:191] Failed to mount segment: UNKNOWN_ERROR
2025-04-17 19:46:42.566977 ERROR    [953] [coro_rpc_client.hpp:978] read rpc body failed, error msg:End of file. close the socket.
I0417 19:46:42.567740   483 client.cpp:48] transport_type=tcp
I0417 19:46:42.568464   482 client.cpp:48] transport_type=tcp
I0417 19:46:42.568614   483 tcp_transport.cpp:224] TcpTransport: listen on port 16053
I0417 19:46:42.568714   483 allocator.cpp:99] initializing_simple_allocator size=1073741824
I0417 19:46:42.568770   483 allocator.cpp:138] simple_allocator_initialized pool_id=0
I0417 19:46:42.569110   482 tcp_transport.cpp:224] TcpTransport: listen on port 16517
I0417 19:46:42.569218   482 allocator.cpp:99] initializing_simple_allocator size=1073741824
I0417 19:46:42.569280   482 allocator.cpp:138] simple_allocator_initialized pool_id=0
E0417 19:46:42.569809   483 master_client.cpp:192] Failed to mount segment due to rpc error

What's wrong?

I tested the connection to the Mooncake master, and it is OK:

root@zp-nc05:/data/deepseek-ai# telnet -e quit 172.16.0.11 50001
Telnet escape character is 'q'.
Trying 172.16.0.11...
Connected to 172.16.0.11.
Escape character is 'q'.
@@ xterm-256color
Connection closed by foreign host.

The context is:

1. Pull the latest vLLM docker image public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:9dbf7a2dc1448d6657adfb2daba36be270dcebcd
2. Install Mooncake in the container: pip install mooncake-transfer-engine
3. mooncake.json:
{
    "local_hostname": "172.16.0.11",
    "metadata_server": "etcd://172.16.0.11:12379",
    "protocol": "tcp",
    "device_name": "",
    "master_server_address": "172.16.0.11:50001"
}
4. Run etcd
5. Run the Mooncake master with the docker image: docker pull alogfans/mooncake
6. Run vLLM with mooncake.json

@ShangmingCai (Contributor, Author)

@morty-zxb, can you provide the full logs of the prefill node and the decode node, and open an issue in the Mooncake repo?

@dalong2hongmei
> Any clue?
>
> @dalong2hongmei You can provide a detailed log and open an issue in the Mooncake repo; we will help you find the root cause.
>
> The logs are stored on the corporate intranet; it is hard to export them here...
>
> I found an issue posting error logs exactly the same as mine: kvcache-ai/Mooncake#180
>
> I also want to figure out whether there are any prerequisites in the Docker container TCP scenario (or k8s), for example, whether host networking is needed between the P/D instances?

It turned out to be my fault; I mistook the IP parameters. Mooncake is great work!

I ran into another problem. When I use vLLM's serving benchmark script to benchmark the vLLM server with Mooncake, I often see a large prompt throughput in my decoding instance's log, and it also shows some pending requests. I suspect prefilling is happening in the decoding instance. Is that normal? @ShangmingCai

PS: I found that vLLM's --enable-prefix-caching conflicts with Mooncake; when I set this launch parameter, the instance would somehow crash during the benchmark.

lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
@Zhangmj0621 commented Jul 14, 2025

@ShangmingCai Hello, I found that #15343 has stalled. Do you have any further plans for the Mooncake Store v1 integration, for example, the Mooncake Store v1 implementation, layerwise transfer, or global KV cache reuse? I would highly appreciate any feedback from you.

@ShangmingCai (Contributor, Author)

> @ShangmingCai Hello, I found that #15343 has stalled. Do you have any further plans for the Mooncake Store v1 integration, for example, the Mooncake Store v1 implementation, layerwise transfer, or global KV cache reuse? I would highly appreciate any feedback from you.

@Zhangmj0621 Can you check this: https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/lmcacheV1-deployment.md We have already integrated part of the implementation through LMCache.

@Zhangmj0621
> @ShangmingCai Hello, I found that #15343 has stalled. Do you have any further plans for the Mooncake Store v1 integration, for example, the Mooncake Store v1 implementation, layerwise transfer, or global KV cache reuse? I would highly appreciate any feedback from you.
>
> @Zhangmj0621 Can you check this: https://github.com/kvcache-ai/Mooncake/blob/main/doc/en/lmcacheV1-deployment.md We have already integrated part of the implementation through LMCache.

Thanks for your reply! It was my mistake to think that LMCache and Mooncake are two different systems.

@Zhangmj0621
> pp:48] transport_type=tcp
> I0417 19:46:42.568464 482 client.c

I have just one remaining question regarding the KV cache transfer mechanism: when utilizing both LMCache and Mooncake Store, is the transfer handled by NIXL or the Mooncake Transfer Engine?
I would highly appreciate any feedback from you.

@ShangmingCai (Contributor, Author)

> pp:48] transport_type=tcp
> I0417 19:46:42.568464 482 client.c
>
> I have just one remaining question regarding the KV cache transfer mechanism: when utilizing both LMCache and Mooncake Store, is the transfer handled by NIXL or the Mooncake Transfer Engine? I would highly appreciate any feedback from you.

@Zhangmj0621 Actually, both LMCache and NIXL have integrated Mooncake as a backend. You can try LMCache + MooncakeStore (which utilizes the Mooncake Transfer Engine to transfer the KV cache), or you can try LMCache + NIXL (and choose the Mooncake Transfer Engine as the backend).

@Zhangmj0621
> pp:48] transport_type=tcp
> I0417 19:46:42.568464 482 client.c
>
> I have just one remaining question regarding the KV cache transfer mechanism: when utilizing both LMCache and Mooncake Store, is the transfer handled by NIXL or the Mooncake Transfer Engine? I would highly appreciate any feedback from you.
>
> @Zhangmj0621 Actually, both LMCache and NIXL have integrated Mooncake as a backend. You can try LMCache + MooncakeStore (which utilizes the Mooncake Transfer Engine to transfer the KV cache), or you can try LMCache + NIXL (and choose the Mooncake Transfer Engine as the backend).

Thanks for your detailed and kind response!

