
[XPU][P/D] Add XPU support in NixlConnector#22436

Merged
vllm-bot merged 4 commits into vllm-project:main from zhenwei-intel:xpu_pd
Sep 5, 2025

Conversation

@zhenwei-intel
Contributor

@zhenwei-intel zhenwei-intel commented Aug 7, 2025

Supports PD disaggregation on XPU, based on #18293, using CPU memory as a buffer and performing point-to-point transmission via NIXL.
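The CPU-bounce-buffer flow can be sketched in plain Python. This is a simplified illustration, not vLLM's actual implementation; the function and buffer names here are hypothetical, and the NIXL point-to-point transfer is modeled as a plain copy:

```python
# Simplified illustration of the CPU-buffer KV transfer flow
# (not vLLM internals; all names here are hypothetical).

def transfer_kv_blocks(prefill_device_cache, block_ids, decode_device_cache):
    # 1. Prefill side: gather the selected KV blocks from XPU memory
    #    into a CPU staging buffer.
    cpu_tx_buffer = [prefill_device_cache[i] for i in block_ids]

    # 2. NIXL performs a point-to-point copy of the staging buffer to
    #    the decode node's CPU buffer (modeled here as a list copy).
    cpu_rx_buffer = list(cpu_tx_buffer)

    # 3. Decode side: scatter the received blocks back into its own
    #    XPU KV cache at the locally allocated block ids.
    for dst, block in zip(block_ids, cpu_rx_buffer):
        decode_device_cache[dst] = block
    return decode_device_cache

# Toy demo: 4-block caches holding scalar "block contents".
prefill = [10, 11, 12, 13]
decode = [0, 0, 0, 0]
print(transfer_kv_blocks(prefill, [1, 3], decode))  # [0, 11, 0, 13]
```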

Limitation

  • XPU attention kernels only support the NHD layout.
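To make the layout constraint concrete, the sketch below (illustrative only; the block shape and sizes are assumed, not taken from vLLM) shows how NHD vs HND ordering changes the flat memory offset of one cache element:

```python
# Illustration (not vLLM code): NHD vs HND KV-cache block layouts.
# NHD orders a block as (num_tokens N, num_heads H, head_dim D);
# HND orders it as (num_heads H, num_tokens N, head_dim D).

def nhd_index(token, head, dim, num_heads, head_dim):
    # Flat offset of one element in an NHD-ordered block.
    return (token * num_heads + head) * head_dim + dim

def hnd_index(token, head, dim, num_tokens, head_dim):
    # Flat offset of the same element in an HND-ordered block.
    return (head * num_tokens + token) * head_dim + dim

# A block of 4 tokens, 2 heads, head_dim 8; element (token=1, head=1, dim=3):
print(nhd_index(1, 1, 3, num_heads=2, head_dim=8))   # 27
print(hnd_index(1, 1, 3, num_tokens=4, head_dim=8))  # 43
```

Because the two orderings place the same element at different offsets, both sides of a transfer must agree on the layout, which is why the XPU path is restricted to NHD.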

Example

pip install nixl==0.3.0

# prefill node
ONEAPI_DEVICE_SELECTOR=level_zero:0 VLLM_USE_V1=1 VLLM_NIXL_SIDE_CHANNEL_HOST=localhost VLLM_NIXL_SIDE_CHANNEL_PORT=5577 VLLM_WORKER_MULTIPROC_METHOD=spawn vllm serve meta-llama/Meta-Llama-3-8B --host localhost --port 8100 --max-model-len 2048 --seed 42 --block-size 16 --enforce-eager --dtype float16 --gpu-memory-utilization 0.8 --disable-log-requests --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"cpu"}'

# decode node
ONEAPI_DEVICE_SELECTOR=level_zero:1 VLLM_USE_V1=1 VLLM_WORKER_MULTIPROC_METHOD=spawn vllm serve meta-llama/Meta-Llama-3-8B --host localhost --port 8200 --max-model-len 2048 --seed 42 --block-size 16 --enforce-eager --dtype float16 --gpu-memory-utilization 0.8 --disable-log-requests --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"cpu"}'

# start proxy server
python3 tests/v1/kv_connector/nixl_integration/toy_proxy_server.py   --prefiller-host localhost --prefiller-port 8100   --decoder-host localhost --decoder-port 8200   --host=localhost --port 8192

# send request 
curl http://localhost:8192/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Meta-Llama-3-8B", "prompt": "Just making this request a little longer so that we\'re sure we\'re not hitting the small-request lower bound beneath which we don\'t actually trigger the whole kv transfer, but rather just recompute the blocks on D. ", "max_tokens": 20}'
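The same request can be issued from Python with only the standard library. The helper names below are ours, not part of vLLM; the endpoint and payload mirror the curl call above:

```python
import json
import urllib.request

def build_payload(prompt: str, max_tokens: int = 20) -> dict:
    # Same request body as the curl example above.
    return {
        "model": "meta-llama/Meta-Llama-3-8B",
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

def send_completion(prompt: str,
                    url: str = "http://localhost:8192/v1/completions") -> dict:
    # POST the completion request to the toy proxy and decode the JSON reply.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the proxy from the previous step to be running):
# print(send_completion("A prompt long enough to exceed the small-request "
#                       "threshold so the KV transfer actually triggers."))
```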

Verify accuracy

  • A machine with at least two cards is required
  • Run bash tests/v1/kv_connector/nixl_integration/run_xpu_acc_test.sh

Test performance

Scenario: 1K input tokens, 256 output tokens, request rate = 1

python3 benchmarks/benchmark_serving.py --port 8192 --model meta-llama/Meta-Llama-3-8B --dataset-name random --random-input-len 1024 --random-output-len 256 --num-prompts 100  --request-rate 1 --ignore-eos
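For context on what --request-rate 1 means: benchmark_serving paces requests with exponential inter-arrival gaps (a Poisson arrival process) when the rate is finite. A minimal sketch of that pacing, with our own helper name:

```python
import random

def arrival_times(num_prompts: int, request_rate: float, seed: int = 0):
    # Exponential inter-arrival gaps with mean 1/request_rate seconds,
    # i.e. a Poisson arrival process (mirrors benchmark_serving's pacing).
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(num_prompts):
        t += rng.expovariate(request_rate)
        times.append(t)
    return times

times = arrival_times(100, request_rate=1.0)
# At 1 req/s, 100 requests arrive over roughly 100 seconds on average.
print(round(times[-1], 1))
```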

1P1D vs vLLM (TP=2) vs vLLM (TP=1):
[benchmark results chart]

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces XPU support for the NixlConnector, which is a great enhancement. The changes are mostly correct and follow the existing patterns in the codebase. However, I've found a critical bug in the tensor indexing logic for KV block copying in xpu_model_runner.py, which would lead to incorrect behavior. Additionally, the new test script for accuracy verification has a process cleanup mechanism that could be made more robust and safer for shared environments. I've provided suggestions to fix these issues.

@github-actions

github-actions bot commented Aug 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify

mergify bot commented Aug 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhenwei-intel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 11, 2025
Collaborator

@jikunshang jikunshang left a comment


Overall LGTM. Needs one more committer to approve.

@zhenwei-intel
Contributor Author

Hi @njhill, could you help review this PR?

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 22, 2025
@mergify

mergify bot commented Aug 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhenwei-intel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@zhenwei-intel
Contributor Author

cc @yaochengji @juncgu

@DarkLight1337
Member

Please fix pre-commit

zhenwei-intel and others added 2 commits September 1, 2025 09:48
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
auto-merge was automatically disabled September 1, 2025 06:49

Head branch was pushed to by a user without write access

@zhenwei-intel
Contributor Author

Please fix pre-commit

fixed, thanks

@jikunshang jikunshang enabled auto-merge (squash) September 4, 2025 04:37
@vllm-bot vllm-bot merged commit e599e2c into vllm-project:main Sep 5, 2025
39 of 43 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
@mergify mergify bot added the kv-connector label Sep 24, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Labels

ci/build · kv-connector · ready · tpu · v1


4 participants