
[XPU][P/D] Add XPU support in NixlConnector#22436

Merged
vllm-bot merged 4 commits into vllm-project:main from zhenwei-intel:xpu_pd
Sep 5, 2025

Conversation

@zhenwei-intel
Contributor

@zhenwei-intel zhenwei-intel commented Aug 7, 2025

Supports PD disaggregation on XPU, based on #18293, using CPU memory as a buffer and performing point-to-point transmission via NIXL.
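The CPU-bounce-buffer flow can be sketched in plain Python. This is a simplified illustration, not vLLM's actual implementation; the function and buffer names here are hypothetical, and the NIXL point-to-point transfer is modeled as a plain copy:

```python
# Simplified illustration of the CPU-buffer KV transfer flow
# (not vLLM internals; all names here are hypothetical).

def transfer_kv_blocks(prefill_device_cache, block_ids, decode_device_cache):
    # 1. Prefill side: gather the selected KV blocks from XPU memory
    #    into a CPU staging buffer.
    cpu_tx_buffer = [prefill_device_cache[i] for i in block_ids]

    # 2. NIXL performs a point-to-point copy of the staging buffer to
    #    the decode node's CPU buffer (modeled here as a list copy).
    cpu_rx_buffer = list(cpu_tx_buffer)

    # 3. Decode side: scatter the received blocks back into its own
    #    XPU KV cache at the locally allocated block ids.
    for dst, block in zip(block_ids, cpu_rx_buffer):
        decode_device_cache[dst] = block
    return decode_device_cache

# Toy demo: 4-block caches holding scalar "block contents".
prefill = [10, 11, 12, 13]
decode = [0, 0, 0, 0]
print(transfer_kv_blocks(prefill, [1, 3], decode))  # [0, 11, 0, 13]
```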

Limitation

  • XPU attention kernels only support the NHD layout.
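To make the layout constraint concrete, the sketch below (illustrative only; the block shape and sizes are assumed, not taken from vLLM) shows how NHD vs HND ordering changes the flat memory offset of one cache element:

```python
# Illustration (not vLLM code): NHD vs HND KV-cache block layouts.
# NHD orders a block as (num_tokens N, num_heads H, head_dim D);
# HND orders it as (num_heads H, num_tokens N, head_dim D).

def nhd_index(token, head, dim, num_heads, head_dim):
    # Flat offset of one element in an NHD-ordered block.
    return (token * num_heads + head) * head_dim + dim

def hnd_index(token, head, dim, num_tokens, head_dim):
    # Flat offset of the same element in an HND-ordered block.
    return (head * num_tokens + token) * head_dim + dim

# A block of 4 tokens, 2 heads, head_dim 8; element (token=1, head=1, dim=3):
print(nhd_index(1, 1, 3, num_heads=2, head_dim=8))   # 27
print(hnd_index(1, 1, 3, num_tokens=4, head_dim=8))  # 43
```

Because the two orderings place the same element at different offsets, both sides of a transfer must agree on the layout, which is why the XPU path is restricted to NHD.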

Example

pip install nixl==0.3.0

# prefill node
ONEAPI_DEVICE_SELECTOR=level_zero:0 VLLM_USE_V1=1 VLLM_NIXL_SIDE_CHANNEL_HOST=localhost VLLM_NIXL_SIDE_CHANNEL_PORT=5577 VLLM_WORKER_MULTIPROC_METHOD=spawn vllm serve meta-llama/Meta-Llama-3-8B --host localhost --port 8100 --max-model-len 2048 --seed 42 --block-size 16 --enforce-eager --dtype float16 --gpu-memory-utilization 0.8 --disable-log-requests --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"cpu"}'

# decode node
ONEAPI_DEVICE_SELECTOR=level_zero:1 VLLM_USE_V1=1 VLLM_WORKER_MULTIPROC_METHOD=spawn vllm serve meta-llama/Meta-Llama-3-8B --host localhost --port 8200 --max-model-len 2048 --seed 42 --block-size 16 --enforce-eager --dtype float16 --gpu-memory-utilization 0.8 --disable-log-requests --kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"cpu"}'

# start proxy server
python3 tests/v1/kv_connector/nixl_integration/toy_proxy_server.py   --prefiller-host localhost --prefiller-port 8100   --decoder-host localhost --decoder-port 8200   --host=localhost --port 8192

# send request 
curl http://localhost:8192/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Meta-Llama-3-8B", "prompt": "Just making this request a little longer so that we\'re sure we\'re not hitting the small-request lower bound beneath which we don\'t actually trigger the whole kv transfer, but rather just recompute the blocks on D. ", "max_tokens": 20}'
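The same request can be issued from Python with only the standard library. The helper names below are ours, not part of vLLM; the endpoint and payload mirror the curl call above:

```python
import json
import urllib.request

def build_payload(prompt: str, max_tokens: int = 20) -> dict:
    # Same request body as the curl example above.
    return {
        "model": "meta-llama/Meta-Llama-3-8B",
        "prompt": prompt,
        "max_tokens": max_tokens,
    }

def send_completion(prompt: str,
                    url: str = "http://localhost:8192/v1/completions") -> dict:
    # POST the completion request to the toy proxy and decode the JSON reply.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the proxy from the previous step to be running):
# print(send_completion("A prompt long enough to exceed the small-request "
#                       "threshold so the KV transfer actually triggers."))
```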

Verify accuracy

  • A machine with at least two cards is required
  • Run bash tests/v1/kv_connector/nixl_integration/run_xpu_acc_test.sh

Test performance

Scenario: 1K input tokens, 256 output tokens, request rate = 1

python3 benchmarks/benchmark_serving.py --port 8192 --model meta-llama/Meta-Llama-3-8B --dataset-name random --random-input-len 1024 --random-output-len 256 --num-prompts 100  --request-rate 1 --ignore-eos
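For context on what --request-rate 1 means: benchmark_serving paces requests with exponential inter-arrival gaps (a Poisson arrival process) when the rate is finite. A minimal sketch of that pacing, with our own helper name:

```python
import random

def arrival_times(num_prompts: int, request_rate: float, seed: int = 0):
    # Exponential inter-arrival gaps with mean 1/request_rate seconds,
    # i.e. a Poisson arrival process (mirrors benchmark_serving's pacing).
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(num_prompts):
        t += rng.expovariate(request_rate)
        times.append(t)
    return times

times = arrival_times(100, request_rate=1.0)
# At 1 req/s, 100 requests arrive over roughly 100 seconds on average.
print(round(times[-1], 1))
```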

1P1D vs vLLM (TP=2) vs vLLM (TP=1):
[benchmark results chart]

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces XPU support for the NixlConnector, which is a great enhancement. The changes are mostly correct and follow the existing patterns in the codebase. However, I've found a critical bug in the tensor indexing logic for KV block copying in xpu_model_runner.py, which would lead to incorrect behavior. Additionally, the new test script for accuracy verification has a process cleanup mechanism that could be made more robust and safer for shared environments. I've provided suggestions to fix these issues.

@github-actions

github-actions bot commented Aug 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify

mergify bot commented Aug 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhenwei-intel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Aug 11, 2025
Collaborator

@jikunshang jikunshang left a comment


Overall LGTM. Needs one more committer to approve.

@zhenwei-intel
Contributor Author

Hi @njhill, could you help review this PR?

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 22, 2025
@mergify

mergify bot commented Aug 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @zhenwei-intel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@zhenwei-intel
Contributor Author

cc @yaochengji @juncgu

@DarkLight1337
Member

Please fix pre-commit

zhenwei-intel and others added 2 commits September 1, 2025 09:48
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
auto-merge was automatically disabled September 1, 2025 06:49

Head branch was pushed to by a user without write access

@zhenwei-intel
Contributor Author

Please fix pre-commit

fixed, thanks

@jikunshang jikunshang enabled auto-merge (squash) September 4, 2025 04:37
@vllm-bot vllm-bot merged commit e599e2c into vllm-project:main Sep 5, 2025
39 of 43 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
@mergify mergify bot added the kv-connector label Sep 24, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Labels

ci/build · kv-connector · ready · tpu · v1


4 participants