[NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout by xuechendi · Pull Request #30275 · vllm-project/vllm

xuechendi · 2025-12-08T19:34:15Z

Purpose

Heterogeneous BlockSize and kv_layout are currently handled in separate post-process methods.
This PR cleans that up and uses a single post-process method for all cases.

What is changed in this PR:

I removed permute_device_kv and blocksize_post_process and moved their logic into post_process_device_kv_on_receive, a single post-process function with 3 options:

if enable_permute_local_kv and block_size_ratio > 1:
    _kv_postprocess_blksize_and_layout(
        cache, indices, block_size_ratio
    )
elif enable_permute_local_kv:
    _kv_postprocess_layout(cache, indices)
else:
    _kv_postprocess_blksize(cache, indices, block_size_ratio)
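
One way to picture the block-size case is the sketch below. It is an illustration only, not the code in this PR: it assumes a heads-first (HND) layout and assumes that the `block_size_ratio` smaller prefill blocks land back to back inside one decode-sized block. The function name `regroup_small_blocks` and its parameters are hypothetical, and plain Python lists stand in for tensors.

```python
def regroup_small_blocks(flat, ratio, num_heads, small_block, head_dim):
    """Reorder one decode-sized block from arrival order to heads-first order.

    Assumed arrival order (illustrative): [ratio, num_heads, small_block, head_dim],
    i.e. `ratio` prefill blocks written one after another.
    Returned order: [num_heads, ratio * small_block, head_dim].
    """
    out = []
    for head in range(num_heads):
        for r in range(ratio):
            for tok in range(small_block):
                # Offset of this (prefill block, head, token) row in arrival order.
                start = ((r * num_heads + head) * small_block + tok) * head_dim
                out.extend(flat[start:start + head_dim])
    return out


# Ratio used by the test plan in this PR: DECODE_BLOCK_SIZE=64, PREFILL_BLOCK_SIZE=16.
block_size_ratio = 64 // 16
```

When the ratio is 1 the function returns its input unchanged, which is consistent with the block-size path only mattering when the decoder uses larger blocks than the prefiller.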

Test Plan

Test with heterogeneous KV_layout + heterogeneous block_size

DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test with heterogeneous KV_layout

DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=16 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" DECODER_KV_LAYOUT=NHD PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=16 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test with heterogeneous block_size

PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize.log

GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES="deepseek-ai/DeepSeek-V2-Lite-Chat" PREFILL_BLOCK_SIZE=16 DECODE_BLOCK_SIZE=64 bash tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh 2>&1 | tee nixl_hetero_layout_blksize_MLA.log

=> Passed accuracy test

Test Result



Note

Consolidates decoder-side KV post-processing into a single path with shared utils to handle heterogeneous block size and layout.

Adds kv_postprocess_blksize_on_receive, kv_postprocess_layout_on_receive, and kv_postprocess_blksize_and_layout_on_receive in utils.py
Replaces permute_device_kv and blocksize_post_process with unified post_process_device_kv_on_receive in nixl_connector.py, selecting behavior based on enable_permute_local_kv and block_size_ratio
Updates get_finished to batch block IDs per ratio and invoke the new post-process; minor logging and tensor creation tweaks

Written by Cursor Bugbot for commit bff8eaa.
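
The batching of block IDs per ratio described in the summary above can be sketched as follows. This is a minimal illustration, assuming each finished transfer is represented as a `(block_size_ratio, block_ids)` pair; the function name `group_block_ids_by_ratio` and that input shape are hypothetical and not taken from the PR.

```python
from collections import defaultdict


def group_block_ids_by_ratio(transfers):
    """Collect block IDs per block_size_ratio so each ratio is processed once.

    `transfers` is an iterable of (block_size_ratio, block_ids) pairs
    (a hypothetical shape chosen for this sketch).
    """
    grouped = defaultdict(list)
    for ratio, block_ids in transfers:
        grouped[ratio].extend(block_ids)
    return dict(grouped)


batches = group_block_ids_by_ratio([(4, [0, 1]), (2, [5]), (4, [7])])
```

With the grouping in place, a post-process call can run once per ratio over the combined IDs instead of once per request.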


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

gemini-code-assist

Code Review

This pull request refactors the post-processing logic for heterogeneous BlockSize and kv_layout, which is a good direction for code cleanup. However, the implementation introduces several issues. There are critical bugs in the tensor reshape operations within the new helper functions (_kv_postprocess_layout, _kv_postprocess_blksize, and _kv_postprocess_blksize_and_layout), which will likely lead to runtime errors or corrupted KV cache data. Additionally, there's a redundant index_select operation that should be removed to improve performance. These issues need to be addressed to ensure the correctness and efficiency of the new implementation.

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

mergify · 2025-12-16T09:43:50Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xuechendi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2025-12-17T22:35:15Z

Hi @xuechendi, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

mergify · 2025-12-18T12:04:40Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xuechendi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

NickLucche

Thanks for refactoring this @xuechendi !
Left a few comments, overall this is looking pretty good.

tests/v1/kv_connector/nixl_integration/run_accuracy_test.sh

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

NickLucche · 2025-12-19T16:33:47Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

-                            blocks_to_update.permute(0, 2, 1, 3), block_size_ratio
-                        ).permute(0, 2, 1, 3)
-                        cache.index_copy_(0, indices, permuted_blocks)
+        device = sample_cache.device


isn't this self.device?

NickLucche · 2025-12-19T16:35:01Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

-        split_k_and_v = not (
-            self.use_mla or self._use_pallas or self.kv_topo.is_kv_layout_blocks_first
-        )
+        assert block_size_ratio >= 1, "Only nP < nD supported currently."


we could probably use debug log here stating what's being post-processed

I used logger.info_once(), is that ok?

I think they serve two different purposes: a debug log would report the progress of the transfer operation per request, which I think is fine at debug level.
info_once may still be useful for the end user, although in theory we could later allow deployments where P1 block_size != P2 block_size = D block_size, in which case a single info_once message would fall short of reporting that.
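
The trade-off discussed in this thread can be illustrated with the standard `logging` module. This is a sketch only: the `info_once` function below is a hypothetical stand-in built with `functools.lru_cache`, not vLLM's logger, and `on_transfer_done` and the logger name are made up for the example.

```python
import logging
from functools import lru_cache

logger = logging.getLogger("kv_postprocess_demo")


@lru_cache(maxsize=None)
def info_once(message: str) -> None:
    """Hypothetical log-once helper: each distinct message is emitted a single time."""
    logger.info(message)


def on_transfer_done(request_id: str, block_size_ratio: int) -> None:
    # Emitted once per process, so it cannot describe later requests.
    info_once("KV cache post-processing on receive is active")
    # Emitted for every request, with the per-request details.
    logger.debug(
        "request %s: post-processing received blocks (block_size_ratio=%d)",
        request_id,
        block_size_ratio,
    )
```

Calling `on_transfer_done` for two requests produces one info record and two debug records, so only the debug line can report a ratio that differs between requests.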

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

NickLucche · 2025-12-19T16:40:17Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

+    ):
+        def _kv_postprocess_blksize(cache, indices, block_size_ratio):


We could add a short comment at the top of post_process_device_kv_on_receive now that the 3 functions can be moved out

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

NickLucche · 2025-12-19T16:43:45Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

            ):
                block_ids_for_blocksize_post_process[block_size_ratio].append(
-                    meta.local_block_ids
+                    meta.local_physical_block_ids


ok this was a bug then right

yes, I missed that in previous PR

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

mergify · 2025-12-19T18:32:31Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xuechendi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Chendi Xue <chendi.xue@intel.com>


xuechendi · 2025-12-19T19:48:23Z

@NickLucche , Thanks for the review, I have resolved all comments and rebased.

mergify · 2026-01-08T07:39:53Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xuechendi.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

NickLucche

Just a nit on logging. LGTM , thanks @xuechendi !

NickLucche · 2026-01-09T09:57:52Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

-        split_k_and_v = not (
-            self.use_mla or self._use_pallas or self.kv_topo.is_kv_layout_blocks_first
-        )
+        assert block_size_ratio >= 1, "Only nP < nD supported currently."


I think they serve two different purposes: a debug log would report the progress of the transfer operation per request, which I think is fine at debug level.
info_once may still be useful for the end user, although in theory we could later allow deployments where P1 block_size != P2 block_size = D block_size, in which case a single info_once message would fall short of reporting that.

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

xuechendi · 2026-01-09T17:10:59Z

Just a nit on logging. LGTM , thanks @xuechendi !

Thanks, @NickLucche , you're right, I updated it to logger.debug

vllm/distributed/kv_transfer/kv_connector/utils.py

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py


xuechendi requested review from ApostaC and NickLucche as code owners December 8, 2025 19:34

mergify bot added v1 kv-connector labels Dec 8, 2025

chatgpt-codex-connector bot reviewed Dec 8, 2025

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py (outdated)

gemini-code-assist bot reviewed Dec 8, 2025

xuechendi mentioned this pull request Dec 8, 2025

[RFC]: Nixl Connector Heterogeneous BlockSize support #26744

Open

1 task

xuechendi force-pushed the dev/decode_KV_post_process branch from b405900 to edc6d6e Compare December 8, 2025 19:56

heheda12345 assigned NickLucche Dec 9, 2025

mergify bot added the needs-rebase label Dec 16, 2025

xuechendi force-pushed the dev/decode_KV_post_process branch from edc6d6e to 1753de4 Compare December 17, 2025 22:30

mergify bot removed the needs-rebase label Dec 17, 2025

xuechendi force-pushed the dev/decode_KV_post_process branch from 1753de4 to 010e76a Compare December 17, 2025 22:43

mergify bot added the needs-rebase label Dec 18, 2025

xuechendi mentioned this pull request Dec 18, 2025

Add heterogeneous pd docs vllm-project/vllm-gaudi#714

Draft

Clean up post_process when received on decoder side

002a105

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

xuechendi force-pushed the dev/decode_KV_post_process branch from 010e76a to 002a105 Compare December 18, 2025 18:05

mergify bot removed the needs-rebase label Dec 18, 2025

NickLucche reviewed Dec 19, 2025

mergify bot added the needs-rebase label Dec 19, 2025

xuechendi added 2 commits December 19, 2025 10:43

fix comments

bf0feba

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

Merge remote-tracking branch 'origin/main' into dev/decode_KV_post_pr…

f0befd7

…ocess Signed-off-by: Chendi Xue <chendi.xue@intel.com>

xuechendi force-pushed the dev/decode_KV_post_process branch from 1d2394d to f0befd7 Compare December 19, 2025 19:43

mergify bot removed the needs-rebase label Dec 19, 2025

xuechendi requested a review from NickLucche December 19, 2025 19:46

Merge branch 'main' into dev/decode_KV_post_process

7c8104c

mergify bot added the needs-rebase label Jan 8, 2026

Merge remote-tracking branch 'origin' into dev/decode_KV_post_process

a7f6524

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

mergify bot removed the needs-rebase label Jan 8, 2026

NickLucche approved these changes Jan 9, 2026

xuechendi added 2 commits January 9, 2026 08:57

update info_once to debug

bff8eaa

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

Merge branch 'main' into dev/decode_KV_post_process

15ff574

cursor bot reviewed Jan 9, 2026

vllm/distributed/kv_transfer/kv_connector/utils.py

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

NickLucche approved these changes Jan 9, 2026

NickLucche enabled auto-merge (squash) January 9, 2026 19:21

github-actions bot added the ready label Jan 9, 2026

NickLucche merged commit 9457812 into vllm-project:main Jan 9, 2026
56 checks passed

akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026

[NIXL] refine decoder side post process for heterogeneous BlockSize a…

57f611a

…nd kv_layout (vllm-project#30275)

dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

[NIXL] refine decoder side post process for heterogeneous BlockSize a…

fcee594

…nd kv_layout (vllm-project#30275) Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

NickLucche mentioned this pull request Feb 3, 2026

[Roadmap]: PD Disaggregation with NixlConnector Roadmap #33702

Open

44 tasks

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

[NIXL] refine decoder side post process for heterogeneous BlockSize a…

c37ceb1

…nd kv_layout (vllm-project#30275)
