Enable Cross layers KV cache layout at NIXL Connector V2 by liranschour · Pull Request #33339 · vllm-project/vllm

liranschour · 2026-01-29T12:06:30Z

Purpose

Enable the NIXL Connector to use the new continuous cross-layer KV cache layout described in the RFC and implemented in #27743.

Demonstrates a more than 2x improvement in TTFT, and a roughly 1.5x to 1.9x improvement in tok/s, due to a dramatic reduction in the fragmentation of transfer buffers.

Tested the P != D case with run_accuracy_test.sh (P=1, D=2).

branch	num reqs	input len	TTFT	ITL	tok/s	Desc/transfer
main	1000	16	18756.42	5.35	5288.41	56
kv_cross_layers	1000	16	8494.44	8.74	8572.56	1
main	128	1024	1660.84	9.40	37945.20	3528
kv_cross_layers	128	1024	686.98	9.26	55418.76	1
main	128	10240	11140.52	42.78	62339.74	34000
kv_cross_layers	128	10240	5226.71	14.41	117631.48	422
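
As a quick sanity check on the table above, the per-run ratios can be recomputed directly from the reported numbers. This is only an illustrative sketch: the `Row` helper, `RESULTS`, and `ratios` are hypothetical names introduced here, and the figures are copied verbatim from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    ttft: float       # TTFT column, as reported
    tok_per_s: float  # tok/s column, as reported
    desc: int         # Desc/transfer column, as reported


# (num reqs, input len) -> (main, kv_cross_layers), copied from the table
RESULTS = {
    (1000, 16): (Row(18756.42, 5288.41, 56), Row(8494.44, 8572.56, 1)),
    (128, 1024): (Row(1660.84, 37945.20, 3528), Row(686.98, 55418.76, 1)),
    (128, 10240): (Row(11140.52, 62339.74, 34000), Row(5226.71, 117631.48, 422)),
}


def ratios(main: Row, cross: Row) -> tuple[float, float, float]:
    """Return (TTFT reduction, tok/s gain, descriptor reduction) factors."""
    return (
        main.ttft / cross.ttft,
        cross.tok_per_s / main.tok_per_s,
        main.desc / cross.desc,
    )


for (num_reqs, input_len), (main, cross) in RESULTS.items():
    ttft_x, tok_x, desc_x = ratios(main, cross)
    print(
        f"reqs={num_reqs} len={input_len}: "
        f"TTFT {ttft_x:.2f}x lower, tok/s {tok_x:.2f}x higher, "
        f"descriptors {desc_x:.1f}x fewer"
    )
```

Only the TTFT, tok/s, and Desc/transfer columns are used; ITL is left out because it does not move in a single direction across the three runs.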

Test Plan

Test Result


Signed-off-by: Liran Schour <lirans@il.ibm.com>

Co-authored-by: Or Ozeri <or@ozery.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com>


NickLucche · 2026-02-02T11:41:46Z

tests/v1/kv_connector/nixl_integration/config_sweep_accuracy_test.sh

+  "CROSS_LAYERS_BLOCKS=True GPU_MEMORY_UTILIZATION=0.8 MODEL_NAMES=deepseek-ai/deepseek-vl2-tiny" # MLA case
+  "CROSS_LAYERS_BLOCKS=True GPU_MEMORY_UTILIZATION=0.8 PREFILLER_TP_SIZE=1 DECODER_TP_SIZE=2 MODEL_NAMES=deepseek-ai/deepseek-vl2-tiny"
+  "CROSS_LAYERS_BLOCKS=True GPU_MEMORY_UTILIZATION=0.8 PREFILLER_TP_SIZE=2 DECODER_TP_SIZE=1 MODEL_NAMES=deepseek-ai/deepseek-vl2-tiny"
+)


you can refactor to just add CROSS_LAYERS_BLOCKS=True to tp_configs, assuming all above are compatible.


NickLucche

LGTM, only one nit

NickLucche · 2026-02-02T16:43:26Z

tests/v1/kv_connector/nixl_integration/config_sweep_accuracy_test.sh

+else
+  echo "CROSS_LAYERS_BLOCKS is not set, skipping --enable-cross-layers runs."
+fi


nit: no need to echo out disabled options imo

Removed that echo

NickLucche · 2026-02-02T16:52:49Z

vllm/distributed/kv_transfer/kv_connector/utils.py

+            if current_platform.device_type != "cpu"
+            else -2


qq: this is untested on cpu right?

I don't think we need this special case.
We should be able to correctly set block_size_position using test_shape even when running on CPU.

Removed this special case.
block_size_position is now set based only on kv_cache_shape.
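
One way to derive the position from the shape alone, as suggested in this thread, is to probe the shape function with distinct sentinel sizes and look for the block size in the result. The sketch below is not vLLM's implementation: `find_block_size_position`, the sentinel values, and the two example layouts are hypothetical and only illustrate the idea.

```python
from typing import Callable

# Hypothetical signature: (num_blocks, block_size, num_kv_heads, head_size) -> shape
ShapeFn = Callable[[int, int, int, int], tuple[int, ...]]


def find_block_size_position(get_kv_cache_shape: ShapeFn) -> int:
    """Locate the block_size axis by probing the shape function.

    Returns a negative index (counted from the last axis), so the result
    stays valid if extra leading axes are prepended to the layout.
    """
    # Sentinel sizes are pairwise distinct (and differ from 2, the K/V axis),
    # so block_size can appear at most once in a well-formed shape.
    num_blocks, block_size, num_kv_heads, head_size = 1234, 16, 8, 128
    test_shape = get_kv_cache_shape(num_blocks, block_size, num_kv_heads, head_size)
    matches = [i for i, dim in enumerate(test_shape) if dim == block_size]
    if len(matches) != 1:
        raise ValueError(f"cannot locate block_size in shape {test_shape}")
    return matches[0] - len(test_shape)


# Two hypothetical layouts, used only to exercise the probe.
def blocks_first(nb: int, bs: int, h: int, d: int) -> tuple[int, ...]:
    return (nb, 2, bs, h, d)


def heads_first(nb: int, bs: int, h: int, d: int) -> tuple[int, ...]:
    return (2, nb, h, bs, d)


print(find_block_size_position(blocks_first))  # -3
print(find_block_size_position(heads_first))   # -2
```

Because the probe inspects only the returned shape, it needs no branch on the device type.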

orozery · 2026-02-03T07:52:24Z

tests/v1/kv_connector/unit/test_nixl_connector.py

+        expected_base_addrs: list[int]
+        expected_num_entries: int
+        kv_caches: dict[str, torch.Tensor]
+        if connector.prefer_cross_layer_blocks:


This assumes that connector.prefer_cross_layer_blocks was correctly derived from the test's enable_cross_layers parameter.
Can you assert that?

Added an assert for that
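
The guard requested in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the test from the PR: `FakeConnector`, `make_connector`, and `check_layout_branch` are hypothetical stand-ins, and only the attribute name `prefer_cross_layer_blocks` and the parameter name `enable_cross_layers` come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class FakeConnector:
    """Hypothetical stand-in for the connector under test."""

    prefer_cross_layer_blocks: bool


def make_connector(enable_cross_layers: bool) -> FakeConnector:
    # Hypothetical factory; in a real test the flag would flow through
    # the connector's own configuration parsing.
    return FakeConnector(prefer_cross_layer_blocks=enable_cross_layers)


def check_layout_branch(enable_cross_layers: bool) -> str:
    connector = make_connector(enable_cross_layers)
    # Fail fast if the flag did not propagate, rather than silently
    # exercising the wrong branch below.
    assert connector.prefer_cross_layer_blocks == enable_cross_layers
    if connector.prefer_cross_layer_blocks:
        return "cross_layers"
    return "per_layer"


print(check_layout_branch(True))
print(check_layout_branch(False))
```

Placing the assertion before the branch means a parsing regression shows up as a direct failure instead of as a confusing mismatch in the expected values further down.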

orozery · 2026-02-03T08:08:13Z

vllm/distributed/kv_transfer/kv_connector/utils.py

+            if current_platform.device_type != "cpu"
+            else -2


I don't think we need this special case.
We should be able to correctly set block_size_position using test_shape even when running on CPU.


…t#33339) Signed-off-by: Liran Schour <lirans@il.ibm.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com> Co-authored-by: Or Ozeri <or@ozery.com> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>

liranschour and others added 30 commits December 7, 2025 13:31

Cross layers implementation

5195265

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Fix linting

0f36888

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Add cross layers compatibility check

8d36b4b

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Move cross_layers logic into TpKVTopology

2a20197

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Code review minor fix

073b30e

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Linting...

b403a9e

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Code review fixes

06d3184

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Update vllm/distributed/kv_transfer/kv_connector/utils.py

cd27866

Co-authored-by: Or Ozeri <or@ozery.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com>

Update vllm/distributed/kv_transfer/kv_connector/utils.py

19319af

Co-authored-by: Or Ozeri <or@ozery.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com>

Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

994bf1d

Co-authored-by: Or Ozeri <or@ozery.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com>

Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

0efeba3

Co-authored-by: Or Ozeri <or@ozery.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com>

Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

ef8e7ad

Co-authored-by: Or Ozeri <or@ozery.com> Signed-off-by: liranschour <liranschour@users.noreply.github.com>

Code review fixes

6e2b751

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Code review fix

5e66e8f

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Code review fix

eaf5e3d

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Code review fix

e85f458

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Merge remote-tracking branch 'vllm/main' into nixl_kv_cont_cross_layers

fff0935

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Code review fix

9bd9598

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Merge remote-tracking branch 'vllm/main' into nixl_kv_cont_cross_layers

cd57ed8

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

f153e83

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

15f2a78

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

0cb1825

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

9630c8e

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

c148f6d

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

96329f6

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Merge remote-tracking branch 'vllm/main' into nixl_kv_cont_cross_layers

0394b36

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Unit test fix

b4d7045

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Unit test fix

52b1155

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Unit test fix

e34db34

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Unit test fix

edc0755

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Added CI tests

858f856

Signed-off-by: Liran Schour <lirans@il.ibm.com>

NickLucche reviewed Feb 2, 2026


liranschour added 3 commits February 2, 2026 11:51

n/a

ff7e8c7

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

08f1ac4

Signed-off-by: Liran Schour <lirans@il.ibm.com>

CI fix

1c734a9

Signed-off-by: Liran Schour <lirans@il.ibm.com>

NickLucche approved these changes Feb 2, 2026


orozery reviewed Feb 3, 2026


liranschour added 9 commits February 3, 2026 08:34

Code review fix

e51a290

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Code review fix

0ba9b05

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

60cf003

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

b95de28

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

6142cfb

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

f210774

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

e1832cb

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

b8252dd

Signed-off-by: Liran Schour <lirans@il.ibm.com>

n/a

bd9f1c4

Signed-off-by: Liran Schour <lirans@il.ibm.com>

orozery reviewed Feb 3, 2026

vllm/distributed/kv_transfer/kv_connector/utils.py (resolved)

orozery added the ready label Feb 3, 2026

liranschour and others added 2 commits February 3, 2026 13:46

n/a

4652655

Signed-off-by: Liran Schour <lirans@il.ibm.com>

Merge branch 'main' into nixl_kv_cont_cross_layers

16b3d06

liranschour requested a review from orozery February 3, 2026 14:17

orozery approved these changes Feb 3, 2026


Merge branch 'main' into nixl_kv_cont_cross_layers

fa0ee18

NickLucche enabled auto-merge (squash) February 4, 2026 22:55

retrigger tests

ac8903f

Signed-off-by: Liran Schour <lirans@il.ibm.com>

NickLucche merged commit 8322d4e into vllm-project:main Feb 5, 2026
48 checks passed

ZhanqiuHu mentioned this pull request Feb 18, 2026

Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled (#33192)" #34832

Merged

5 tasks

orozery mentioned this pull request Mar 15, 2026

Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207)" #33241

Merged

NickLucche merged 94 commits into vllm-project:main from liranschour:nixl_kv_cont_cross_layers

3 participants
