
[Bugfix] Preserve TurboQuant sliding-window KV specs #41497

Open
lesj0610 wants to merge 3 commits into vllm-project:main from lesj0610:lesj/tq-sliding-window-kv-spec-pr

Conversation

@lesj0610 (Contributor) commented May 2, 2026

Summary

Fix TurboQuant KV cache spec selection for sliding-window attention.

TurboQuant + sliding-window layers need both tq_slot_size page sizing and sliding-window manager routing. The old code collapsed this combination into a plain SlidingWindowSpec, losing TurboQuant's page-size behavior.

Changes

  • Add TQSlidingWindowSpec (see the sketch after this list).
  • Return it from attention spec selection when TurboQuant and sliding-window are both enabled.
  • Preserve TurboQuant page sizing during hybrid KV spec unification.
  • Route TQSlidingWindowSpec to SlidingWindowManager.
  • Add focused tests for spec selection, page-size preservation, and manager routing.
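
A minimal sketch of the new spec's shape follows; everything except the tq_slot_size idea is an illustrative stand-in, not the actual definitions in vllm/v1/kv_cache_interface.py:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlidingWindowSpec:
    """Plain sliding-window KV spec; slots are sized by the KV dtype."""
    block_size: int
    num_kv_heads: int
    head_size: int
    dtype_size: int      # bytes per unquantized KV element, e.g. 2 for fp16
    sliding_window: int

    @property
    def page_size_bytes(self) -> int:
        # One K and one V element per token, per head, per head dim.
        return (2 * self.block_size * self.num_kv_heads
                * self.head_size * self.dtype_size)


@dataclass(frozen=True)
class TQSlidingWindowSpec(SlidingWindowSpec):
    """Sliding-window spec that carries TurboQuant's slot size."""
    tq_slot_size: int = 1  # bytes per quantized KV element (illustrative)

    @property
    def page_size_bytes(self) -> int:
        # TurboQuant sizing: tq_slot_size replaces the dtype element size,
        # so pages stay sized for the quantized cache layout.
        return (2 * self.block_size * self.num_kv_heads
                * self.head_size * self.tq_slot_size)
```

Because TQSlidingWindowSpec is still a SlidingWindowSpec, window-aware code paths keep working, while page-size accounting follows the quantized layout.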

Related upstream PRs

This replaces the KV-spec fix from the larger TurboQuant PRs. No kernel or model-loading changes.

| Area | Existing PRs | This PR |
| --- | --- | --- |
| Scope | Broader PRs mix model support, kernels, platform/config paths, and KV-spec changes (#40108, #39931). | Fixes only KV spec selection and manager routing for TurboQuant + sliding-window. |
| TurboQuant + sliding-window KV spec | The spec can collapse into a plain SlidingWindowSpec, which loses TurboQuant page sizing. | Adds TQSlidingWindowSpec so tq_slot_size and sliding-window routing stay together. |
| Hybrid TurboQuant config | #41123 handles config/argument acceptance for hybrid TurboQuant. | Fixes the underlying KV-spec object after attention spec creation and during hybrid spec unification. |
| Review scope | Larger PRs are harder to review for a small bugfix. | Narrow scope, direct tests, no kernel/platform changes. |
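
On the hybrid-unification row: when window specs are widened to full attention, the TQ variant must stay TQ. A rough sketch of that branch under the same illustrative naming (TQFullAttentionSpec mirrors the PR's wording; the stub fields are placeholders):

```python
from dataclasses import dataclass


# Minimal stubs standing in for the real spec classes.
@dataclass(frozen=True)
class FullAttentionSpec:
    page_size_bytes: int


@dataclass(frozen=True)
class TQFullAttentionSpec(FullAttentionSpec):
    tq_slot_size: int = 1


@dataclass(frozen=True)
class SlidingWindowSpec(FullAttentionSpec):
    sliding_window: int = 0


@dataclass(frozen=True)
class TQSlidingWindowSpec(SlidingWindowSpec):
    tq_slot_size: int = 1


def widen_for_hybrid(spec: FullAttentionSpec) -> FullAttentionSpec:
    """Widen a per-layer spec to full attention during hybrid unification."""
    if isinstance(spec, TQSlidingWindowSpec):
        # Fixed path: keep TurboQuant page sizing while dropping the window.
        return TQFullAttentionSpec(spec.page_size_bytes, spec.tq_slot_size)
    if isinstance(spec, SlidingWindowSpec):
        # Old path: a TQ window spec matching here came back as a plain
        # FullAttentionSpec, losing tq_slot_size.
        return FullAttentionSpec(spec.page_size_bytes)
    return spec
```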

Validation

  • ruff check on changed files: passed
  • pytest tests/v1/core/test_kv_cache_utils.py tests/v1/core/test_single_type_kv_cache_manager.py -q -k 'tq_sliding_window or preserves_tq_page_size or sliding_window_uses_sliding_window_manager': 2 passed
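
For reference, the shape of the page-size preservation check, reusing the illustrative spec classes sketched under Changes (values are placeholders, not this PR's test code):

```python
def test_tq_sliding_window_preserves_tq_page_size():
    common = dict(block_size=16, num_kv_heads=8, head_size=128,
                  dtype_size=2, sliding_window=1024)
    plain = SlidingWindowSpec(**common)
    tq = TQSlidingWindowSpec(**common, tq_slot_size=1)

    # Page size must follow tq_slot_size (1 byte here), not the
    # unquantized dtype size (2 bytes); the window itself is unchanged.
    assert tq.page_size_bytes == plain.page_size_bytes // 2
    assert tq.sliding_window == plain.sliding_window
```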

AI assistance was used (Codex, Claude, Gemini)

Keep TurboQuant page-size accounting when a layer also uses sliding-window attention by preserving tq_slot_size through the KV cache spec and manager dispatch paths.

Signed-off-by: lesj0610 <lesj0610@users.noreply.github.com>

@claude (Bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify (Bot) added the v1 and bug (Something isn't working) labels May 2, 2026
@gemini-code-assist (Bot) left a comment


Code Review

This pull request introduces support for TurboQuant (TQ) with sliding window attention in the v1 engine. It adds the TQSlidingWindowSpec class to handle TQ-aware page sizes and updates the attention layer, KV cache utilities, and manager mappings to support this new specification. Unit tests have been added to verify the correct behavior of TQ sliding window specs and their integration with the cache manager. I have no feedback to provide.

@lesj0610 (Contributor, Author) commented May 8, 2026

@heheda12345 Sorry, one more. This is another small fix in the hybrid quantized-KV path, related to #40308 but independent of it.

Problem: a TurboQuant layer with sliding_window was created as a plain SlidingWindowSpec, because get_kv_cache_spec checked sliding_window before the TurboQuant path. Hybrid unification then converted it through the plain FullAttentionSpec route, and the TQ page/slot-size info was lost.
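
In other words, the combined case has to be tested before the plain window branch. A simplified sketch of the branch order (the arguments and spec stubs are placeholders for illustration):

```python
from dataclasses import dataclass
from typing import Optional


# Placeholder specs; see the earlier sketches for the TQ page-size idea.
@dataclass
class FullAttentionSpec:
    pass


@dataclass
class TQFullAttentionSpec(FullAttentionSpec):
    pass


@dataclass
class SlidingWindowSpec(FullAttentionSpec):
    sliding_window: int = 0


@dataclass
class TQSlidingWindowSpec(SlidingWindowSpec):
    pass


def get_kv_cache_spec(sliding_window: Optional[int], use_turboquant: bool):
    """Simplified spec selection showing only the branch order at issue."""
    if use_turboquant and sliding_window is not None:
        # Fixed path: handle the combined case first so TurboQuant page
        # sizing is not lost to the plain sliding-window branch below.
        return TQSlidingWindowSpec(sliding_window)
    if sliding_window is not None:
        # The old code reached this branch for TQ layers too, returning a
        # plain SlidingWindowSpec without tq_slot_size.
        return SlidingWindowSpec(sliding_window)
    if use_turboquant:
        return TQFullAttentionSpec()
    return FullAttentionSpec()
```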

This PR adds TQSlidingWindowSpec, which keeps the TQ-aware page size; makes TurboQuant sliding-window attention return it; and preserves it as TQFullAttentionSpec during hybrid unification. SlidingWindowManager also handles TQSlidingWindowSpec, so the sliding-window cap still works. This touches kv_cache_interface.py and v1/core, so your review would help a lot. It is a small patch with tests included.
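
On the manager side the change amounts to one extra dispatch entry. Assuming dispatch is keyed on the exact spec type, as in the spec-to-manager mapping in v1/core/single_type_kv_cache_manager.py, subclassing SlidingWindowSpec alone would not route correctly; the map needs an explicit entry (class bodies are stubs):

```python
# Stubs standing in for the real spec and manager classes.
class FullAttentionSpec: ...
class SlidingWindowSpec(FullAttentionSpec): ...
class TQSlidingWindowSpec(SlidingWindowSpec): ...

class FullAttentionManager: ...
class SlidingWindowManager: ...

spec_manager_map: dict[type, type] = {
    FullAttentionSpec: FullAttentionManager,
    SlidingWindowSpec: SlidingWindowManager,
    # The fix: TQ window specs reuse the sliding-window eviction logic,
    # since only their page sizing differs from the plain window spec.
    TQSlidingWindowSpec: SlidingWindowManager,
}


def manager_cls_for(spec: object) -> type:
    # Exact-type lookup: without the explicit entry above, a
    # TQSlidingWindowSpec would raise KeyError despite subclassing
    # SlidingWindowSpec.
    return spec_manager_map[type(spec)]
```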
