feat(kv-cache): support multiple sliding window groups in HybridKVCac…#31592
feat(kv-cache): support multiple sliding window groups in HybridKVCac…#31592DZADSL72-00558 wants to merge 1 commit intovllm-project:mainfrom
Conversation
…heCoordinator with tests Changes to be committed: new file: tests/v1/core/test_hybrid_kv_cache_coordinator.py modified: vllm/v1/core/kv_cache_coordinator.py Signed-off-by: bofengluo <roborluo@amazon.com>
There was a problem hiding this comment.
Code Review
This pull request introduces support for multiple sliding window groups in the HybridKVCacheCoordinator, which is a necessary feature for models like Amazon NOVA. The changes are well-implemented, refactoring the existing logic and adding a new path to handle multiple sliding window configurations. The implementation correctly identifies the common cache hit prefix across full attention and all sliding window groups by iterating through them and finding the minimum common hit length. The sliding window groups are sorted by window size, which is a good optimization for the cache hit search. The pull request also includes a comprehensive new test suite that validates the sorting logic and various cache hit scenarios, including partial hits and eviction. The code is clean, and the logic appears to be sound. Overall, this is a solid contribution that extends the functionality of the KV cache coordinator.
|
cc @heheda12345 |
heheda12345
left a comment
There was a problem hiding this comment.
It's not correct to take the min of all sliding window groups. For example, we may have request ABCDE with cache hit (assume block_size=1):
block_table for sw=2: [miss, miss, miss, D, E] -> hit length = 5
block_table for sw=3: [miss, B, C, D, miss] -> hit length = 4
the min length is 4, sw=2 needs the kv of token CD but we don't cache the kv of token C for length 4 of sw=2.
|
One possible solution is to write a new coordinator for the most general case. The main idea is to gradually reduce the length until converged: |
|
This pull request has merge conflicts that must be resolved before it can be |
|
Closing as superseded by #31707 |
…heCoordinator with tests
Changes to be committed:
new file: tests/v1/core/test_hybrid_kv_cache_coordinator.py
modified: vllm/v1/core/kv_cache_coordinator.py
Purpose
dd support for multiple sliding window groups in
HybridKVCacheCoordinator.Previously, the coordinator only supported a single sliding window configuration. This PR extends it to handle multiple sliding window groups.
Test Plan
pytest tests/v1/core/test_hybrid_kv_cache_coordinator.py -v
Test Result
collected 3 items
tests/v1/core/test_hybrid_kv_cache_coordinator.py::TestHybridKVCacheCoordinatorMultipleSlidingWindows::test_verify_and_sort_multiple_sliding_windows PASSED [ 33%]
tests/v1/core/test_hybrid_kv_cache_coordinator.py::TestHybridKVCacheCoordinatorMultipleSlidingWindows::test_cache_hit_multiple_sliding_windows PASSED [ 66%]
tests/v1/core/test_hybrid_kv_cache_coordinator.py::TestHybridKVCacheCoordinatorMultipleSlidingWindows::test_partial_cache_hit_different_sliding_windows PASSED [100%]
=========================================================== 3 passed in 0.58s ============================================================
(b
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.