[Core][Refactor]: thread scheduler_block_size into KVCacheManager and KVCacheCoordinator #44165
Merged
ccc75d7 to c221114
njhill approved these changes on Jun 1, 2026
Comment on lines +49 to +50
# The scheduling granularity (LCM of all group block sizes), must be a multiple
# of the hash_block_size and the block size of each group.
Could/should we add an assert here for this?
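For illustration, the invariant stated in the quoted comment could be checked explicitly. This is a minimal sketch that assumes the scheduler block size, the hash block size, and the list of per-group block sizes are all available where scheduler_block_size is received; check_scheduler_block_size is a hypothetical helper, not an existing vLLM function.

```python
import math


def check_scheduler_block_size(
    scheduler_block_size: int,
    hash_block_size: int,
    group_block_sizes: list[int],
) -> None:
    """Hypothetical helper: assert the alignment invariant from the comment."""
    # The scheduling granularity must be a multiple of the hash block size...
    assert scheduler_block_size % hash_block_size == 0, (
        f"scheduler_block_size={scheduler_block_size} is not a multiple of "
        f"hash_block_size={hash_block_size}"
    )
    # ...and of the block size of every KV cache group.
    for block_size in group_block_sizes:
        assert scheduler_block_size % block_size == 0, (
            f"scheduler_block_size={scheduler_block_size} is not a multiple of "
            f"group block size {block_size}"
        )


# Example (hypothetical sizes): groups of 16 and 64 tokens, hashed every 16 tokens.
sizes = [16, 64]
check_scheduler_block_size(math.lcm(*sizes), 16, sizes)  # passes: LCM is 64
```

One design note: Python strips assert statements when run with -O, so if the invariant must also hold in optimized runs, raising ValueError would be the sturdier choice.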
…heCoordinator Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
ece24fc to 8c8a6e5
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request on Jun 4, 2026
…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request on Jun 4, 2026
…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
andakai pushed a commit to andakai/vllm that referenced this pull request on Jun 4, 2026
…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request on Jun 5, 2026
…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai> Signed-off-by: JisoLya <523420504@qq.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request on Jun 8, 2026
…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request on Jun 10, 2026
…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Purpose
This is a small, behavior-preserving refactor that threads an explicit scheduler_block_size through KVCacheManager → KVCacheCoordinator → SingleTypeKVCacheManager, instead of having HybridKVCacheCoordinator recompute the LCM of group block sizes internally.
Today the scheduler already resolves the scheduling-alignment granularity via resolve_kv_cache_block_sizes (returned as scheduler_block_size, the LCM of all group block sizes in the multi-group, non-context-parallel case) and stores it as Scheduler.block_size. Separately, HybridKVCacheCoordinator independently recomputed the same quantity as self.lcm_block_size = lcm(*block_sizes). This PR removes that duplicate computation and passes the already-resolved value down instead, making the alignment invariant a single explicit input rather than a value derived in two places.
This is a preliminary step in preparation for refactoring/merging #43447 (selective prefix-cache retention for sliding-window KV cache), which needs the scheduling block size to be available at the manager/coordinator level. Landing the plumbing on its own keeps that follow-up focused on the retention logic.
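To make the before/after shape concrete, here is a heavily simplified sketch. The class names mirror those in this PR, but their constructors, and the helpers resolve_scheduler_block_size, align_down, and aligned_hit_length, are hypothetical simplifications rather than vLLM's real signatures. It models only the multi-group, non-context-parallel path and assumes "aligning" means rounding a matched token count down to a multiple of the block size.

```python
import math


def resolve_scheduler_block_size(
    group_block_sizes: list[int], context_parallel: bool
) -> int:
    """Hypothetical stand-in for the multi-group path of resolve_kv_cache_block_sizes."""
    if len(group_block_sizes) < 2:
        raise NotImplementedError("this sketch only models multiple KV cache groups")
    if context_parallel:
        # Hybrid groups combined with context parallelism are rejected upstream.
        raise ValueError("hybrid KV cache groups do not support context parallelism")
    # Scheduling granularity: LCM of all group block sizes.
    return math.lcm(*group_block_sizes)


def align_down(num_tokens: int, block_size: int) -> int:
    """Round num_tokens down to a multiple of block_size (assumed alignment rule)."""
    return num_tokens // block_size * block_size


class SingleTypeKVCacheManager:
    def __init__(self, block_size: int, scheduler_block_size: int) -> None:
        self.block_size = block_size
        # Stored but not consumed yet: plumbing for the follow-up PR.
        self.scheduler_block_size = scheduler_block_size


class HybridKVCacheCoordinator:
    def __init__(self, group_block_sizes: list[int], scheduler_block_size: int) -> None:
        # Before this refactor: self.lcm_block_size = math.lcm(*group_block_sizes)
        self.scheduler_block_size = scheduler_block_size
        self.managers = [
            SingleTypeKVCacheManager(bs, scheduler_block_size)
            for bs in group_block_sizes
        ]

    def aligned_hit_length(self, num_hit_tokens: int) -> int:
        # Align on the injected value instead of a locally recomputed LCM.
        return align_down(num_hit_tokens, self.scheduler_block_size)


class KVCacheManager:
    def __init__(self, group_block_sizes: list[int], scheduler_block_size: int) -> None:
        self.coordinator = HybridKVCacheCoordinator(group_block_sizes, scheduler_block_size)


# Resolve once (as the scheduler does), then pass the value down.
group_block_sizes = [32, 48]
scheduler_block_size = resolve_scheduler_block_size(group_block_sizes, context_parallel=False)
manager = KVCacheManager(group_block_sizes, scheduler_block_size)
print(scheduler_block_size)  # 96
print(manager.coordinator.aligned_hit_length(250))  # 192
```

Because the resolved value is exactly math.lcm(*group_block_sizes) on this path, the coordinator aligns on the same number it previously computed for itself; the only change is where that number comes from.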
Behavioral equivalence
HybridKVCacheCoordinator.cache_blocks and find_longest_cache_hit now align on self.scheduler_block_size instead of self.lcm_block_size. For the only configuration that reaches HybridKVCacheCoordinator (multiple KV cache groups, context parallelism disabled), resolve_kv_cache_block_sizes returns exactly math.lcm(*group_block_sizes), which is identical to the old internal computation over the same set of groups. Hybrid groups combined with context parallelism are rejected upstream in resolve_kv_cache_block_sizes, so there is no configuration in which the two values could diverge.
self.scheduler_block_size is also stored on SingleTypeKVCacheManager. It is not consumed yet in this PR; it is the plumbing that #43447 ([Prefix Caching] DeepSeekv4 - Support selective prefix-cache retention for sliding-window KV cache) builds on.
All get_kv_cache_coordinator/KVCacheManager constructor sites are updated (scheduler and simple_kv_offload). The Mooncake store path uses its own coordinator and already carries its own scheduler_block_size; it is left untouched here.
Why this is not duplicating an existing PR
A search of open PRs for scheduler_block_size and for block-size threading into KVCacheManager returns no overlap. #36317 ("Adjust alignment block size according attn supported kernel sizes") changes how the alignment block size is chosen per attention kernel, which is a different concern from threading the already-resolved value through the manager/coordinator. This PR adds no new behavior and changes no defaults.
Test Plan
Tests are updated to pass scheduler_block_size. test_prefix_caching.py adds a small make_kv_cache_manager helper that derives scheduler_block_size from the config (the LCM of group block sizes), mirroring resolve_kv_cache_block_sizes for the non-context-parallel path so that call sites don't repeat the computation.
Test Result
pre-commit run (ruff, mypy) passes on all changed files.
AI assistance (Claude Code) was used while preparing this change. The submitter has reviewed every changed line and run the tests above.