[Core][Refactor]: thread `scheduler_block_size` into KVCacheManager and KVCacheCoordinator by ivanium · Pull Request #44165 · vllm-project/vllm

ivanium · 2026-06-01T04:57:39Z

Purpose

This is a small, behavior-preserving refactor that threads an explicit scheduler_block_size through KVCacheManager → KVCacheCoordinator → SingleTypeKVCacheManager, instead of having HybridKVCacheCoordinator recompute the LCM of group block sizes internally.

Today the scheduler already resolves the scheduling-alignment granularity via resolve_kv_cache_block_sizes (returned as scheduler_block_size, the LCM of all group block sizes for the multi-group non-context-parallel case) and stores it as Scheduler.block_size. HybridKVCacheCoordinator independently recomputes the same quantity as self.lcm_block_size = lcm(*block_sizes). This PR removes that duplicate computation and passes the already-resolved value down instead, so the alignment invariant becomes a single explicit input rather than a value derived in two places.
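To illustrate the plumbing described above, here is a minimal sketch of "resolve once, pass down". It is not the vLLM code: ManagerSketch, CoordinatorSketch, SingleTypeManagerSketch and resolve_scheduler_block_size are hypothetical stand-ins, and the only behavior assumed from the description is that the value is the LCM of the group block sizes in the multi-group non-context-parallel case.

```python
import math


def resolve_scheduler_block_size(group_block_sizes: list[int]) -> int:
    # Hypothetical stand-in for the resolution step: LCM of all group block sizes.
    return math.lcm(*group_block_sizes)


class SingleTypeManagerSketch:
    # Hypothetical: stores the value it is given; it never recomputes it.
    def __init__(self, block_size: int, scheduler_block_size: int) -> None:
        self.block_size = block_size
        self.scheduler_block_size = scheduler_block_size


class CoordinatorSketch:
    # Hypothetical: forwards the resolved value to one manager per group.
    def __init__(self, group_block_sizes: list[int], scheduler_block_size: int) -> None:
        self.scheduler_block_size = scheduler_block_size
        self.managers = [
            SingleTypeManagerSketch(bs, scheduler_block_size)
            for bs in group_block_sizes
        ]


class ManagerSketch:
    # Hypothetical top-level manager: takes the value as an explicit input.
    def __init__(self, group_block_sizes: list[int], scheduler_block_size: int) -> None:
        self.coordinator = CoordinatorSketch(group_block_sizes, scheduler_block_size)


# The value is resolved in exactly one place and threaded through every layer.
group_block_sizes = [16, 32]
scheduler_block_size = resolve_scheduler_block_size(group_block_sizes)
manager = ManagerSketch(group_block_sizes, scheduler_block_size)
```

With this shape, no layer below the resolution step needs to know how the value was derived.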

This is a preparatory step for refactoring/merging #43447 (selective prefix-cache retention for sliding-window KV cache), which needs the scheduling block size available at the manager/coordinator level. Landing the plumbing on its own keeps that follow-up focused on the retention logic.

Behavioral equivalence

HybridKVCacheCoordinator.cache_blocks and find_longest_cache_hit now align on self.scheduler_block_size instead of self.lcm_block_size. For the only configuration that reaches HybridKVCacheCoordinator (multiple KV cache groups, context parallelism disabled), resolve_kv_cache_block_sizes returns exactly math.lcm(*group_block_sizes), which is identical to the old internal computation over the same set of groups. Hybrid groups + context parallelism is rejected upstream in resolve_kv_cache_block_sizes, so there is no configuration where the two values could diverge.

self.scheduler_block_size is also stored on SingleTypeKVCacheManager. It is not consumed yet in this PR; it is the plumbing that #43447 ([Prefix Caching] DeepSeekv4 - Support selective prefix-cache retention for sliding-window KV cache) builds on.
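The equivalence argument can be checked with a small sketch. It assumes that "align on" means rounding a token count down to a multiple of the alignment granularity, which is one plausible reading and not confirmed by the description. HybridCoordinatorSketch, align_hit_length and old_lcm_block_size are hypothetical names, not the vLLM API.

```python
import math
from dataclasses import dataclass


def old_lcm_block_size(group_block_sizes: tuple[int, ...]) -> int:
    # Models the removed internal computation: lcm(*block_sizes).
    return math.lcm(*group_block_sizes)


@dataclass
class HybridCoordinatorSketch:
    # Hypothetical coordinator that receives the alignment value as an input.
    group_block_sizes: tuple[int, ...]
    scheduler_block_size: int

    def align_hit_length(self, num_hit_tokens: int) -> int:
        # Assumption: alignment rounds down to a multiple of the granularity.
        return num_hit_tokens // self.scheduler_block_size * self.scheduler_block_size


groups = (16, 32)
coordinator = HybridCoordinatorSketch(groups, scheduler_block_size=math.lcm(*groups))

# The passed-in value and the old internal value agree for the same groups,
# so any alignment computed from them agrees as well.
same_value = coordinator.scheduler_block_size == old_lcm_block_size(groups)
aligned = coordinator.align_hit_length(100)
```

Here a 100-token hit with group block sizes 16 and 32 aligns to 96 tokens under either value.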

All get_kv_cache_coordinator / KVCacheManager constructor sites are updated (scheduler and simple_kv_offload). The Mooncake store path uses its own coordinator and already carries its own scheduler_block_size; it is untouched here.

Why this is not duplicating an existing PR

A search of open PRs (scheduler_block_size, block-size threading into KVCacheManager) returns no overlap. #36317 ("Adjust alignment block size according attn supported kernel sizes") changes how the alignment block size is chosen per attention kernel — a different concern from threading the already-resolved value through the manager/coordinator. This PR adds no new behavior and changes no defaults.

Test Plan

.venv/bin/python -m pytest \
  tests/v1/core/test_prefix_caching.py \
  tests/v1/core/test_single_type_kv_cache_manager.py \
  tests/v1/core/test_kv_cache_utils.py -v

Tests are updated to pass scheduler_block_size. test_prefix_caching.py adds a small make_kv_cache_manager helper that derives scheduler_block_size from the config (LCM of group block sizes), mirroring resolve_kv_cache_block_sizes for the non-context-parallel path so call sites don't repeat it.

Test Result

126 passed in 35.08s

pre-commit run (ruff, mypy) passes on all changed files.

AI assistance (Claude Code) was used while preparing this change. The submitter has reviewed every changed line and run the tests above.

njhill

Thanks @ivanium

njhill · 2026-06-01T19:05:55Z

+        # The scheduling granularity (LCM of all group block sizes), must be a multiple
+        # of the hash_block_size and the block size of each group.


Could/should we add an assert here for this?
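One possible shape for such an assert, as a sketch only: check_scheduler_block_size is a hypothetical helper, and the parameter names are taken from the comment above rather than from the merged code. The assertion that actually landed in the "chore: conservative assertion" commit may differ.

```python
def check_scheduler_block_size(
    scheduler_block_size: int,
    hash_block_size: int,
    group_block_sizes: list[int],
) -> None:
    # Hypothetical helper: enforce the invariant stated in the code comment.
    assert scheduler_block_size % hash_block_size == 0, (
        f"scheduler_block_size={scheduler_block_size} is not a multiple of "
        f"hash_block_size={hash_block_size}"
    )
    for block_size in group_block_sizes:
        assert scheduler_block_size % block_size == 0, (
            f"scheduler_block_size={scheduler_block_size} is not a multiple of "
            f"group block size {block_size}"
        )


# Passes silently: 32 is a multiple of 16 and of every group block size.
check_scheduler_block_size(32, hash_block_size=16, group_block_sizes=[16, 32])
```

Checking divisibility rather than equality with the LCM keeps the check conservative: any common multiple satisfies it.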

mergify Bot added the v1 label Jun 1, 2026

ivanium force-pushed the refactor/scheduler-block-size branch from ccc75d7 to c221114 Compare June 1, 2026 05:15

ivanium marked this pull request as ready for review June 1, 2026 05:15

ivanium requested review from ApostaC, WoosukKwon, alexm-redhat, heheda12345, njhill, orozery, robertgshaw2-redhat and ywang96 as code owners June 1, 2026 05:15

ywang96 added the ready label Jun 1, 2026

ivanium changed the title ~~refactor: thread scheduler_block_size into KVCacheManager and KVCacheCoordinator~~ [Core][Refactor]: thread scheduler_block_size into KVCacheManager and KVCacheCoordinator Jun 1, 2026

njhill approved these changes Jun 1, 2026

ivanium added 3 commits June 1, 2026 22:25

refactor: thread scheduler_block_size into KVCacheManager and KVCac…

ba4d14b

…heCoordinator Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

chore: conservative assertion

7c4c796

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

fix: test mismatches

8c8a6e5

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

ivanium force-pushed the refactor/scheduler-block-size branch from ece24fc to 8c8a6e5 Compare June 1, 2026 22:26

mergify Bot added the kv-connector label Jun 1, 2026

ywang96 merged commit 7c37096 into vllm-project:main Jun 2, 2026
64 checks passed

HF-001 mentioned this pull request Jun 2, 2026

[Refactor] thread scheduler_block_size into KVCacheManager and KVCacheCoordinator vllm-project/vllm-ascend#9886

Open

bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Jun 4, 2026

[Core][Refactor]: thread scheduler_block_size into KVCacheManager a…

1e33fb3

…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

andakai pushed a commit to andakai/vllm that referenced this pull request Jun 4, 2026

[Core][Refactor]: thread scheduler_block_size into KVCacheManager a…

e37632b

…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026

[Core][Refactor]: thread scheduler_block_size into KVCacheManager a…

ce45800

…nd KVCacheCoordinator (vllm-project#44165) Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>

pjdurden mentioned this pull request Jun 9, 2026

[Bugfix] Fix SinkFullAttentionManager startup crash on scheduler_block_size #44951

Open

nofushanquan mentioned this pull request Jun 12, 2026

[Misc]m2m upgrade vllm-project/vllm-ascend#10099

Open

zhao-stack mentioned this pull request Jun 12, 2026

[Misc] Main2Main 0605 vllm-project/vllm-ascend#10250

Merged

ivanium deleted the refactor/scheduler-block-size branch June 13, 2026 23:41

ywang96 merged 3 commits into vllm-project:main from ivanium:refactor/scheduler-block-size
