Skip to content

[Bugfix][Mooncake] Fix per-group block_size/block_hash and group_idx in MooncakeStoreConnector KV events#44103

Open
ivanium wants to merge 2 commits into
vllm-project:mainfrom
ivanium:fix/mk-store-kv-event
Open

[Bugfix][Mooncake] Fix per-group block_size/block_hash and group_idx in MooncakeStoreConnector KV events#44103
ivanium wants to merge 2 commits into
vllm-project:mainfrom
ivanium:fix/mk-store-kv-event

Conversation

@ivanium
Copy link
Copy Markdown
Collaborator

@ivanium ivanium commented May 31, 2026

Purpose

MooncakeStoreConnector emits BlockStored KV events so external subscribers can track which KV blocks are present in the store. In hybrid / multi-group (HMA) configurations — and whenever a group's block_size is larger than the connector's hash_block_size — three fields of the emitted event were wrong.

  1. block_hashes used req_meta.block_hashes[chunk_idx], which indexes the request's hash-granular hash list with a chunk index. When block_size > hash_block_size, consecutive hashes are merged into one chunk hash via BlockHashListWithBlockSize, so this picked the wrong hash and the event advertised a block_hash that does not match the key the block is actually stored under. A consumer indexing the store by that hash would miss or mismatch. The fix derives the hash directly from the store key being written (BlockHash(bytes.fromhex(key.chunk_hash))), so the event hash and the storage key agree by construction. For the uniform single-group case (block_size == hash_block_size) this is identical to the previous behavior, so nothing regresses there.

  2. block_size used a single global original_block_size (cache_config.block_size), which is wrong for groups whose block size differs. It now uses the per-group db.block_size.

  3. group_idx was not set, so multi-group consumers could not attribute an event to its KV-cache group. It is now populated from the per-group loop index. BlockStored.group_idx already exists, so no event-schema change is needed.

With original_block_size no longer read anywhere after (2), the now-dead field and its plumbing (scheduler attribute, ReqMeta field/parameter, worker attribute, and test references) are removed. This mechanical cleanup is bundled with the substantive fix rather than shipped on its own.

Test

.venv/bin/python -m pytest tests/v1/kv_connector/unit/test_mooncake_store_worker.py tests/v1/kv_connector/unit/test_mooncake_store_scheduler.py -q
# 66 passed

pre-commit run ruff-check --files <changed files>
# Passed

The worker tests include a case with enable_kv_event=True and a block_size > hash_block_size ("double hash") group, which exercises the corrected event fields.

Not a duplicate

Searched open PRs on vllm-project/vllm (mooncake kv event, MooncakeStoreConnector, mooncake BlockStored). The nearby open mooncake PRs are unrelated in scope: #43742 (release GPU pin on failed store), #43701 (DummyClient mode), #42199 (DP worker bootstrap timeout). None touch BlockStored per-group correctness.

AI assistance

This change was prepared with AI assistance (Claude Code). The author has reviewed every changed line and run the tests listed above.

@mergify mergify Bot added v1 bug Something isn't working kv-connector labels May 31, 2026
@ivanium ivanium marked this pull request as ready for review May 31, 2026 21:52
@ivanium
Copy link
Copy Markdown
Collaborator Author

ivanium commented May 31, 2026

cc @zhewenl @Dao007forever

Copy link
Copy Markdown
Contributor

@Dao007forever Dao007forever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Jun 2, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ivanium.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Jun 2, 2026
ivanium added 2 commits June 2, 2026 20:40
…iginal_block_size

Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
@ivanium ivanium force-pushed the fix/mk-store-kv-event branch from 570ddee to 85de01f Compare June 2, 2026 21:11
@mergify mergify Bot removed the needs-rebase label Jun 2, 2026
@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 3, 2026
@njhill njhill enabled auto-merge (squash) June 3, 2026 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants