
[NIXL][1/N] Refactor kernel_block_size detection #35752

Draft
NickLucche wants to merge 2 commits into vllm-project:main from NickLucche:minor-refactor-register-kv-cache

@NickLucche (Collaborator) commented Mar 2, 2026

This PR is based on top of #32204, so #32204 must be merged first.

This PR is a small refactor/cleanup of the register_kv_cache main loop, which is quite dense. It uses the KVCacheConfig (now available after the HMA PR) to simplify the code logic.
In fact, there is no need to wait until the iteration over the KV cache tensors to figure out which kernel block size the backend selected or how many blocks the KV cache has.

This is also an attempt at breaking up the hybrid SSM support in #34727 into smaller, more easily reviewable PRs.

Signed-off-by: NickLucche <nlucches@redhat.com>
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request introduces a significant refactoring to the NIXL connector to support Hybrid Memory Allocator (HMA) and improve kernel block size detection. The changes are extensive, modifying core connector logic, utility functions, and tests to handle multiple KV cache groups, which is essential for HMA. Key additions include the BlockIds type alias, the _sync_block_size_with_kernel method for managing logical and physical block sizes, and updates to many components to be HMA-aware. The test suite has been appropriately expanded with HMA-specific tests and existing tests have been adapted. The overall changes are well-structured and appear correct. I've identified one area for improvement in a test script concerning code duplication.

Comment on lines +169 to +172
# Add HMA flag if specified
if [[ -n "$ENABLE_HMA_VAR" ]]; then
    BASE_CMD="${BASE_CMD} $ENABLE_HMA_VAR"
fi

high

This block of code to add the HMA flag is duplicated for the decode instances loop (lines 220-223). To improve maintainability and reduce redundancy, consider refactoring this logic into a helper function or applying it once to avoid having the same logic in two places.
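
One way to remove the duplication described in this comment is a small helper that both the prefill and decode loops call. The sketch below is an assumption about how that could look, not code from this PR: `append_optional_flag` is a hypothetical name and the `BASE_CMD` value is a placeholder. Only the variable names `ENABLE_HMA_VAR` and `BASE_CMD` come from the reviewed script.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): print the command string,
# with the optional flag appended only when the flag is non-empty.
append_optional_flag() {
    local cmd="$1"
    local flag="${2:-}"
    if [ -n "$flag" ]; then
        cmd="${cmd} ${flag}"
    fi
    printf '%s\n' "$cmd"
}

# Example call site; the same line would be used in both loops.
# The command below is a placeholder, not the script's real command.
ENABLE_HMA_VAR="${ENABLE_HMA_VAR:-}"
BASE_CMD="placeholder-server-command --port 8000"
BASE_CMD="$(append_optional_flag "$BASE_CMD" "$ENABLE_HMA_VAR")"
```

With a helper like this, each loop keeps a single call, and any later change to how the flag is applied happens in one place.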

@NickLucche NickLucche changed the title [NIXL] Refactor kernel_block_size detection [NIXL][1/N] Refactor kernel_block_size detection Mar 3, 2026