[NIXL][1/N] Refactor kernel_block_size detection#35752
[NIXL][1/N] Refactor kernel_block_size detection#35752NickLucche wants to merge 2 commits intovllm-project:mainfrom
kernel_block_size detection#35752Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a significant refactoring to the NIXL connector to support Hybrid Memory Allocator (HMA) and improve kernel block size detection. The changes are extensive, modifying core connector logic, utility functions, and tests to handle multiple KV cache groups, which is essential for HMA. Key additions include the BlockIds type alias, the _sync_block_size_with_kernel method for managing logical and physical block sizes, and updates to many components to be HMA-aware. The test suite has been appropriately expanded with HMA-specific tests and existing tests have been adapted. The overall changes are well-structured and appear correct. I've identified one area for improvement in a test script concerning code duplication.
| # Add HMA flag if specified | ||
| if [[ -n "$ENABLE_HMA_VAR" ]]; then | ||
| BASE_CMD="${BASE_CMD} $ENABLE_HMA_VAR" | ||
| fi |
There was a problem hiding this comment.
kernel_block_size detectionkernel_block_size detection
This PR is based on top #32204, hence the latter must be merged before the former.
This PR is a small refactor/cleanup of the
register_kv_cachemain loop (which is quite dense), utilizing the KVCacheConfig (now available after the HMA PR) aimed at simplifying code logic.In fact, there's no need to wait until iteration over kv cache tensors to figure out which kernel block size was selected by the backend or how many blocks the kv cache has.
This is also an attempt at breaking up hybrid SSM support here #34727 into smaller, more easily reviewable PRs.