[NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout#30275
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Code Review
This pull request refactors the post-processing logic for heterogeneous BlockSize and kv_layout, which is a good direction for code cleanup. However, the implementation introduces several issues. There are critical bugs in the tensor reshape operations within the new helper functions (_kv_postprocess_layout, _kv_postprocess_blksize, and _kv_postprocess_blksize_and_layout), which will likely lead to runtime errors or corrupted KV cache data. Additionally, there's a redundant index_select operation that should be removed to improve performance. These issues need to be addressed to ensure the correctness and efficiency of the new implementation.
b405900 to
edc6d6e
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
edc6d6e to
1753de4
Compare
|
Hi @xuechendi, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
1753de4 to
010e76a
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
010e76a to
002a105
Compare
NickLucche
left a comment
There was a problem hiding this comment.
Thanks for refactoring this @xuechendi !
Left a few comments, overall this is looking pretty good.
| blocks_to_update.permute(0, 2, 1, 3), block_size_ratio | ||
| ).permute(0, 2, 1, 3) | ||
| cache.index_copy_(0, indices, permuted_blocks) | ||
| device = sample_cache.device |
There was a problem hiding this comment.
isn't this self.device?
| split_k_and_v = not ( | ||
| self.use_mla or self._use_pallas or self.kv_topo.is_kv_layout_blocks_first | ||
| ) | ||
| assert block_size_ratio >= 1, "Only nP < nD supported currently." |
There was a problem hiding this comment.
we could probably use debug log here stating what's being post-processed
There was a problem hiding this comment.
I used logger.info_once(), is that ok?
There was a problem hiding this comment.
I think they serve two different purposes, a debug log would provide info on the proceeding of the transfer operation per-request which I think is ok being debug.
info_once may still be useful for the end user, although in theory we could later allow deployments where P1 block_size != P2 block_size=D block_size, hence the log info_once would fall short in reporting that.
| ): | ||
| def _kv_postprocess_blksize(cache, indices, block_size_ratio): |
There was a problem hiding this comment.
We could add a short comment at the top of post_process_device_kv_on_receive now that the 3 functions can be moved out
| ): | ||
| block_ids_for_blocksize_post_process[block_size_ratio].append( | ||
| meta.local_block_ids | ||
| meta.local_physical_block_ids |
There was a problem hiding this comment.
ok this was a bug then right
There was a problem hiding this comment.
yes, I missed that in previous PR
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
…ocess Signed-off-by: Chendi Xue <chendi.xue@intel.com>
1d2394d to
f0befd7
Compare
|
@NickLucche , Thanks for the review, I have resolved all comments and rebased. |
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
NickLucche
left a comment
There was a problem hiding this comment.
Just a nit on logging. LGTM , thanks @xuechendi !
| split_k_and_v = not ( | ||
| self.use_mla or self._use_pallas or self.kv_topo.is_kv_layout_blocks_first | ||
| ) | ||
| assert block_size_ratio >= 1, "Only nP < nD supported currently." |
There was a problem hiding this comment.
I think they serve two different purposes, a debug log would provide info on the proceeding of the transfer operation per-request which I think is ok being debug.
info_once may still be useful for the end user, although in theory we could later allow deployments where P1 block_size != P2 block_size=D block_size, hence the log info_once would fall short in reporting that.
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Thanks, @NickLucche , you're right, I updated it to logger.debug |
…nd kv_layout (vllm-project#30275) Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
We have supported heterogeneous BlockSize and kv_layout in seperate post process methods.
This PR is to clean up and use single method to post_process for cases.
What is changed in this PR:
I removed
permute_device_kvandblocksize_post_process, and move the logic intopost_process_device_kv_on_receiveas single post_process function with 3 options:Test Plan
Test with heterogeneous KV_layout + heterogeneous block_size
=> Passed accuracy test
Test with heterogeneous KV_layout
=> Passed accuracy test
Test with heterogeneous heterogeneous block_size
=> Passed accuracy test
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.Note
Cursor Bugbot is generating a summary for commit bff8eaa. Configure here.
Note
Consolidates decoder-side KV post-processing into a single path with shared utils to handle heterogeneous block size and layout.
kv_postprocess_blksize_on_receive,kv_postprocess_layout_on_receive, andkv_postprocess_blksize_and_layout_on_receiveinutils.pypermute_device_kvandblocksize_post_processwith unifiedpost_process_device_kv_on_receiveinnixl_connector.py, selecting behavior based onenable_permute_local_kvandblock_size_ratioget_finishedto batch block IDs per ratio and invoke the new post-process; minor logging and tensor creation tweaksWritten by Cursor Bugbot for commit bff8eaa. This will update automatically on new commits. Configure here.
Note
Cursor Bugbot is generating a summary for commit 15ff574. Configure here.