[KVCache][ModelRunner] Use vllm InputBatch and Blocktable#5182
MengqingCao wants to merge 2 commits into vllm-project:main from
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request is a significant refactoring that removes the custom BlockTable and NPUInputBatch implementations from vllm_ascend. Instead, it now utilizes the InputBatch from the core vLLM library. This change simplifies the codebase by removing duplicated and specialized logic for block and slot management, aligning it more closely with the upstream vLLM implementation. The concept of "hybrid blocks" has been removed and replaced with a more generic mechanism for handling kernel-specific block sizes. The changes are consistently applied across the affected files, including updates to method signatures and call sites. Overall, this is a positive change that improves maintainability and reduces custom code. I have reviewed the changes and found no issues.
57ea030 to
aecd2ab
This pull request has conflicts, please resolve those before we can evaluate the pull request.
      @staticmethod
    - def get_supported_block_size() -> list[int]:
    + def get_supported_kernel_block_sizes() -> list[int]:
          return [128]
paged attention branch
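The renamed hook in the diff above can be pictured with a small sketch. This is only an illustration: `HypotheticalAttentionBackend` and `select_kernel_block_size` are made-up names, and the rule "pick the largest supported size that divides the cache block size" is an assumption, not something this PR or vLLM confirms.

```python
class HypotheticalAttentionBackend:
    """Illustrative stand-in; not the real vllm_ascend backend class."""

    @staticmethod
    def get_supported_kernel_block_sizes() -> list[int]:
        # Mirrors the value shown in the diff: the kernel handles 128-token blocks.
        return [128]


def select_kernel_block_size(cache_block_size: int, supported: list[int]) -> int:
    """Pick a kernel block size for a given KV-cache block size.

    Assumed rule (hypothetical): choose the largest supported size that
    evenly divides the cache block size.
    """
    candidates = [s for s in supported if s > 0 and cache_block_size % s == 0]
    if not candidates:
        raise ValueError(
            f"no supported kernel block size divides {cache_block_size}: {supported}"
        )
    return max(candidates)


supported_sizes = HypotheticalAttentionBackend.get_supported_kernel_block_sizes()
kernel_block_size = select_kernel_block_size(512, supported_sizes)
```

With a cache block size of 512 and a single supported size of 128, the helper selects 128, so each cache block would span four kernel blocks.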
44b2321 to
07858c8
07858c8 to
f4b66ff
8169764 to
bd0e19c
bd0e19c to
b8ed30b
Signed-off-by: MengqingCao <cmq0113@163.com>
b8ed30b to
c6c8fd7
No updates for a long time, so closing this now. Feel free to reopen if it's still needed.
What this PR does / why we need it?
This PR switches to the InputBatch and BlockTable from vLLM. We only implement get_supported_kernel_block_sizes in our attention backend, and calculate block_size and num_block according to kernel_block_sizes.
How was this patch tested?
Tested by the existing CI.
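As a rough illustration of calculating the block size and block count from a kernel block size, the sketch below splits each cache-manager block into smaller kernel blocks. The function name `split_into_kernel_blocks`, its parameters, and the contiguous numbering scheme are assumptions made for this example; they are not taken from the PR's code.

```python
def split_into_kernel_blocks(
    block_size: int,
    kernel_block_size: int,
    num_blocks: int,
    block_ids: list[int],
) -> tuple[int, list[int]]:
    """Re-express a block table in units of kernel blocks (hypothetical helper).

    Assumes each cache block of `block_size` tokens is laid out as
    `block_size // kernel_block_size` consecutive kernel blocks.
    """
    if kernel_block_size <= 0 or block_size % kernel_block_size != 0:
        raise ValueError("block_size must be a positive multiple of kernel_block_size")
    ratio = block_size // kernel_block_size

    # The same memory holds `ratio` times as many (smaller) kernel blocks.
    kernel_num_blocks = num_blocks * ratio

    # Cache block b covers kernel blocks b*ratio .. b*ratio + ratio - 1.
    kernel_block_ids = [
        block_id * ratio + offset
        for block_id in block_ids
        for offset in range(ratio)
    ]
    return kernel_num_blocks, kernel_block_ids


kernel_num_blocks, kernel_block_ids = split_into_kernel_blocks(
    block_size=512, kernel_block_size=128, num_blocks=10, block_ids=[0, 2]
)
```

In this example, 10 cache blocks of 512 tokens become 40 kernel blocks of 128 tokens, and cache blocks 0 and 2 map to kernel blocks 0 to 3 and 8 to 11.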