
[KVCache][ModelRunner] Use vllm InputBatch and Blocktable#5182

Closed
MengqingCao wants to merge 2 commits into vllm-project:main from MengqingCao:rm_block_table

Conversation

Collaborator

@MengqingCao MengqingCao commented Dec 19, 2025

What this PR does / why we need it?

This PR changes vllm-ascend to use the InputBatch and BlockTable from vLLM directly. We only implement get_supported_kernel_block_sizes in our attention backend, and calculate the block_size and num_blocks according to the kernel_block_sizes.
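The pattern described above can be sketched as follows. This is a hypothetical illustration, not the actual vLLM or vllm-ascend API: the class name AscendAttentionBackend and the helpers pick_block_size and num_blocks are invented for this sketch; only the method name get_supported_kernel_block_sizes and the value [128] come from this PR.

```python
# Hypothetical sketch of the pattern in this PR: the backend advertises
# the block sizes its kernels support, and the runner derives the KV-cache
# block_size and num_blocks from them. Helper names are illustrative.

class AscendAttentionBackend:
    @staticmethod
    def get_supported_kernel_block_sizes() -> list[int]:
        # Mirrors the snippet in this PR: the Ascend paged-attention
        # kernels are assumed to require 128-token blocks.
        return [128]


def pick_block_size(requested: int, kernel_block_sizes: list[int]) -> int:
    """Choose the largest supported kernel block size that evenly divides
    the requested cache block size; fall back to the smallest supported."""
    candidates = [b for b in kernel_block_sizes if requested % b == 0]
    return max(candidates) if candidates else min(kernel_block_sizes)


def num_blocks(total_tokens: int, block_size: int) -> int:
    # Ceiling division: number of cache blocks needed to hold the tokens.
    return -(-total_tokens // block_size)
```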

How was this patch tested?

Tested by the existing CI.

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist (Bot) left a comment

Code Review

This pull request is a significant refactoring that removes the custom BlockTable and NPUInputBatch implementations from vllm_ascend. Instead, it now utilizes the InputBatch from the core vLLM library. This change simplifies the codebase by removing duplicated and specialized logic for block and slot management, aligning it more closely with the upstream vLLM implementation. The concept of "hybrid blocks" has been removed and replaced with a more generic mechanism for handling kernel-specific block sizes. The changes are consistently applied across the affected files, including updates to method signatures and call sites. Overall, this is a positive change that improves maintainability and reduces custom code. I have reviewed the changes and found no issues.

@MengqingCao MengqingCao changed the title from "Rm block table" to "[KVCache][ModelRunner] Use vllm InputBatch and Blocktable" on Dec 19, 2025
Comment thread vllm_ascend/attention/attention_v1.py Outdated
@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

 @staticmethod
-def get_supported_block_size() -> list[int]:
+def get_supported_kernel_block_sizes() -> list[int]:
     return [128]
Collaborator Author

paged attention branch
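The reviewer note above ties the [128] value to the paged attention code path. A minimal sketch of how a runner might consult the supported kernel block sizes when routing to that branch, assuming an invented dispatcher named select_attention_path (not a real vLLM function):

```python
# Hypothetical dispatch sketch: take the paged-attention branch only when
# the configured cache block size is one the kernel actually supports.
# select_attention_path is an invented name for illustration.

def select_attention_path(block_size: int, supported: list[int]) -> str:
    if block_size in supported:
        return "paged_attention"
    # Unsupported block sizes fall back to a generic attention path.
    return "fallback"
```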

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
@github-actions
Contributor

github-actions Bot commented Jan 7, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@wangxiyuan
Collaborator

No update for a long time, closing this now. Feel free to reopen if it's still needed.

@wangxiyuan wangxiyuan closed this Apr 10, 2026

Labels

merge-conflicts, ready (read for review), ready-for-test (start test by label for PR)

2 participants