[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool #21222
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Resolves #21141
Code Review
This pull request introduces popleft_n and append_n methods to FreeKVCacheBlockQueue for bulk operations, optimizing get_new_blocks and free_blocks in BlockPool. Benchmark results show significant improvements. To enhance robustness, I've suggested materializing the ordered_blocks iterable to a list in free_blocks to prevent potential OOM errors.
houseroad left a comment:
Looks good to me. Impressive results, and two nits to consider addressing.
njhill left a comment:
Thanks @Jialin!
Purpose
Most of the block_pool operations are on the critical path. In this PR, we focus on further optimizing two of them: get_new_blocks and free_blocks.
Bulk popleft instead of popleft n times
Originally, block_pool.get_new_blocks popped blocks one at a time. Each single popleft re-links the next block to the fake head, which is wasted work when that block is going to be popped right afterwards.
Since we know the total number of blocks to pop ahead of time, we can introduce popleft_n for bulk popleft. Overall, popleft_n performs roughly half as many linked-list operations as calling popleft n times.
Bulk append instead of append n times
Similarly, block_pool.free_blocks invoked append one block at a time. Introducing a bulk append_n likewise cuts the number of linked-list operations roughly in half (see the sketch below).
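To make the idea concrete, here is a minimal Python sketch of a doubly linked free-block queue with fake head/tail sentinels and bulk popleft_n/append_n operations. This is an illustrative assumption, not vLLM's actual FreeKVCacheBlockQueue code: the Block and FreeBlockQueue names, fields, and method signatures are made up for this example.

```python
# Sketch only: a doubly linked free-block queue with sentinel (fake) head/tail
# nodes, showing why bulk popleft_n/append_n need roughly half the link
# updates of n single popleft/append calls.

class Block:
    def __init__(self, block_id: int) -> None:
        self.block_id = block_id
        self.prev: "Block | None" = None
        self.next: "Block | None" = None


class FreeBlockQueue:
    def __init__(self, blocks: list[Block]) -> None:
        self.fake_head = Block(-1)
        self.fake_tail = Block(-1)
        prev = self.fake_head
        for b in blocks:
            prev.next, b.prev = b, prev
            prev = b
        prev.next, self.fake_tail.prev = self.fake_tail, prev
        self.num_free = len(blocks)

    def popleft_n(self, n: int) -> list[Block]:
        """Detach the first n blocks with a single boundary fix-up at the
        fake head, instead of re-linking the fake head after every pop."""
        assert n <= self.num_free
        popped: list[Block] = []
        node = self.fake_head.next
        for _ in range(n):
            popped.append(node)
            nxt = node.next
            node.prev = node.next = None  # detach the popped block
            node = nxt
        # One boundary fix-up instead of n of them.
        self.fake_head.next, node.prev = node, self.fake_head
        self.num_free -= n
        return popped

    def append_n(self, blocks: list[Block]) -> None:
        """Chain the returned blocks together, then splice the whole chain
        in front of the fake tail with a single boundary fix-up."""
        if not blocks:
            return
        prev = self.fake_tail.prev
        for b in blocks:
            prev.next, b.prev = b, prev
            prev = b
        prev.next, self.fake_tail.prev = self.fake_tail, prev
        self.num_free += len(blocks)


if __name__ == "__main__":
    queue = FreeBlockQueue([Block(i) for i in range(4)])
    taken = queue.popleft_n(2)                           # blocks 0 and 1
    print([b.block_id for b in taken], queue.num_free)   # [0, 1] 2
    queue.append_n(taken)                                # return them to the free queue
    print(queue.num_free)                                # 4
```

Calling popleft n times updates the fake-head links on every pop; popleft_n clears each popped block's links once and fixes the boundary a single time, which is where the roughly 2x reduction in linked-list operations comes from. append_n benefits in the same way at the tail.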
Test Plan
Test Result
benchmark scripts
After
[benchmark script output screenshot]
Before
[benchmark script output screenshot]
benchmark_blockpool
As expected, get_new_blocks and free_blocks times are cut in half.
After
[benchmark results screenshot]
Before
(Optional) Documentation Update