[Optimization] Make new_block_ids None if empty #23262
Conversation
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Code Review
This pull request introduces an optimization to avoid serializing empty block ID lists for decode requests, which occurs frequently. This is achieved by returning None instead of an empty list structure. The changes are consistently applied across the scheduler and worker components. The type hints have been updated to reflect the optional return value, and consumers of this data now correctly handle the None case. The refactoring to delay block ID generation is a clean approach. The code quality is high, and the changes appear correct and safe.
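The pattern the review describes can be sketched roughly as follows. This is a minimal illustration, not vLLM's actual code; the function and parameter names (`pack_new_block_ids`, `unpack_new_block_ids`, `num_groups`) are hypothetical:

```python
from typing import Optional

# Hypothetical sketch of the optimization: if a decode step allocated
# no new KV-cache blocks, every per-group block-ID list is empty, so
# the producer emits None instead of serializing the empty structure.

def pack_new_block_ids(
    new_block_ids: tuple[list[int], ...],
) -> Optional[tuple[list[int], ...]]:
    # All-empty per-group lists collapse to None before serialization.
    if all(len(ids) == 0 for ids in new_block_ids):
        return None
    return new_block_ids

def unpack_new_block_ids(
    packed: Optional[tuple[list[int], ...]],
    num_groups: int,
) -> tuple[list[int], ...]:
    # Consumers must now handle the None case explicitly, rebuilding
    # the empty per-group structure when nothing was sent.
    if packed is None:
        return tuple([] for _ in range(num_groups))
    return packed
```

The type hints mirror the review's point: the return value becomes `Optional`, and every consumer of the data has to branch on `None`.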
I don't think this optimization is necessary. And FYI, we now support mamba by setting
…ne if empty (#93) Culprit commit: vllm-project/vllm#23262 --------- Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
### What this PR does / why we need it?
1. Use actions/checkout@v5 instead of v4.
2. Remove the DBO test case because there is an issue with it; it will be refactored later.
3. Make vllm-ascend compatible with vLLM v0.10.1.1 and add CI for it.
4. Fix sampler API changes introduced by vllm-project/vllm#22387.
5. Fix Qwen3 MoE config changes introduced by vllm-project/vllm#20562.
6. Fix KV-cache block changes introduced by vllm-project/vllm#23262.

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
CI passed with existing tests.

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@0c6e40b

--------- Signed-off-by: MengqingCao <cmq0113@163.com>
Since our page size is 16, `new_block_ids` for decode requests is empty in 15 out of every 16 steps. However, we currently serialize the empty lists on every step. This PR removes this unnecessary overhead by making them `None` before serialization.

TODO: When `block_size` differs across groups, we should change `list[Optional[tuple[list[int], ...]]]` into `list[Optional[tuple[Optional[list[int]], ...]]]`.
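To illustrate why the `None` substitution helps, here is a rough size comparison using Python's stdlib `pickle`. vLLM uses its own msgpack-based serializer, so the absolute byte counts differ, but the direction is the same: `None` serializes to fewer bytes than a tuple of empty lists, and this saving applies on the 15 of 16 decode steps that allocate no new blocks.

```python
import pickle

# Illustrative comparison only: serialize an all-empty per-group
# block-ID structure vs. plain None. Exact byte counts depend on the
# serializer; the point is that None is strictly smaller.
empty_block_ids: tuple[list[int], ...] = ([],)  # one KV-cache group, no new blocks

empty_size = len(pickle.dumps(empty_block_ids))
none_size = len(pickle.dumps(None))

print(f"empty structure: {empty_size} bytes, None: {none_size} bytes")
```

With multiple KV-cache groups the empty structure grows while `None` stays constant, so the relative saving increases.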