[v1] Refactor KVCacheConfig #14079
Conversation
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
zhuohan123
left a comment
Left some minor comments. In general LGTM!
vllm/v1/kv_cache_interface.py (Outdated)

    @dataclass
    class VirtualLayer:
Why do we call this VirtualLayer? I believe woosuk and you had some discussions about this. The issue with VirtualLayer is that the name doesn't convey that it's related to the KV cache and KV cache grouping. To me, a name like "GroupedLayerKV" would be more straightforward.
Discussing in the #hybrid-mem channel on Slack.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
heheda12345
left a comment
@zhuohan123 Updated the PR based on your review. More discussion on the name of VirtualLayer is needed.
vllm/v1/kv_cache_interface.py (Outdated)

    @dataclass
    class VirtualLayer:
Discussing in the #hybrid-mem channel on Slack.
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
vllm/v1/core/kv_cache_utils.py (Outdated)

        raise NotImplementedError

    def make_kv_cache_configs_consistent(kv_cache_configs: list[KVCacheConfig]):
Looks more like unify_kv_cache_configs?
vllm/v1/core/kv_cache_utils.py (Outdated)

    # Change the num_blocks of each rank to the smallest among all ranks. We
    # do not need to shrink the tensor size because it is valid to only use the
    # first `num_blocks` blocks of the tensor.
    num_blocks = min(kv_cache_config.num_blocks

nit — suggested change:

    min_num_blocks = min(kv_cache_config.num_blocks
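The rename also makes clear that the reduction result is the minimum rather than a rebinding of the per-rank field. A minimal, self-contained sketch of this unification step, using a hypothetical stripped-down `KVCacheConfig` with only a `num_blocks` field (the real class in vLLM holds much more), might look like:

```python
from dataclasses import dataclass


# Hypothetical stand-in for vLLM's KVCacheConfig, reduced to the one
# field this step touches; illustrative only.
@dataclass
class KVCacheConfig:
    num_blocks: int


def unify_kv_cache_configs(kv_cache_configs: list[KVCacheConfig]) -> None:
    """Shrink every rank's num_blocks to the smallest among all ranks.

    The KV cache tensors need no resizing: it is valid to use only the
    first `min_num_blocks` blocks of a larger tensor.
    """
    min_num_blocks = min(cfg.num_blocks for cfg in kv_cache_configs)
    for cfg in kv_cache_configs:
        cfg.num_blocks = min_num_blocks


# One config per rank; rank 1 has the least free memory.
configs = [KVCacheConfig(100), KVCacheConfig(80), KVCacheConfig(96)]
unify_kv_cache_configs(configs)
print([cfg.num_blocks for cfg in configs])  # → [80, 80, 80]
```

All ranks then agree on the block count, so the scheduler can treat block IDs uniformly across the tensor-parallel group.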
vllm/v1/worker/gpu_model_runner.py (Outdated)

        dtype=dtype,
        device=self.device)
    else:
        raise NotImplementedError
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
@zhuohan123 @comaniac Updated the PR based on your comments. Can you review it again?
comaniac
left a comment
LGTM. Approve to unblock. Left to @WoosukKwon and @zhuohan123
Please fix the merge conflict.

This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
This PR makes the following changes to KVCacheConfig:
1. Change KVCacheSpec from the spec of the whole model to the spec of one layer.
2. In the KVCacheConfig class, save the kv_cache_spec for each KV cache group instead of each layer.
3. Make the KV cache configs of all ranks consistent (make_kv_cache_configs_consistent).
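The reshaped config described above can be sketched roughly as follows. All class and field names here are illustrative simplifications, not the actual definitions in vllm/v1/kv_cache_interface.py; the point is the structure: a per-layer spec, and groups that share one spec instead of each layer carrying its own.

```python
from dataclasses import dataclass
from typing import Optional


# Illustrative, simplified shapes; real field names in vLLM may differ.
@dataclass
class KVCacheSpec:
    """Spec of a single layer's KV cache (previously: spec of the whole model)."""
    block_size: int
    num_kv_heads: int
    head_size: int
    sliding_window: Optional[int] = None  # None = full attention


@dataclass
class KVCacheGroup:
    """Layers sharing the same KV cache spec are grouped together."""
    layer_names: list[str]
    kv_cache_spec: KVCacheSpec  # one spec per group, not per layer


@dataclass
class KVCacheConfig:
    num_blocks: int
    groups: list[KVCacheGroup]


# Example: full-attention layers and sliding-window layers form two groups.
cfg = KVCacheConfig(
    num_blocks=1024,
    groups=[
        KVCacheGroup(["layers.0", "layers.2"], KVCacheSpec(16, 8, 128)),
        KVCacheGroup(["layers.1", "layers.3"],
                     KVCacheSpec(16, 8, 128, sliding_window=4096)),
    ],
)
print(len(cfg.groups))  # → 2
```

Grouping by spec is what makes hybrid models (mixing full and sliding-window attention) manageable: allocation and block accounting are done once per group rather than once per layer.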