[V0 Deprecation] Refactor kv cache from list to element #37487
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Code Review
This pull request refactors the kv_cache by removing the outer list wrapper, simplifying its structure from a list of one element to just the element itself (a tensor or a tuple of tensors). This change is consistently applied across various components, including attention layers, mamba-based layers, and their corresponding test files. The modifications simplify code by removing unnecessary [0] indexing when accessing the kv_cache. The change in _cleanup_profiling_kv_cache is a good addition that makes the cleanup logic more robust to the different types of kv_cache. The refactoring appears to be correct and improves code clarity.
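As a rough illustration of the change summarized above, the sketch below contrasts the old one-element-list layout with the new direct layout. Every name in it (`FakeTensor`, `LayerBefore`, `LayerAfter`, `release_kv_cache`) is hypothetical and is not taken from the PR; `FakeTensor` stands in for a real tensor so the example runs without PyTorch, and `release_kv_cache` only mimics the idea of cleanup code that accepts either a tensor or a tuple of tensors.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor, so the sketch needs no PyTorch."""
    name: str
    released: bool = False


class LayerBefore:
    """Old layout (assumed): the cache is wrapped in a one-element list."""

    def __init__(self, cache):
        self.kv_cache = [cache]


class LayerAfter:
    """New layout (assumed): the cache is stored directly on the layer."""

    def __init__(self, cache):
        self.kv_cache = cache


def release_kv_cache(kv_cache):
    """Hypothetical cleanup helper, not the PR's _cleanup_profiling_kv_cache.

    Accepts a single tensor or a tuple of tensors and marks each one as
    released. Returns the number of tensors it touched.
    """
    tensors = kv_cache if isinstance(kv_cache, tuple) else (kv_cache,)
    for tensor in tensors:
        tensor.released = True
    return len(tensors)


if __name__ == "__main__":
    attn_cache = FakeTensor("attention")
    # Before: callers had to index into the list wrapper.
    assert LayerBefore(attn_cache).kv_cache[0] is attn_cache
    # After: the attribute is the cache itself.
    assert LayerAfter(attn_cache).kv_cache is attn_cache

    # A mamba-style layer might hold a tuple of state tensors instead.
    mamba_cache = (FakeTensor("conv_state"), FakeTensor("ssm_state"))
    release_kv_cache(LayerAfter(mamba_cache).kv_cache)
```

With the wrapper gone, call sites read the attribute directly, and any code that walks the cache has to branch on tensor versus tuple rather than assume a list.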
The NIXL failure seems like it might be relevant?
@@ -185,15 +185,13 @@ def inject_kv_into_layer(
             if kv_cache_attr is None:
                 continue
 
-            kv_cache_layer = kv_cache_attr[0]
-
             filename = self._generate_filename_debug(
                 layer_name, request.token_ids, request.mm_hashes
             )
             kv_cache = safetensors.torch.load_file(filename)["kv_cache"].cuda()
             if isinstance(attn_metadata, dict):
                 inject_kv_into_layer(
-                    kv_cache_layer,
+                    kv_cache_attr,
Could we rename this to kv_cache_layer?
Done, thanks! I also fixed the previous CI issue.
…#37487) Signed-off-by: yewentao256 <zhyanwentao@126.com>
…#37487) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
…#37487) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Nithin Chalapathi <nithin.ch10@gmail.com>
…#37487) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
…#37487) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Purpose
A follow-up to #37195, which removed the virtual engine. This PR further refactors the KV cache from a one-element list to the element itself to clean up the code.
Tests in CI