[KVConnector][LMCache] Enable Support for cross-layer Layout #33395
Shaoting-Feng wants to merge 3 commits into vllm-project:main
Conversation
Signed-off-by: Shaoting Feng <shaotingf@uchicago.edu>
Code Review
This pull request adds support for cross-layer KV cache layout in the LMCache connector, introducing a property to enable this feature and a method for registering the cross-layer KV cache. My review identified a critical bug in the boolean logic of the prefer_cross_layer_blocks property that would cause it to almost always be enabled. Additionally, I found two issues in the new register_cross_layers_kv_cache method: an unused parameter that should be passed to the underlying engine and an incorrect log message. I've provided code suggestions to address these findings.
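The review above does not show the actual code of `prefer_cross_layer_blocks`, but the class of bug it describes (a boolean property that ends up true far more often than intended) can be illustrated with a minimal, hypothetical sketch. The flag names below are invented for illustration and are not the connector's real inputs:

```python
# Hypothetical illustration of an "almost always enabled" boolean bug.
# These are NOT the real inputs to prefer_cross_layer_blocks; they just
# show how using `or` where `and` was intended flips the property on
# in 3 of 4 cases.

def prefer_cross_layer_blocks_buggy(engine_supports: bool,
                                    layout_requested: bool) -> bool:
    # Bug: enabled whenever EITHER condition holds.
    return engine_supports or layout_requested


def prefer_cross_layer_blocks_fixed(engine_supports: bool,
                                    layout_requested: bool) -> bool:
    # Intended: enabled only when BOTH conditions hold.
    return engine_supports and layout_requested


# Truth-table comparison: the buggy version is true in 3 of 4 cases.
table = [(a, b) for a in (False, True) for b in (False, True)]
buggy_on = sum(prefer_cross_layer_blocks_buggy(a, b) for a, b in table)
fixed_on = sum(prefer_cross_layer_blocks_fixed(a, b) for a, b in table)
```

A one-character operator slip like this is easy to miss precisely because the property still type-checks and still returns a `bool`.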
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py
```python
        cross_layers_kv_cache: kv cache of all layers
    """
    if hasattr(self._lmcache_engine, "register_cross_layers_kv_cache"):
        self._lmcache_engine.register_cross_layers_kv_cache(cross_layers_kv_cache)
```
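The guarded registration shown in the diff above can be sketched end to end with a stand-in engine. `_EngineStub` and `ConnectorSketch` below are hypothetical stand-ins for the LMCache engine and the connector, used only to show how the `hasattr` guard degrades gracefully on engine versions that lack the method:

```python
import logging

logger = logging.getLogger(__name__)


class _EngineStub:
    """Hypothetical stand-in for the LMCache engine. Newer versions
    expose register_cross_layers_kv_cache; older ones do not."""

    def __init__(self):
        self.registered = None

    def register_cross_layers_kv_cache(self, kv_cache):
        self.registered = kv_cache


class ConnectorSketch:
    """Minimal sketch of the guarded registration from the diff above."""

    def __init__(self, engine):
        self._lmcache_engine = engine

    def register_cross_layers_kv_cache(self, cross_layers_kv_cache,
                                       cross_layers_attn_backend=None):
        # Forward only if the installed engine supports the API; otherwise
        # warn and skip, so old LMCache versions keep working.
        if hasattr(self._lmcache_engine, "register_cross_layers_kv_cache"):
            self._lmcache_engine.register_cross_layers_kv_cache(
                cross_layers_kv_cache)
        else:
            logger.warning(
                "LMCache engine does not support cross-layer KV cache "
                "registration; skipping.")


engine = _EngineStub()
conn = ConnectorSketch(engine)
conn.register_cross_layers_kv_cache(["layer0_kv", "layer1_kv"])
```

The `hasattr` check is what keeps the connector compatible with LMCache releases that predate LMCache PR #2498.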
The cross_layers_attn_backend parameter is unused in this method call. The base class KVConnectorBase_V1 includes this parameter in its register_cross_layers_kv_cache signature, suggesting it's intended to be used. It should be passed to the underlying _lmcache_engine's method to ensure correct functionality, assuming the engine's method expects it.
```python
self._lmcache_engine.register_cross_layers_kv_cache(cross_layers_kv_cache, cross_layers_attn_backend)
```
But the LMCache engine doesn't need it.
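Both positions above can be reconciled: the base-class signature carries `cross_layers_attn_backend`, while this particular engine doesn't accept it. A hypothetical way to forward the argument only when the engine's method declares it is to inspect the signature; this is a sketch of one possible pattern, not what the PR actually does:

```python
import inspect


class _Engine:
    """Hypothetical engine whose registration method takes no backend."""

    def register_cross_layers_kv_cache(self, kv_cache):
        self.kv = kv_cache


def forward_registration(engine, kv_cache, attn_backend):
    # Forward the backend only if the engine's method declares the
    # parameter; otherwise call without it, matching the reply above
    # that the LMCache engine doesn't need it.
    fn = engine.register_cross_layers_kv_cache
    params = inspect.signature(fn).parameters
    if "cross_layers_attn_backend" in params:
        fn(kv_cache, cross_layers_attn_backend=attn_backend)
    else:
        fn(kv_cache)


e = _Engine()
forward_registration(e, ["kv_tensor"], "FLASH_ATTN")
```

Simply dropping the parameter at the connector boundary, as the PR author argues, is also reasonable when the engine is known never to need it.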
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py
Signed-off-by: Shaoting Feng <shaotingf@uchicago.edu>
Hi @Shaoting-Feng, the pre-commit checks have failed. Please run:

```shell
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
Signed-off-by: Shaoting Feng <shaotingf@uchicago.edu>
Purpose
Required for compatibility with the new KV cache shape introduced by vLLM PR #27743.
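For context, the difference between a per-layer and a cross-layer KV cache layout can be sketched with plain shape arithmetic. All dimension names and sizes below are illustrative placeholders, not the exact shape defined in vLLM PR #27743:

```python
# Illustrative shapes only; the real caches are torch tensors and the
# exact dimension order in vLLM PR #27743 may differ.
num_layers, num_blocks, block_size, num_heads, head_dim = 32, 1024, 16, 8, 128

# Per-layer layout: one KV tensor per layer (the leading 2 is K and V).
per_layer = [(2, num_blocks, block_size, num_heads, head_dim)
             for _ in range(num_layers)]

# Cross-layer layout: a single tensor spanning all layers, which lets a
# connector register the whole cache in one call instead of per layer.
cross_layer = (num_layers, 2, num_blocks, block_size, num_heads, head_dim)


def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n


# Same total number of elements; only the grouping changes.
per_layer_total = sum(numel(s) for s in per_layer)
cross_layer_total = numel(cross_layer)
```

The layout change is purely organizational, which is why the connector only needs a registration hook rather than any data conversion.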
Test Plan
Note: This change depends on LMCache PR #2498.
The implementation has been validated on both:
Command:
Test Result
Both models work.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.