feat: include model name in KVBlock key hash #220
Merged
github-actions[bot] merged 4 commits on Dec 18, 2025
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Contributor
Pull request overview
This PR includes the model name in the KVBlock key hash to address issue #189. The change ensures that different models or LoRA adapters generate distinct hash keys for the same token IDs, preventing cache collisions. The model name is applied only when initializing the first block key hash.
Key Changes
- Changed `ChunkedTokenDatabase` struct from exported to unexported (`chunkedTokenDatabase`)
- Moved initial hash computation from lazy initialization to constructor
- Modified `getInitHash` to accept `modelName` parameter and include it in hash computation
Collaborator
Author
@vMaroon Can you take a look?
Member
Great, thanks @sagiahrac
/lgtm
Collaborator
Author
closes #47
Closes #189
Since the hashing algorithm is now decoupled from vLLM (#195), the simplest solution is to always include the model/LoRA name from the request in the hash, along with the token IDs.
Specifically, in this PR this is applied only to the first block key hash.
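To illustrate why seeding only the first block can be enough, here is a minimal sketch. It assumes each block key is derived from its parent's hash plus the block's token IDs, so a model-specific root propagates to every later key. The names `initHash` and `blockKeys`, the SHA-256 truncation, and the byte encoding are all hypothetical and are not taken from this PR's code.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// initHash derives a root hash from the model (or LoRA adapter) name.
// Hypothetical stand-in for the PR's getInitHash; the real hash function
// and encoding may differ.
func initHash(modelName string) uint64 {
	sum := sha256.Sum256([]byte(modelName))
	return binary.BigEndian.Uint64(sum[:8])
}

// blockKeys returns one key per full block of tokens. Each key hashes the
// parent key together with the block's token IDs, so the model name only
// needs to enter the chain once, at the root.
func blockKeys(modelName string, tokens []uint32, blockSize int) []uint64 {
	if blockSize <= 0 {
		return nil
	}
	parent := initHash(modelName)
	keys := make([]uint64, 0, len(tokens)/blockSize)
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		h := sha256.New()
		var buf [8]byte
		binary.BigEndian.PutUint64(buf[:], parent)
		h.Write(buf[:])
		for _, t := range tokens[start : start+blockSize] {
			binary.BigEndian.PutUint32(buf[:4], t)
			h.Write(buf[:4])
		}
		parent = binary.BigEndian.Uint64(h.Sum(nil)[:8])
		keys = append(keys, parent)
	}
	return keys
}
```

With this sketch, calling `blockKeys` with the same tokens but two different model names yields different keys at every block position, while a trailing partial block is ignored.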
Notes:
- Changed `chunkedTokenDatabase` to private, as it already implements `TokenProcessor` and is not used directly anywhere. This is mainly to ensure that the initial hash cannot be overridden by mistake.
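The note above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the `TokenProcessor` method set, the `NewTokenProcessor` constructor, and the `seed` parameter are hypothetical. The point is that callers outside the package only see the interface, and the base hash is fixed once in the constructor.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// TokenProcessor is a simplified stand-in for the interface named in the PR;
// the real method set may differ.
type TokenProcessor interface {
	InitHash(modelName string) uint64
}

// chunkedTokenDatabase is unexported, so code outside the package cannot
// construct it directly or reassign its fields.
type chunkedTokenDatabase struct {
	seedHash [32]byte // computed once in the constructor
}

// NewTokenProcessor computes the base hash eagerly instead of lazily.
// The seed parameter is an assumption made for this sketch.
func NewTokenProcessor(seed string) TokenProcessor {
	return &chunkedTokenDatabase{seedHash: sha256.Sum256([]byte(seed))}
}

// InitHash mixes the per-request model name into the fixed base hash.
func (db *chunkedTokenDatabase) InitHash(modelName string) uint64 {
	h := sha256.New()
	h.Write(db.seedHash[:])
	h.Write([]byte(modelName))
	return binary.BigEndian.Uint64(h.Sum(nil)[:8])
}
```

Returning the interface from the constructor keeps the concrete type out of the public API, which is one way to guard against the initial hash being overridden by mistake.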