-
Notifications
You must be signed in to change notification settings - Fork 59
feat: Simplify tokenization prefix store to single-model architecture #182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
84935e6 to
3bf99f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the tokenization prefix store architecture to eliminate multi-model support within individual store instances, simplifying the implementation to a single-model-per-store design. This aligns with a broader architectural shift where model identity is managed at the scheduler/indexer level rather than within the storage layer.
- Removed multi-model management from
LRUTokenStoreandTrieTokenStoreimplementations - Updated
Indexerinterface to removemodelNameparameters fromAddTokenizationandFindLongestContainedTokensmethods - Updated all tests and consumers to remove model name parameters from tokenization calls
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/tokenization/prefixstore/trie_store.go | Refactored from ContainedTokenStore (multi-model) to TrieTokenStore (single-model); removed model management logic |
| pkg/tokenization/prefixstore/lru_store.go | Simplified to single-model architecture by removing per-model cache map and using single LRU cache instance |
| pkg/tokenization/prefixstore/lru_store_test.go | Updated all test cases to remove modelName parameter from function calls |
| pkg/tokenization/prefixstore/indexer.go | Updated interface signatures to remove modelName parameters |
| pkg/tokenization/pool_test.go | Updated mock implementations and test expectations to match new interface signatures |
| pkg/tokenization/pool.go | Updated indexer method calls to remove modelName arguments |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Sage Ahrac <[email protected]>
Signed-off-by: Sage Ahrac <[email protected]>
Signed-off-by: Sage Ahrac <[email protected]>
Signed-off-by: Sage Ahrac <[email protected]>
Signed-off-by: Sage Ahrac <[email protected]>
Signed-off-by: Sage Ahrac <[email protected]>
3cca416 to
db36361
Compare
Co-authored-by: Copilot <[email protected]> Signed-off-by: Sage <[email protected]>
Co-authored-by: Copilot <[email protected]> Signed-off-by: Sage <[email protected]>
| blockSize int | ||
|
|
||
| store map[string]*lru.Cache[uint64, Block] | ||
| cache *lru.Cache[uint64, Block] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think index might be a better name here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering that the prefix store caches tokenizations, the term index feels too generic. It’s also already used in both kvcache/kvblock/index.go and kvcache/indexer.go. I think we should choose a name that avoids this overloaded notation and better reflects the actual object the cache is representing.
|
/lgtm |
Summary
This PR refactors the tokenization prefix store architecture to support a single model per store instance, eliminating the need for model name management within the store implementations.
Changes Made
LRUTokenStore&TrieTokenStore- Removed multi base model support.modelNameparameters fromAddTokenizationandFindLongestContainedTokensmethods. Model identity managed at scheduler/indexer level, not storage levelConfigure the base model name in the indexer config
Remove model name parameters from tokenization method calls
The change aligns with the single-model-per-scheduler architecture where each scheduler instance handles one specific base model.
Solves #190
Part of #167