sagiahrac (Contributor) commented Nov 24, 2025

Summary

This PR refactors the tokenization prefix store architecture to support a single model per store instance, eliminating the need for model name management within the store implementations.

Changes Made

  • Simplified LRUTokenStore and TrieTokenStore: removed multi-base-model support.
  • Updated the Indexer interface: removed the modelName parameters from the AddTokenization and FindLongestContainedTokens methods. Model identity is now managed at the scheduler/indexer level, not the storage level.
  • Updated tests.

  • Configure the base model name in the indexer config.
  • Remove model name parameters from tokenization method calls.

The change aligns with the single-model-per-scheduler architecture where each scheduler instance handles one specific base model.

Solves #190
Part of #167
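
The interface change described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Config.BaseModelName`, `modelIndexer`, and the map-backed `singleModelStore` are hypothetical names, and the method signatures are simplified (the real AddTokenization and FindLongestContainedTokens may take or return more than is shown here). The only detail taken from the description is that neither method receives a modelName and that the base model name lives in the indexer config.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Config is a hypothetical indexer-level config. The PR states that the base
// model name is configured here; the field name is an assumption.
type Config struct {
	BaseModelName string
}

// Indexer is a simplified stand-in for the prefix store interface after this
// PR: neither method takes a modelName. Parameters of the real interface that
// are not mentioned in the PR text are omitted.
type Indexer interface {
	AddTokenization(prompt string, tokens []uint32) error
	FindLongestContainedTokens(prompt string) []uint32
}

// singleModelStore serves exactly one model, so it keeps no per-model map.
type singleModelStore struct {
	entries map[string][]uint32 // stored prompt -> its tokens
}

func newSingleModelStore() *singleModelStore {
	return &singleModelStore{entries: make(map[string][]uint32)}
}

func (s *singleModelStore) AddTokenization(prompt string, tokens []uint32) error {
	if prompt == "" {
		return errors.New("empty prompt")
	}
	// Copy the slice so later changes by the caller do not affect the store.
	s.entries[prompt] = append([]uint32(nil), tokens...)
	return nil
}

// FindLongestContainedTokens returns the tokens of the longest stored prompt
// that is a prefix of the query, or nil when nothing matches.
func (s *singleModelStore) FindLongestContainedTokens(prompt string) []uint32 {
	var best string
	for p := range s.entries {
		if strings.HasPrefix(prompt, p) && len(p) > len(best) {
			best = p
		}
	}
	return s.entries[best]
}

// modelIndexer pairs one store with the single model it serves, so model
// identity is held above the storage layer.
type modelIndexer struct {
	cfg   Config
	store Indexer
}

func newModelIndexer(cfg Config) (*modelIndexer, error) {
	if cfg.BaseModelName == "" {
		return nil, errors.New("base model name must be set")
	}
	return &modelIndexer{cfg: cfg, store: newSingleModelStore()}, nil
}

func main() {
	idx, err := newModelIndexer(Config{BaseModelName: "example-base-model"})
	if err != nil {
		panic(err)
	}
	// Call sites no longer pass a model name.
	if err := idx.store.AddTokenization("hello world", []uint32{1, 2}); err != nil {
		panic(err)
	}
	fmt.Println(idx.cfg.BaseModelName, idx.store.FindLongestContainedTokens("hello world, again"))
}
```

Under this layout, a deployment that serves several base models would create one such indexer per model instead of passing a model name into every store call.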

sagiahrac force-pushed the add-base-model-to-config branch from 84935e6 to 3bf99f3 November 24, 2025 10:51
sagiahrac changed the title from "feat: Base-Model-Aware Indexer for Multi-LoRA KV-Cache Support" to "feat: Simplify tokenization prefix store to single-model architecture" Nov 24, 2025
sagiahrac marked this pull request as ready for review November 24, 2025 14:54
Copilot AI review requested due to automatic review settings November 24, 2025 14:54
Copilot finished reviewing on behalf of sagiahrac November 24, 2025 14:58

Copilot AI left a comment

Pull request overview

This PR refactors the tokenization prefix store architecture to eliminate multi-model support within individual store instances, simplifying the implementation to a single-model-per-store design. This aligns with a broader architectural shift where model identity is managed at the scheduler/indexer level rather than within the storage layer.

  • Removed multi-model management from LRUTokenStore and TrieTokenStore implementations
  • Updated Indexer interface to remove modelName parameters from AddTokenization and FindLongestContainedTokens methods
  • Updated all tests and consumers to remove model name parameters from tokenization calls

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pkg/tokenization/prefixstore/trie_store.go | Refactored from ContainedTokenStore (multi-model) to TrieTokenStore (single-model); removed model management logic |
| pkg/tokenization/prefixstore/lru_store.go | Simplified to single-model architecture by removing per-model cache map and using single LRU cache instance |
| pkg/tokenization/prefixstore/lru_store_test.go | Updated all test cases to remove modelName parameter from function calls |
| pkg/tokenization/prefixstore/indexer.go | Updated interface signatures to remove modelName parameters |
| pkg/tokenization/pool_test.go | Updated mock implementations and test expectations to match new interface signatures |
| pkg/tokenization/pool.go | Updated indexer method calls to remove modelName arguments |
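
The lru_store.go change listed above (a per-model cache map replaced by a single LRU cache instance) can be illustrated with the sketch below. It is an assumption-laden stand-in rather than the PR's code: `lruCache` is a small hand-written replacement for `lru.Cache[uint64, Block]` so the example needs only the standard library, `Block` is given a placeholder field, and the struct names `multiModelStore` and `singleModelStore` are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
)

// Block is a placeholder for the store's cached value type; its real fields
// are not shown in this PR page.
type Block struct {
	Tokens []uint32
}

type entry struct {
	key   uint64
	block Block
}

// lruCache is a minimal stand-in for lru.Cache[uint64, Block].
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[uint64]*list.Element // key -> element holding *entry
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[uint64]*list.Element),
	}
}

// Add inserts or updates a block and evicts the least recently used entry
// when the capacity is exceeded.
func (c *lruCache) Add(key uint64, b Block) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).block = b
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, block: b})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Get returns the block for key and marks it as most recently used.
func (c *lruCache) Get(key uint64) (Block, bool) {
	el, ok := c.items[key]
	if !ok {
		return Block{}, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).block, true
}

// multiModelStore mirrors the old layout: every lookup first selects a cache
// by model name.
type multiModelStore struct {
	blockSize int
	store     map[string]*lruCache
}

// singleModelStore mirrors the new layout: one cache, no model lookup.
type singleModelStore struct {
	blockSize int
	cache     *lruCache
}

func main() {
	s := singleModelStore{blockSize: 256, cache: newLRUCache(2)}
	s.cache.Add(1, Block{Tokens: []uint32{10, 11}})
	b, ok := s.cache.Get(1)
	fmt.Println(ok, b.Tokens)
}
```

With a single cache, the store no longer needs to create caches lazily per model or guard the map of caches, which is the simplification the table row describes.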

sagiahrac force-pushed the add-base-model-to-config branch from 3cca416 to db36361 November 24, 2025 15:15
  blockSize int

- store map[string]*lru.Cache[uint64, Block]
+ cache *lru.Cache[uint64, Block]
Member

I think index might be a better name here.

Contributor Author

Considering that the prefix store caches tokenizations, the term index feels too generic. It’s also already used in both kvcache/kvblock/index.go and kvcache/indexer.go. I think we should choose a name that avoids this overloaded notation and better reflects the actual object the cache is representing.

vMaroon (Member) commented Dec 1, 2025

/lgtm
/approve

github-actions bot added the lgtm label Dec 1, 2025
github-actions bot merged commit 6dbe3f6 into llm-d:main Dec 1, 2025
2 checks passed