Skip to content

fix: Single tokenizer initialization at startup#192

Merged
vMaroon merged 6 commits into
llm-d:mainfrom
sagearc:single-tokenizer-arch-2
Dec 17, 2025
Merged

fix: Single tokenizer initialization at startup#192
vMaroon merged 6 commits into
llm-d:mainfrom
sagearc:single-tokenizer-arch-2

Conversation

@sagearc
Copy link
Copy Markdown
Collaborator

@sagearc sagearc commented Nov 25, 2025

#187

This PR loads local and HF tokenizers at the pool initialization.
CachedTokenizer instances now carry a single tokenizer with respect to the base model name provided at startup.
Fixed tests

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the tokenization system to use single tokenizer instances initialized at startup per base model, replacing the previous LRU caching mechanism. This change simplifies the architecture by binding each tokenization pool to a specific base model, improving startup predictability and removing the complexity of dynamic tokenizer loading.

Key Changes

  • Refactored CachedTokenizer to hold a single pre-initialized tokenizer instead of an LRU cache with dynamic loading
  • Added BaseModelName field to kvcache.Config for binding pools to specific models
  • Updated NewCachedHFTokenizer and NewCachedLocalTokenizer to take a modelName parameter and initialize tokenizers at creation time
  • Removed singleflight and LRU cache dependencies from tokenizer implementation
  • Removed deprecated Redis mock e2e tests and corresponding Makefile targets

Reviewed changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pkg/tokenization/tokenizer.go Removed LRU cache and singleflight, refactored CachedTokenizer to hold single tokenizer instance, updated constructor signatures
pkg/tokenization/tokenizer_test.go Updated test signatures to pass modelName to constructors, simplified test scenarios
pkg/tokenization/pool.go Added modelName field to Pool struct, removed ModelName from Task struct, updated API signatures
pkg/tokenization/pool_test.go Updated tests to use new API signatures, refactored benchmarks to run per-model
pkg/kvcache/indexer.go Added BaseModelName field to Config struct with proper documentation
tests/e2e/redis_mock/e2e_test.go Removed deprecated e2e tests
tests/e2e/redis_mock/e2e_suite_test.go Removed deprecated e2e test suite
Makefile Removed e2e-test target, simplified test target

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/tokenization/pool.go Outdated
Comment thread pkg/tokenization/pool.go
Comment thread pkg/kvcache/indexer.go Outdated
Comment thread pkg/tokenization/tokenizer.go
Comment thread pkg/tokenization/tokenizer.go
Comment thread pkg/tokenization/tokenizer.go
@sagearc sagearc force-pushed the single-tokenizer-arch-2 branch 2 times, most recently from deab7a0 to dab9fed Compare November 25, 2025 21:41
Comment thread pkg/kvcache/indexer.go Outdated
Comment thread pkg/kvcache/indexer.go Outdated
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
…startup

Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc sagearc force-pushed the single-tokenizer-arch-2 branch 3 times, most recently from 052f04a to 7769f09 Compare December 17, 2025 12:21
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
@sagearc sagearc force-pushed the single-tokenizer-arch-2 branch from 7769f09 to 8d0936e Compare December 17, 2025 13:30
@vMaroon vMaroon merged commit 8d0936e into llm-d:main Dec 17, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants