Refactor: Replace daulet/tokenizers with vLLM tokenizer#254
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the tokenization system by replacing the daulet/tokenizers Go library with vLLM's Python-based tokenizer wrapper. The change introduces a new encode function through CGO bindings that communicates with vLLM's tokenizer, allowing for more consistent tokenization behavior with vLLM's inference engine.
Changes:
- Removed daulet/tokenizers dependency and replaced with vLLM tokenizer via Python/CGO bindings
- Updated Encode interface to accept EncodeRequest struct instead of individual parameters
- Added new encode Python function and corresponding C/CGO bindings
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| go.mod, go.sum | Removed daulet/tokenizers dependency |
| pkg/preprocessing/chat_completions/types.go | Added Offset type and Tokenizer struct |
| pkg/preprocessing/chat_completions/tokenizer_wrapper.py | Added encode function for tokenization |
| pkg/preprocessing/chat_completions/cgo_functions.h | Added encode function declarations |
| pkg/preprocessing/chat_completions/cgo_functions.c | Implemented encode function C bindings |
| pkg/preprocessing/chat_completions/cgo_functions.go | Added Encode Go wrapper and EncodeRequest/Response types |
| pkg/preprocessing/chat_completions/cgo_functions_test.go | Added comprehensive tests for encode functionality |
| pkg/tokenization/tokenizer.go | Refactored to use vLLM tokenizer, removed provider interfaces |
| pkg/tokenization/uds_tokenizer.go | Updated to use EncodeRequest struct |
| pkg/tokenization/pool.go | Updated to construct EncodeRequest |
| pkg/tokenization/tokenizer_test.go | Updated all tests to use new Encode interface |
| pkg/tokenization/pool_test.go | Updated mock tokenizer and test cases |
| pkg/tokenization/prefixstore/*.go | Updated Offset type references |
| tests/e2e/redis_mock/*.go | Updated all e2e tests to use new Encode interface |
| pkg/preprocessing/chat_completions/README.md | Updated documentation to reflect vLLM usage |
| docs/architecture.md | Updated dependencies documentation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (u *UdsTokenizer) Encode(req *preprocessing.EncodeRequest) ([]uint32, []preprocessing.Offset, error) { | ||
| httpReq, err := http.NewRequestWithContext( | ||
| context.Background(), | ||
| http.MethodPost, | ||
| u.baseURL+"/tokenize", | ||
| strings.NewReader(input), | ||
| strings.NewReader(req.Text), | ||
| ) |
There was a problem hiding this comment.
The AddSpecialTokens field from EncodeRequest is not being used or passed to the UDS tokenizer service. If the external tokenizer service needs to know whether to add special tokens, this parameter should be included in the request (e.g., as a query parameter or in the request body). If the service handles this automatically, this should be documented.
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
b0b814b to
95ef324
Compare
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
8cb392b to
24f21f4
Compare
| require ( | ||
| github.com/alicebob/miniredis/v2 v2.35.0 | ||
| github.com/cespare/xxhash/v2 v2.3.0 | ||
| github.com/daulet/tokenizers v1.22.1 |
There was a problem hiding this comment.
CI, makefile and dockerfile should also be updated.
There was a problem hiding this comment.
Regarding the CI updates (removing the tokenizer), I was planning to separate them into a different PR as discussed in the review. However, would it be better to just merge them into this PR instead?
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
|
Good to go after rebase. Thanks @hyeongyun0916! Do you have any performance benchmarks? |
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
I’ve added the CGO benchmarks to the PR. Since daulet/tokenizers didn't have existing benchmarks, I created and ran them myself. daulet/tokenizers |
|
Sounds good - overall this "slowdown" will become a speedup once we move to tokens-in architecture, in which this will be the only tokenization stage on the entire serving path. |
will pass when #265 is merged. |
14393a2 to
d804deb
Compare
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
d804deb to
e9fc1a2
Compare
|
/lgtm |
This PR refactors the tokenization system to use vLLM's tokenizer wrapper instead of the daulet/tokenizers.
https://llm-d.slack.com/archives/C0A0SU5J68Y/p1764153758005369