[BugFix][Frontend] Use correct, shared tokenizer in OpenAI server #3512
Closed
njhill wants to merge 1 commit into vllm-project:main from
Conversation
njhill (Member, Author)

Test failures look unrelated (network blips).
Yard1
reviewed
Mar 20, 2024
Collaborator
Yard1
left a comment
Could we add a test? We can mock some stuff - just to make sure that if we go through the OpenAI server with different lora requests, they are tokenized correctly.
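A minimal sketch of the kind of test being suggested here, using `unittest.mock` to check that requests carrying different LoRA adapters are routed to different tokenizers. The names (`serve_tokenize`, `get_tokenizer`, the string adapter id) are illustrative stand-ins, not vLLM's actual API:

```python
# Hypothetical test sketch: mock the engine's tokenizer lookup and verify
# that the front-end path picks the LoRA-specific tokenizer when a LoRA
# request is present, and the base tokenizer otherwise.
from unittest.mock import MagicMock

def serve_tokenize(engine, prompt, lora_request=None):
    # Stand-in for the OpenAI front-end path: ask the engine for the
    # tokenizer matching this request's LoRA adapter (if any).
    tokenizer = engine.get_tokenizer(lora_request)
    return tokenizer.encode(prompt)

def test_lora_requests_use_their_own_tokenizer():
    base_tok, lora_tok = MagicMock(), MagicMock()
    base_tok.encode.return_value = [1, 2]
    lora_tok.encode.return_value = [1, 2, 99]  # adapter has a custom added token

    engine = MagicMock()
    engine.get_tokenizer.side_effect = (
        lambda lora: lora_tok if lora == "my-adapter" else base_tok
    )

    assert serve_tokenize(engine, "hi") == [1, 2]
    assert serve_tokenize(engine, "hi", "my-adapter") == [1, 2, 99]
    engine.get_tokenizer.assert_any_call("my-adapter")
```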
Force-pushed from 0aa9277 to 06188e7
The front-end server code currently doesn't use LoRA-specific tokenizers. It also won't make use of the recently introduced parallel async tokenization, if enabled.
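For context on "parallel async tokenization": the idea is to keep blocking, CPU-bound `encode` calls off the event loop that serves HTTP requests. A minimal sketch (not vLLM's implementation; the fake per-character tokenizer is purely illustrative):

```python
# Offload blocking tokenization to a thread pool so the asyncio event loop
# serving API requests is never blocked, and requests tokenize concurrently.
import asyncio
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)

def _encode(text):
    # Placeholder for a real (CPU-bound) tokenizer call.
    return [ord(c) for c in text]

async def encode_async(text):
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_pool, _encode, text)

async def main():
    # Many prompts tokenize in parallel instead of serially on the loop.
    return await asyncio.gather(*(encode_async(t) for t in ["hi", "yo"]))
```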
Force-pushed from 06188e7 to 1db1b92
This was referenced Apr 12, 2024 (merged)
Member

Can the same tokenizer be used to apply the chat template as well?
dtrifiro
reviewed
Jun 26, 2024
Comment on lines +385 to +386:

    else:
        return self.engine.get_tokenizer_group()

Contributor

nit:

Suggested change:

    -        else:
    -            return self.engine.get_tokenizer_group()
    +        return self.engine.get_tokenizer_group()
njhill added a commit to njhill/vllm that referenced this pull request on Jul 8, 2024:
Currently the LoRA tokenizers aren't used in the OpenAI APIs, meaning the behaviour won't be correct if adapters are used that have custom added tokens. This PR includes changes to address that. It mostly replaces vllm-project#3512.

More work is needed to address remaining inconsistencies in tokenization behaviour between the OpenAI front-end and standalone LLMEngine/AsyncLLMEngine use, including:

- Standalone cases don't honor the truncation and add_special_tokens request parameters
- OpenAI API cases don't make use of TokenizerGroups for possible parallelization of tokenization

as well as some other inefficiencies. But these are to be addressed in follow-on PRs.
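To make the two inconsistencies concrete, here is a toy tokenizer showing what honoring `truncation` and `add_special_tokens` means. The token ids and per-character encoding are fabricated for illustration; real tokenizers (e.g. Hugging Face ones) expose analogous parameters:

```python
# Toy encoder illustrating the two request parameters the commit message
# says the standalone path ignores: truncation caps the token count, and
# add_special_tokens wraps the ids with BOS/EOS markers.
BOS, EOS = 1, 2

def encode(text, add_special_tokens=True, max_length=None):
    ids = [100 + ord(c) for c in text]   # fake per-character token ids
    if max_length is not None:
        ids = ids[:max_length]           # honor truncation
    if add_special_tokens:
        ids = [BOS] + ids + [EOS]        # honor special-token wrapping
    return ids
```

If one code path applies these parameters and another silently ignores them, the same request can produce different token sequences depending on which entry point served it.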
Member

Closing as superseded by #6227.