[Mistral common] Ensure all functions are imported from the top & only use public methods#31138
Conversation
9551034 to
669d8e8
Compare
There was a problem hiding this comment.
Code Review
This pull request refactors the Mistral tokenizer to use public APIs from mistral_common and moves all imports to the top of the file. The dependencies for mistral_common are also updated. While the changes are generally positive for code quality, I've identified a critical bug in the implementation of _get_special_token_ids that will cause a TypeError at runtime. Additionally, there are several duplicated imports that should be consolidated to improve code clarity.
vllm/tokenizers/mistral.py
Outdated
| else: | ||
| raise ValueError(f"Unknown tokenizer type: {type(self.tokenizer)}") | ||
| return sorted(special_ids) | ||
| return sorted(self.tokenizer.is_special(i) for i in len(self._vocab)) |
There was a problem hiding this comment.
This implementation is incorrect and will cause a TypeError at runtime because len(self._vocab) is an integer, not an iterable. It should be range(len(self._vocab)). Additionally, self.tokenizer.is_special(i) returns a boolean, but the function is expected to return a list[int]. The implementation should collect the indices i for which is_special(i) is true.
return sorted([i for i in range(len(self._vocab)) if self.tokenizer.is_special(i)])There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| from .protocol import TokenizerLike | ||
|
|
||
| if TYPE_CHECKING: | ||
| from mistral_common.protocol.instruct.request import ( |
There was a problem hiding this comment.
mistral_common is very light weight (has no heavy dependencies) and is a necessary requirement in common.txt => so I think it's cleaner to directly import at the top
There was a problem hiding this comment.
I think this will cause a problem for people who don't have mistral_common installed. We currently still import tokenizers.mistral from various places (in particular chat-related entrypoints). Need to wait for #30200 to make those imports lazy.
Ah ok was under the impression that mistral_common is a req dep. Ok let's wait with this PR then! |
juliendenize
left a comment
There was a problem hiding this comment.
Left some comments ! Looks better with is_special API !
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
DarkLight1337
left a comment
There was a problem hiding this comment.
Ok, let's merge this and see if anyone complains 😅
Thanks! Happy to revert in case it leads to bigger complaints! I think all the test failures above are unrelated / flaky |
…y use public methods (vllm-project#31138) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
…y use public methods (vllm-project#31138) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…y use public methods (vllm-project#31138) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
…y use public methods (vllm-project#31138) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
This PR makes sure that only public methods are used and that all imports are done at the top