Skip to content

[Mistral common] Ensure all functions are imported from the top & only use public methods#31138

Merged
vllm-bot merged 9 commits intovllm-project:mainfrom
patrickvonplaten:refactor_mistral_tok
Dec 26, 2025
Merged

[Mistral common] Ensure all functions are imported from the top & only use public methods#31138
vllm-bot merged 9 commits intovllm-project:mainfrom
patrickvonplaten:refactor_mistral_tok

Conversation

@patrickvonplaten
Copy link
Copy Markdown
Collaborator

This PR makes sure that only public methods are used and that all imports are done at the top

Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the Mistral tokenizer to use public APIs from mistral_common and moves all imports to the top of the file. The dependencies for mistral_common are also updated. While the changes are generally positive for code quality, I've identified a critical bug in the implementation of _get_special_token_ids that will cause a TypeError at runtime. Additionally, there are several duplicated imports that should be consolidated to improve code clarity.

else:
raise ValueError(f"Unknown tokenizer type: {type(self.tokenizer)}")
return sorted(special_ids)
return sorted(self.tokenizer.is_special(i) for i in len(self._vocab))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This implementation is incorrect and will cause a TypeError at runtime because len(self._vocab) is an integer, not an iterable. It should be range(len(self._vocab)). Additionally, self.tokenizer.is_special(i) returns a boolean, but the function is expected to return a list[int]. The implementation should collect the indices i for which is_special(i) is true.

return sorted([i for i in range(len(self._vocab)) if self.tokenizer.is_special(i)])

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
from .protocol import TokenizerLike

if TYPE_CHECKING:
from mistral_common.protocol.instruct.request import (
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mistral_common is very light weight (has no heavy dependencies) and is a necessary requirement in common.txt => so I think it's cleaner to directly import at the top

Copy link
Copy Markdown
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will cause a problem for people who don't have mistral_common installed. We currently still import tokenizers.mistral from various places (in particular chat-related entrypoints). Need to wait for #30200 to make those imports lazy.

@patrickvonplaten
Copy link
Copy Markdown
Collaborator Author

I think this will cause a problem for people who don't have mistral_common installed. We currently still import tokenizers.mistral from various places (in particular chat-related entrypoints). Need to wait for #30200 to make those imports lazy.

Ah ok was under the impression that mistral_common is a req dep. Ok let's wait with this PR then!

Copy link
Copy Markdown
Contributor

@juliendenize juliendenize left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments ! Looks better with is_special API !

@DarkLight1337
Copy link
Copy Markdown
Member

It is indeed a required dependency but iirc we only made it so to avoid conflicts with OpenCV? cc @Isotr0py @ywang96

patrickvonplaten and others added 2 commits December 22, 2025 12:01
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
@patrickvonplaten
Copy link
Copy Markdown
Collaborator Author

It is indeed a required dependency but iirc we only made it so to avoid conflicts with OpenCV? cc @Isotr0py @ywang96

Actually I think this was solved inside mistral_common

Copy link
Copy Markdown
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's merge this and see if anyone complains 😅

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) December 23, 2025 13:32
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 23, 2025
@patrickvonplaten
Copy link
Copy Markdown
Collaborator Author

Ok, let's merge this and see if anyone complains 😅

Thanks! Happy to revert in case it leads to bigger complaints! I think all the test failures above are unrelated / flaky

@vllm-bot vllm-bot merged commit 48e7449 into vllm-project:main Dec 26, 2025
13 of 14 checks passed
patrickvonplaten added a commit to patrickvonplaten/vllm that referenced this pull request Dec 27, 2025
…y use public methods (vllm-project#31138)

Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Dec 30, 2025
…y use public methods (vllm-project#31138)

Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
…y use public methods (vllm-project#31138)

Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
…y use public methods (vllm-project#31138)

Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants