
Patch Mistral config#37104

Merged
hmellor merged 2 commits into vllm-project:main from juliendenize:silence_warnings_hf
Mar 16, 2026

Conversation

@juliendenize
Contributor

@juliendenize juliendenize commented Mar 15, 2026

Purpose

This PR does the following:

  • RoPE parameters are now cast to the type expected by Transformers v5. I believe this has no effect on vLLM computations, but please correct me if I'm wrong. This silences warnings raised by Transformers.
  • Ignore the Transformers warnings about the apply_yarn_scaling parameter not being found; this argument is stored in the Mistral config but unknown to Transformers.
  • Infer the dtype directly in the Mistral config instead of later in the code. This way, the errors saying the model is not a safetensors repo, which were raised when inferring the dtype in multiple places, are fixed by a single change instead of several.

Test Plan

Checked by serving a Mistral model

Test Result

Serving worked.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several patches for Mistral model configuration handling. It silences some warnings from the Transformers library, improves data type casting for RoPE parameters, and refactors the data type inference to be more robust. The changes are generally good, but I've found a critical thread-safety issue with how global constants are being modified. My review includes a suggestion to fix this potential race condition.

Comment on lines +142 to +152
@contextmanager
def _mistral_patch_hf_hub_constants() -> Iterator[None]:
    hf_safetensors_single_file = constants.SAFETENSORS_SINGLE_FILE
    hf_safetensors_index_file = constants.SAFETENSORS_INDEX_FILE
    constants.SAFETENSORS_SINGLE_FILE = "consolidated.safetensors"
    constants.SAFETENSORS_INDEX_FILE = "consolidated.safetensors.index.json"
    try:
        yield
    finally:
        constants.SAFETENSORS_SINGLE_FILE = hf_safetensors_single_file
        constants.SAFETENSORS_INDEX_FILE = hf_safetensors_index_file
Contributor


critical

The modification of global constants in huggingface_hub.constants is not thread-safe. In a scenario where multiple models are loaded concurrently in different threads (e.g., one Mistral model and one standard Hugging Face model), this monkey-patching can create a race condition. One thread might be expecting the default constant values while another has temporarily changed them, potentially leading to FileNotFoundError or other unpredictable behavior during model loading. To prevent this, the critical section where constants are modified should be protected by a lock.

Please also add import threading at the top of the file.

_mistral_patch_lock = threading.Lock()


@contextmanager
def _mistral_patch_hf_hub_constants() -> Iterator[None]:
    with _mistral_patch_lock:
        hf_safetensors_single_file = constants.SAFETENSORS_SINGLE_FILE
        hf_safetensors_index_file = constants.SAFETENSORS_INDEX_FILE
        constants.SAFETENSORS_SINGLE_FILE = "consolidated.safetensors"
        constants.SAFETENSORS_INDEX_FILE = "consolidated.safetensors.index.json"
        try:
            yield
        finally:
            constants.SAFETENSORS_SINGLE_FILE = hf_safetensors_single_file
            constants.SAFETENSORS_INDEX_FILE = hf_safetensors_index_file
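The lock-protected patch pattern above can be exercised standalone. The sketch below uses a stand-in namespace instead of `huggingface_hub.constants`, to show that the patched values are visible inside the context and restored afterwards:

```python
# Self-contained sketch of the lock-protected constant-patching pattern,
# with a SimpleNamespace standing in for huggingface_hub.constants.
import threading
from collections.abc import Iterator
from contextlib import contextmanager
from types import SimpleNamespace

constants = SimpleNamespace(
    SAFETENSORS_SINGLE_FILE="model.safetensors",
    SAFETENSORS_INDEX_FILE="model.safetensors.index.json",
)

_patch_lock = threading.Lock()

@contextmanager
def patch_constants() -> Iterator[None]:
    # The lock ensures only one thread at a time observes patched values.
    with _patch_lock:
        saved = (constants.SAFETENSORS_SINGLE_FILE,
                 constants.SAFETENSORS_INDEX_FILE)
        constants.SAFETENSORS_SINGLE_FILE = "consolidated.safetensors"
        constants.SAFETENSORS_INDEX_FILE = "consolidated.safetensors.index.json"
        try:
            yield
        finally:
            (constants.SAFETENSORS_SINGLE_FILE,
             constants.SAFETENSORS_INDEX_FILE) = saved

with patch_constants():
    inside = constants.SAFETENSORS_SINGLE_FILE  # patched value
after = constants.SAFETENSORS_SINGLE_FILE       # original value restored
```

Note the lock only serializes callers of this context manager; code that reads the constants without taking the lock can still observe the patched values, which is why keeping the patched window short matters.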

@DarkLight1337 DarkLight1337 requested a review from hmellor March 16, 2026 08:03
@hmellor
Member

hmellor commented Mar 16, 2026

To clarify, are these warnings that are appearing in v4 or v5?

Also, would it not be better to add apply_yarn_scaling to the list of expected keys in Transformers so that this issue is fixed for you everywhere, not just in vLLM?

@juliendenize
Contributor Author

juliendenize commented Mar 16, 2026

@hmellor it only occurs for v5.

Also, would it not be better to add apply_yarn_scaling to the list of expected keys in Transformers so that this issue is fixed for you everywhere, not just in vLLM?

vLLM uses arguments slightly differently than HF, which requires adding this key that isn't needed in HF, so I'm not sure it would make sense. Either way, it's not in HF for now, so silencing the warnings would be a good temporary fix if that's OK with you.

@hmellor
Member

hmellor commented Mar 16, 2026

@juliendenize
Contributor Author

Yeah, we can do it that way if you prefer. So for this current PR, do you want me to discard only this filter warning but keep the casting?

@hmellor
Member

hmellor commented Mar 16, 2026

Yeah I think that's best. It hardens the conversion in vLLM and ensures that you don't get the unexpected key warning anywhere that Mistral configs are loaded with Transformers.

Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
@juliendenize
Contributor Author

Made the Transformers PR here :)
huggingface/transformers#44747

@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 16, 2026
@hmellor
Member

hmellor commented Mar 16, 2026

Could we instead pass apply_yarn_scaling to validate_rope in

config.validate_rope()
(I wasn't aware that this was an option before).

Something like:

ignore_keys = set()
if config_format == "mistral" and config.rope_parameters.type == "yarn":
    ignore_keys.add("apply_yarn_scaling")
config.validate_rope(ignore_keys=ignore_keys)
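The idea behind the suggestion can be sketched generically. The `validate_rope` and `KNOWN_ROPE_KEYS` names below are hypothetical stand-ins (vLLM's actual signature and key set may differ); the point is that an `ignore_keys` set excludes specific keys from the unknown-key check:

```python
# Hypothetical sketch of key validation with an ignore set; vLLM's real
# validate_rope signature and accepted keys may differ.
KNOWN_ROPE_KEYS = {"rope_type", "rope_theta", "factor"}

def validate_rope(rope_parameters: dict, ignore_keys: set = frozenset()) -> None:
    # Keys that are neither known nor explicitly ignored are rejected.
    unknown = set(rope_parameters) - KNOWN_ROPE_KEYS - set(ignore_keys)
    if unknown:
        raise ValueError(f"Unrecognized rope parameters: {sorted(unknown)}")

params = {"rope_type": "yarn", "rope_theta": 1e6, "apply_yarn_scaling": True}
validate_rope(params, ignore_keys={"apply_yarn_scaling"})  # passes
```

Without the ignore set, the same call would raise on `apply_yarn_scaling`, which is exactly the warning/error this PR works around.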

@hmellor hmellor merged commit ffbc2e5 into vllm-project:main Mar 16, 2026
45 checks passed
@juliendenize
Contributor Author

That would have been better indeed, I'll do a follow-up PR once I have time.

@hmellor
Member

hmellor commented Mar 17, 2026

I've implemented this in #37292

Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: EricccYang <yangyang4991@gmail.com>
liuchenbing2026 pushed a commit to liuchenbing2026/vllm that referenced this pull request Apr 4, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>

Labels

ready ONLY add when PR is ready to merge/full CI is needed
