
Fix: Correct max_model_len derivation from config.json for Mistral format #17777

Closed

princepride wants to merge 72 commits into vllm-project:main from princepride:patch-1

Conversation

@princepride
Contributor

@princepride princepride commented May 7, 2025

FIX #17747

Description of the Bug:

When loading models using the "mistral" format (--config-format mistral), the load_params_config function is invoked. This function prioritizes loading model configuration from a params.json file. For certain models, such as mistralai/Mistral-Small-3.1-24B-Instruct-2503, the params.json file does not contain explicit values for max_seq_len or max_position_embeddings.

In such cases, the original load_params_config function would apply a hardcoded default value of 128,000 for both max_seq_len and max_position_embeddings. This occurred even if the standard Hugging Face config.json file for the model specified a different (and correct) value for these parameters (e.g., max_position_embeddings: 131072 in the text_config of Mistral-Small-3.1).

This discrepancy led to vLLM deriving an incorrect maximum model length (128,000), triggering warnings if the user specified a --max-model-len closer to the true model capacity, and potentially causing runtime errors or incorrect behavior when processing sequences longer than this erroneously derived limit.

Solution:

This PR modifies the load_params_config function to implement a more robust defaulting mechanism for max_seq_len and max_position_embeddings when the "mistral" format is used:

  1. The function still prioritizes values for these parameters if they are explicitly defined in the params.json file.
  2. If these parameters are not found in params.json:
    • The function now attempts to load the standard Hugging Face config.json for the same model.
    • It checks config.json (looking first within a text_config dictionary, then at the top level) for max_position_embeddings and max_seq_len.
    • If found in config.json, these values are used as the defaults.
  3. Only if the parameters are absent from both params.json and config.json will the hardcoded fallback of 128,000 be applied. (For max_seq_len, if it's missing but max_position_embeddings was determined from config.json, max_seq_len will default to that determined max_position_embeddings value before falling back to 128,000).

This change ensures that models like mistralai/Mistral-Small-3.1-24B-Instruct-2503, which have accurate length information in their standard config.json but not in their params.json, will have their maximum sequence lengths correctly determined by vLLM. This resolves the misleading warnings and allows users to utilize the model's true context capacity. The fix maintains backward compatibility by respecting params.json content first and retaining the ultimate fallback if no configuration is found.

Here is the execution result after the bug fix:
[Screenshot: execution result after the fix, 2025-05-07]

Signed-off-by: princepride princepride@gmail.com

@github-actions

github-actions bot commented May 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@princepride changed the title from "Fix https://github.com/vllm-project/vllm/issues/17747" to "Bug Fix: Correct max_model_len derivation from config.json for Mistral format" on May 7, 2025
@DarkLight1337
Member

cc @tjohnson31415

@princepride
Contributor Author

@DarkLight1337 Can you review it?

@DarkLight1337
Member

@tjohnson31415 can you review? I can stamp if you approve

Contributor

@tjohnson31415 tjohnson31415 left a comment


Thanks for getting a fix up quickly!

The logic looks sound; I'm just looking for ways to simplify.

From what I see, we can remove all the logic around max_seq_len.
The git blame shows that setting max_seq_len was done in the initial PR to support Pixtral, but a follow-on hotfix PR added max_position_embeddings to actually get it working. AFAICT, the Pixtral models (and all Mistral models) use max_position_embeddings since the text model is Llama based.

Since the goal here is to fall back to the HF config for the missing keys, instead of parsing config.json as a raw dict it would be more complete to use get_config with ConfigFormat.HF. That way, a default value for max_position_embeddings comes from the config class even when it is missing from config.json. It also enables using hf_config.get_text_config() to simplify accessing the language model configuration.

Comment on lines 665 to 668
Contributor


This currently always loads and inspects the config.json. Instead, we should only do this extra lookup if config_dict is missing max_position_embeddings.
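
The lazy-lookup pattern the review asks for could look roughly like this. It is a sketch only: the function name and the `load_hf_config` callable are hypothetical stand-ins, not vLLM's actual API:

```python
def get_max_position_embeddings(config_dict: dict, load_hf_config) -> int:
    """Return max_position_embeddings from params.json-derived config_dict,
    reading the HF config only when the key is actually missing."""
    DEFAULT_LEN = 128_000

    value = config_dict.get("max_position_embeddings")
    if value is None:
        # The extra file I/O happens only on this fallback path, so the
        # common case (key present in params.json) stays cheap.
        hf_config = load_hf_config()
        value = hf_config.get("max_position_embeddings", DEFAULT_LEN)
    return value
```

Deferring the config.json load behind a callable keeps the hot path free of disk access, which is the point of the review comment.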

princepride and others added 22 commits May 10, 2025 12:03
pavanimajety and others added 3 commits May 10, 2025 12:03

Labels

ci/build, documentation, frontend, multi-modality, structured-output, tool-calling, tpu, v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Issues with max_model_len and config_format mistral