
Fix: Correct max_model_len derivation from config.json for Mistral format #17777

Closed

princepride wants to merge 72 commits into vllm-project:main from princepride:patch-1

Conversation

@princepride
Contributor

@princepride princepride commented May 7, 2025

FIX #17747

Description of the Bug:

When loading models using the "mistral" format (--config-format mistral), the load_params_config function is invoked. This function prioritizes loading model configuration from a params.json file. For certain models, such as mistralai/Mistral-Small-3.1-24B-Instruct-2503, the params.json file does not contain explicit values for max_seq_len or max_position_embeddings.

In such cases, the original load_params_config function would apply a hardcoded default value of 128,000 for both max_seq_len and max_position_embeddings. This occurred even if the standard Hugging Face config.json file for the model specified a different (and correct) value for these parameters (e.g., max_position_embeddings: 131072 in the text_config of Mistral-Small-3.1).

This discrepancy led to vLLM deriving an incorrect maximum model length (128,000), triggering warnings if the user specified a --max-model-len closer to the true model capacity, and potentially causing runtime errors or incorrect behavior when processing sequences longer than this erroneously derived limit.

Solution:

This PR modifies the load_params_config function to implement a more robust defaulting mechanism for max_seq_len and max_position_embeddings when the "mistral" format is used:

  1. The function still prioritizes values for these parameters if they are explicitly defined in the params.json file.
  2. If these parameters are not found in params.json:
    • The function now attempts to load the standard Hugging Face config.json for the same model.
    • It checks config.json (looking first within a text_config dictionary, then at the top level) for max_position_embeddings and max_seq_len.
    • If found in config.json, these values are used as the defaults.
  3. Only if the parameters are absent from both params.json and config.json will the hardcoded fallback of 128,000 be applied. (For max_seq_len, if it's missing but max_position_embeddings was determined from config.json, max_seq_len will default to that determined max_position_embeddings value before falling back to 128,000).

This change ensures that models like mistralai/Mistral-Small-3.1-24B-Instruct-2503, which have accurate length information in their standard config.json but not in their params.json, will have their maximum sequence lengths correctly determined by vLLM. This resolves the misleading warnings and allows users to utilize the model's true context capacity. The fix maintains backward compatibility by respecting params.json content first and retaining the ultimate fallback if no configuration is found.

Here is the execution result after the bug fix:
[Screenshot: execution result after the fix, 2025-05-07]

Signed-off-by: princepride princepride@gmail.com

@github-actions

github-actions bot commented May 7, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@princepride changed the title from "Fix https://github.com/vllm-project/vllm/issues/17747" to "Bug Fix: Correct max_model_len derivation from config.json for Mistral format" on May 7, 2025
@DarkLight1337
Member

cc @tjohnson31415

@princepride
Contributor Author

@DarkLight1337 Can you review it?

@DarkLight1337
Member

@tjohnson31415 can you review? I can stamp if you approve

Contributor

@tjohnson31415 tjohnson31415 left a comment


Thanks for getting a fix up quickly!

The logic looks sound; I'm just looking for ways to simplify.

From what I see, we can remove all the logic around max_seq_len.
The git blame shows that setting max_seq_len was done in the initial PR to support Pixtral, but a follow-on hotfix PR added max_position_embeddings to actually get it working. AFAICT, the Pixtral models (and all Mistral models) use max_position_embeddings since the text model is Llama based.

Since the goal here is to fall back to the HF config for the missing keys, instead of parsing config.json as a raw dict it would be more complete to use get_config with ConfigFormat.HF. That way, a default value for max_position_embeddings comes from the config class even when it is missing from config.json. It also enables using hf_config.get_text_config() to simplify accessing the language model configuration.

Comment on lines 665 to 668
Contributor


This currently always loads and inspects the config.json. Instead, we should only do this extra lookup if config_dict is missing max_position_embeddings.
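
The lazy-lookup pattern the review asks for could look roughly like this. It is a sketch only: the function name and the `load_hf_config` callable are hypothetical stand-ins, not vLLM's actual API:

```python
def get_max_position_embeddings(config_dict: dict, load_hf_config) -> int:
    """Return max_position_embeddings from params.json-derived config_dict,
    reading the HF config only when the key is actually missing."""
    DEFAULT_LEN = 128_000

    value = config_dict.get("max_position_embeddings")
    if value is None:
        # The extra file I/O happens only on this fallback path, so the
        # common case (key present in params.json) stays cheap.
        hf_config = load_hf_config()
        value = hf_config.get("max_position_embeddings", DEFAULT_LEN)
    return value
```

Deferring the config.json load behind a callable keeps the hot path free of disk access, which is the point of the review comment.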

princepride and others added 22 commits May 10, 2025 12:03
pavanimajety and others added 3 commits May 10, 2025 12:03

Labels

ci/build, documentation, frontend, multi-modality, structured-output, tool-calling, tpu, v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: Issues with max_model_len and config_format mistral