Fix: Correct max_model_len derivation from config.json for Mistral format #17777
princepride wants to merge 72 commits into vllm-project:main from
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
@DarkLight1337 Can you review it?
|
@tjohnson31415 can you review? I can stamp if you approve |
tjohnson31415 left a comment:
Thanks for getting a fix up quickly!
The logic looks sound; I'm just looking for ways to simplify.
From what I see, we can remove all the logic around max_seq_len.
The git blame shows that setting max_seq_len was done in the initial PR to support Pixtral, but a follow-on hotfix PR added max_position_embeddings to actually get it working. AFAICT, the Pixtral models (and all Mistral models) use max_position_embeddings since the text model is Llama based.
Since the goal here is to fall back to the HF config for the missing keys, instead of parsing config.json as a raw dict, it would be more complete to use get_config with ConfigFormat.HF. This adds an additional default value of max_position_embeddings from the config class if it is missing from config.json. Doing this also enables using hf_config.get_text_config() to simplify accessing the language model configuration.
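As a rough illustration of why a `get_text_config()`-style accessor simplifies things, here is a toy stand-in (not vLLM's or Transformers' actual classes) showing how it hides whether the language-model settings live at the top level or nested under `text_config`:

```python
# Toy stand-in for an HF-style config object. Multimodal configs (e.g. Pixtral)
# nest the language model under text_config; text-only configs do not. A
# get_text_config()-style accessor lets callers read the same attribute either
# way. Class and attribute names here are illustrative only.

class ToyConfig:
    def __init__(self, max_position_embeddings=None, text_config=None):
        self.max_position_embeddings = max_position_embeddings
        self.text_config = text_config

    def get_text_config(self):
        # Return the nested language-model config if present, else self.
        return self.text_config if self.text_config is not None else self

plain = ToyConfig(max_position_embeddings=32768)
multimodal = ToyConfig(text_config=ToyConfig(max_position_embeddings=131072))

# Callers no longer need to know where the value lives:
assert plain.get_text_config().max_position_embeddings == 32768
assert multimodal.get_text_config().max_position_embeddings == 131072
```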
vllm/transformers_utils/config.py
Outdated
This currently always loads and inspects the config.json. Instead, we should only do this extra lookup if config_dict is missing max_position_embeddings.
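The reviewer's point can be sketched as a guard around the extra lookup; `load_hf_config` below stands in for the real (and relatively expensive) config loader, and all names are illustrative rather than vLLM's actual API:

```python
# Hedged sketch: only perform the extra config.json lookup when
# max_position_embeddings is absent from the params.json-derived dict,
# instead of unconditionally loading config.json on every call.

def maybe_fill(config_dict, load_hf_config):
    if "max_position_embeddings" not in config_dict:
        # Lazy path: config.json is only read when the key is missing.
        hf = load_hf_config()
        config_dict["max_position_embeddings"] = hf["max_position_embeddings"]
    return config_dict

calls = []
def fake_loader():
    calls.append(1)
    return {"max_position_embeddings": 131072}

# Key already present: the loader must not run.
maybe_fill({"max_position_embeddings": 8192}, fake_loader)
assert calls == []

# Key missing: the loader runs exactly once and the value is filled in.
filled = maybe_fill({}, fake_loader)
assert filled["max_position_embeddings"] == 131072 and calls == [1]
```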
FIX #17747
Description of the Bug:

When loading models using the "mistral" format (`--config-format mistral`), the `load_params_config` function is invoked. This function prioritizes loading model configuration from a `params.json` file. For certain models, such as `mistralai/Mistral-Small-3.1-24B-Instruct-2503`, the `params.json` file does not contain explicit values for `max_seq_len` or `max_position_embeddings`.

In such cases, the original `load_params_config` function would apply a hardcoded default value of 128,000 for both `max_seq_len` and `max_position_embeddings`. This occurred even if the standard Hugging Face `config.json` file for the model specified a different (and correct) value for these parameters (e.g., `max_position_embeddings: 131072` in the `text_config` of Mistral-Small-3.1).

This discrepancy led to vLLM deriving an incorrect maximum model length (128,000), triggering warnings if the user specified a `--max-model-len` closer to the true model capacity, and potentially causing runtime errors or incorrect behavior when processing sequences longer than this erroneously derived limit.

Solution:

This PR modifies the `load_params_config` function to implement a more robust defaulting mechanism for `max_seq_len` and `max_position_embeddings` when the "mistral" format is used:

- Values explicitly set in the `params.json` file are still respected first.
- If either value is missing from `params.json`, vLLM loads the standard `config.json` for the same model.
- It inspects `config.json` (looking first within a `text_config` dictionary, then at the top level) for `max_position_embeddings` and `max_seq_len`.
- If found in `config.json`, these values are used as the defaults.
- Only if the values are missing from both `params.json` and `config.json` will the hardcoded fallback of 128,000 be applied. (For `max_seq_len`, if it is missing but `max_position_embeddings` was determined from `config.json`, `max_seq_len` will default to that determined `max_position_embeddings` value before falling back to 128,000.)

This change ensures that models like `mistralai/Mistral-Small-3.1-24B-Instruct-2503`, which have accurate length information in their standard `config.json` but not in their `params.json`, will have their maximum sequence lengths correctly determined by vLLM. This resolves the misleading warnings and allows users to utilize the model's true context capacity. The fix maintains backward compatibility by respecting `params.json` content first and retaining the ultimate fallback if no configuration is found.

Here is the execution result after the bug fix:
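The defaulting precedence described above can be sketched on plain dicts (the function name `derive_max_lengths` and its shape are hypothetical; the real change lives inside vLLM's `load_params_config`):

```python
# Illustrative sketch of the precedence: params.json > config.json
# (text_config first, then top level) > hardcoded 128,000 fallback.

FALLBACK_MAX_LEN = 128_000

def derive_max_lengths(params: dict, hf_config: dict) -> tuple:
    """Return (max_position_embeddings, max_seq_len) for a mistral-format model.

    `params` mirrors params.json; `hf_config` mirrors config.json, which may
    nest the language-model settings under a text_config dict.
    """
    text_cfg = hf_config.get("text_config", hf_config)

    def lookup(key):
        if key in params:
            return params[key]          # params.json wins
        if key in text_cfg:
            return text_cfg[key]        # then config.json's text_config
        return hf_config.get(key)       # then config.json top level

    max_pos = lookup("max_position_embeddings")
    if max_pos is None:
        max_pos = FALLBACK_MAX_LEN
    max_seq = lookup("max_seq_len")
    if max_seq is None:
        # Prefer a max_position_embeddings recovered from config.json over
        # the hardcoded fallback.
        max_seq = max_pos
    return max_pos, max_seq

# Mistral-Small-3.1 style: params.json lacks both keys, config.json has
# max_position_embeddings: 131072 under text_config.
assert derive_max_lengths(
    {}, {"text_config": {"max_position_embeddings": 131072}}
) == (131072, 131072)

# Neither file has the keys: fall back to 128,000.
assert derive_max_lengths({}, {}) == (128_000, 128_000)
```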

Signed-off-by: princepride <princepride@gmail.com>