[BugFix] KeyError when loading Mistral/vision-enabled checkpoints after PR #32780#33006
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
aa240d4 to
a4edf60
Compare
There was a problem hiding this comment.
Code Review
This pull request aims to fix a KeyError when loading certain Mistral vision-enabled checkpoints. The changes involve using safe dictionary access (.get()) instead of direct key access and expanding the weight name detection for patch_merger. While these changes successfully prevent the KeyError crash, I've identified a critical issue where the updated logic will silently fail to load weights with the multi_modal_projector.patch_merger prefix due to incorrect name trimming. My review includes a detailed comment on this issue and how to resolve it.
|
Could you elaborate? |
Thanks for checking @DarkLight1337 , this is good question for @patrickvonplaten who authored the refactor in #32780 :) From my understanding of the current Mistral implementation, looking at lines 91-106 in vllm/vllm/transformers_utils/configs/mistral.py Lines 76 to 77 in da5e7b1 The
vllm/vllm/transformers_utils/configs/mistral.py Lines 91 to 106 in da5e7b1 |
…llm-project#32780 The refactor in PR vllm-project#32780 moved Mistral-specific code to mistral.py, but pixtral.py had unsafe dictionary accesses that caused KeyError when loading checkpoints with multi_modal_projector.patch_merger weights. Changes: - Use .get() instead of direct access for patch_merger_dict - Use .get() instead of direct access for pre_mm_projector_norm_dict - Improved is_patch_merger() to recognize multi_modal_projector.patch_merger prefix - Added null checks to gracefully handle missing weights Fixes vllm-project#32959 Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
…modes (vllm-project#32842) Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com> Signed-off-by: Xingran Wang <wangxingran123456@outlook.com> Co-authored-by: Xingran Wang <wangxingran123456@outlook.com> Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
The logic for trimmed_name assumed a single-component prefix, producing incorrect names like 'patch_merger.merging_layer.weight' instead of 'merging_layer.weight', causing weights to not load. Updated the trimming logic to handle multi-component prefixes: - multi_modal_projector.patch_merger → skip 2 components - patch_merger → skip 1 component Fixes vllm-project#32959 Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
Signed-off-by: Tsai, Louie <louie.tsai@intel.com> Signed-off-by: Louie Tsai <louie.tsai@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
…llm-project#32986) Signed-off-by: 7. Sun <jhao.sun@gmail.com> Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
5ff74c8 to
f94a373
Compare
|
Documentation preview: https://vllm--33006.org.readthedocs.build/en/33006/ |
|
Moved changes to #33008 |
Purpose
This PR fixes a bug introduced by PR #32780 where loading Mistral/vision-enabled checkpoints (e.g.,
mistralai/Devstral-Small-2-24B-Instruct-2512) fails with aKeyError: 'merging_layer.weight'.The refactor in PR #32780 moved Mistral-specific code from
llama.pyto a newmistral.pyfile, which is correct. However, thepixtral.pyfile had unsafe dictionary accesses that caused KeyErrors when loading checkpoints containingmulti_modal_projector.patch_mergerweights.Root Cause
In
pixtral.py'sload_weightsmethod, the code used direct dictionary access without null checks:This caused failures when:
multi_modal_projector.patch_merger.merging_layer.weightkeyspatch_mergermodule didn't have the expected parameter (or was None)is_patch_mergerfunction didn't recognizemulti_modal_projector.patch_mergerprefixChanges
1. Fixed unsafe dictionary access for patch_merger
Changed from direct access to safe access with null check:
2. Fixed unsafe dictionary access for pre_mm_projector_norm
Applied the same fix to
pre_mm_projector_norm_dictaccess.3. Improved patch_merger weight detection
Updated
is_patch_mergerfunction to recognize both prefixes:Test Plan
Reproduce the original issue:
This should fail with
KeyError: 'merging_layer.weight'before the fix.Verify the fix:
mistralai/Devstral-Small-2-24B-Instruct-2512model)Test with other Mistral models:
Test Result
✅ Before fix: Command fails with
KeyError: 'merging_layer.weight'✅ After fix: Command completes successfully, model loads and is ready for inference
✅ Regression test: Regular Mistral models still load correctly
✅ Multimodality: Works, we can use images in input to the model (tested with
mistralai/Devstral-Small-2-24B-Instruct-2512)Related Issues
merging_layer.weightwhen loading Mistral/vision-enabled checkpoints after PR #32780 refactor #32959Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.