[BugFix] KeyError when loading Mistral/vision-enabled checkpoints after PR #32780#33006

Closed
ms1design wants to merge 7 commits into vllm-project:main from
ms1design:bugfix/mistral-multimodal-checkpoints-loading-fix

Conversation


@ms1design ms1design commented Jan 24, 2026

Purpose

This PR fixes a bug introduced by PR #32780 where loading Mistral/vision-enabled checkpoints (e.g., mistralai/Devstral-Small-2-24B-Instruct-2512) fails with a KeyError: 'merging_layer.weight'.

The refactor in PR #32780 moved Mistral-specific code from llama.py to a new mistral.py file, which is correct. However, the pixtral.py file had unsafe dictionary accesses that caused KeyErrors when loading checkpoints containing multi_modal_projector.patch_merger weights.

Root Cause

In pixtral.py's load_weights method, the code used direct dictionary access without checking whether the key was present:

param = patch_merger_dict[trimmed_name]  # Could raise KeyError

This caused failures when:

  1. The checkpoint contained multi_modal_projector.patch_merger.merging_layer.weight keys
  2. The model's patch_merger module didn't have the expected parameter (or was None)
  3. The is_patch_merger function didn't recognize multi_modal_projector.patch_merger prefix
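
The failure reduces to plain dict semantics. A minimal illustration (the dict and key names here are stand-ins, not the actual vLLM state):

```python
# Stand-in for the module's parameter dict when the expected
# parameter is missing (illustrative only).
patch_merger_dict = {}

try:
    patch_merger_dict["merging_layer.weight"]  # direct access
    raised = False
except KeyError:
    raised = True  # this is the crash the PR fixes

# Safe access returns None instead of raising.
param = patch_merger_dict.get("merging_layer.weight")
```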

Changes

1. Fixed unsafe dictionary access for patch_merger

Changed from direct access to safe access with a None check:

param = patch_merger_dict.get(trimmed_name)
if param is not None:
    with torch.no_grad():
        default_weight_loader(param, w)

2. Fixed unsafe dictionary access for pre_mm_projector_norm

Applied the same fix to pre_mm_projector_norm_dict access.

3. Improved patch_merger weight detection

Updated is_patch_merger function to recognize both prefixes:

def is_patch_merger(weight: tuple[str, torch.Tensor]):
    return weight[0].startswith(
        ("patch_merger", "multi_modal_projector.patch_merger")
    )
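
Taken together, the detection and the safe access look roughly like this in the loading loop. This is a minimal sketch: plain lists stand in for torch tensors, `patch_merger_dict` is a hypothetical parameter dict, and the prefix trimming is a simplified illustration rather than the exact vLLM code:

```python
def is_patch_merger(weight):
    # Recognize both naming schemes seen in checkpoints.
    return weight[0].startswith(
        ("patch_merger", "multi_modal_projector.patch_merger")
    )

# Hypothetical stand-in for the patch_merger module's parameter dict.
patch_merger_dict = {"merging_layer.weight": [0.0, 0.0]}

loaded = []
for name, w in [
    ("multi_modal_projector.patch_merger.merging_layer.weight", [1.0, 1.0]),
    ("patch_merger.not_a_real_param.weight", [2.0]),  # absent from the module
]:
    if not is_patch_merger((name, w)):
        continue
    # Strip the matched prefix to get the module-local parameter name.
    prefix = ("multi_modal_projector.patch_merger."
              if name.startswith("multi_modal_projector.") else "patch_merger.")
    trimmed_name = name[len(prefix):]
    param = patch_merger_dict.get(trimmed_name)  # safe access: no KeyError
    if param is not None:
        patch_merger_dict[trimmed_name] = w  # stand-in for default_weight_loader
        loaded.append(trimmed_name)
```

With the safe access, unknown keys are simply skipped instead of crashing the whole load.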

Test Plan

  1. Reproduce the original issue:

    vllm serve mistralai/Devstral-Small-2-24B-Instruct-2512 \
      --tokenizer-mode mistral \
      --config-format mistral \
      --load-format mistral \
      --dtype half

    This should fail with KeyError: 'merging_layer.weight' before the fix.

  2. Verify the fix:

    • The same command should now work without errors
    • Loading should complete successfully and the model should be ready for inference
    • Multimodal input should work during inference (e.g. with the mistralai/Devstral-Small-2-24B-Instruct-2512 model)
  3. Test with other Mistral models:

    • Test with regular Mistral models (without vision) to ensure no regression
    • Test with other multimodal Mistral models if available

Test Result

Before fix: Command fails with KeyError: 'merging_layer.weight'
After fix: Command completes successfully, model loads and is ready for inference
Regression test: Regular Mistral models still load correctly
Multimodality: works; images can be passed as model input (tested with mistralai/Devstral-Small-2-24B-Instruct-2512)

Related Issues


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a KeyError when loading certain Mistral vision-enabled checkpoints. The changes involve using safe dictionary access (.get()) instead of direct key access and expanding the weight name detection for patch_merger. While these changes successfully prevent the KeyError crash, I've identified a critical issue where the updated logic will silently fail to load weights with the multi_modal_projector.patch_merger prefix due to incorrect name trimming. My review includes a detailed comment on this issue and how to resolve it.

@DarkLight1337
Member

DarkLight1337 commented Jan 24, 2026

Could you elaborate? Devstral-Small-2-24B-Instruct-2512 should be using Mistral3 instead of Pixtral, but it looks like you edited Pixtral modeling file.

@ms1design ms1design changed the title from "Fix KeyError when loading Mistral/vision-enabled checkpoints after PR #32780" to "[BugFix] KeyError when loading Mistral/vision-enabled checkpoints after PR #32780" on Jan 24, 2026
@mergify mergify bot added the bug (Something isn't working) label Jan 24, 2026
@ms1design
Author

Could you elaborate? Devstral-Small-2-24B-Instruct-2512 should be using Mistral3 instead of Pixtral, but it looks like you edited Pixtral modeling file.

Thanks for checking @DarkLight1337 , this is good question for @patrickvonplaten who authored the refactor in #32780 :)

From my understanding of the current Mistral implementation, looking at lines 91-106 in vllm/transformers_utils/configs/mistral.py, when a Mistral model has vision capabilities (detected via vision_encoder or multimodal.vision_encoder_args), the config is remapped to use Pixtral:

if is_vision:
    config_dict = _remap_mistral_vision_args(config_dict)

The _remap_mistral_vision_args function (lines 91-106) changes:

  • model_type from "mistral" to "pixtral"
  • architectures from ["MistralForCausalLM"] to ["PixtralForConditionalGeneration"]

def _remap_mistral_vision_args(config: dict) -> dict:
    if config.get("multimodal"):
        vision_config = config.pop("multimodal")
    else:
        vision_config = config.pop("vision_encoder")
    quant_config = config.get("quantization_config")
    config = {
        "model_type": "pixtral",
        "architectures": ["PixtralForConditionalGeneration"],
        "text_config": PretrainedConfig.from_dict(config),
        "vision_config": PretrainedConfig.from_dict(vision_config),
    }
    if quant_config:
        config["quantization_config"] = quant_config
    return config
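
The remap behavior can be sketched with plain dicts standing in for PretrainedConfig (a simplified re-creation under that assumption; `remap_vision_args` and the config values are hypothetical, not the exact vLLM code):

```python
# Simplified sketch: plain dicts stand in for transformers.PretrainedConfig.
def remap_vision_args(config):
    # Prefer the newer "multimodal" key, fall back to "vision_encoder".
    vision_config = config.pop("multimodal", None) or config.pop("vision_encoder")
    quant_config = config.get("quantization_config")
    out = {
        "model_type": "pixtral",
        "architectures": ["PixtralForConditionalGeneration"],
        "text_config": dict(config),
        "vision_config": dict(vision_config),
    }
    if quant_config:
        out["quantization_config"] = quant_config
    return out

cfg = {"model_type": "mistral", "vision_encoder": {"hidden_size": 1024}}
remapped = remap_vision_args(cfg)
```

This shows why a vision-enabled Mistral checkpoint ends up on the Pixtral code path even though its original model_type is "mistral".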

ms1design and others added 7 commits January 24, 2026 19:17
…llm-project#32780

The refactor in PR vllm-project#32780 moved Mistral-specific code to mistral.py,
but pixtral.py had unsafe dictionary accesses that caused KeyError
when loading checkpoints with multi_modal_projector.patch_merger weights.

Changes:
- Use .get() instead of direct access for patch_merger_dict
- Use .get() instead of direct access for pre_mm_projector_norm_dict
- Improve is_patch_merger() to recognize the multi_modal_projector.patch_merger prefix
- Add None checks to gracefully handle missing weights

Fixes vllm-project#32959

Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
…modes (vllm-project#32842)

Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
The logic for trimmed_name assumed a single-component prefix,
producing incorrect names like 'patch_merger.merging_layer.weight'
instead of 'merging_layer.weight', causing weights to not load.

Updated the trimming logic to handle multi-component prefixes:
- multi_modal_projector.patch_merger → skip 2 components
- patch_merger → skip 1 component

Fixes vllm-project#32959

Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
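
The component-skipping rule described in this commit can be sketched as follows (a hypothetical helper written for illustration, not the code actually merged):

```python
# Hypothetical helper illustrating multi-component prefix trimming.
def trim_patch_merger_prefix(name):
    parts = name.split(".")
    # multi_modal_projector.patch_merger → skip 2 components.
    if parts[:2] == ["multi_modal_projector", "patch_merger"]:
        return ".".join(parts[2:])
    # patch_merger → skip 1 component.
    if parts[:1] == ["patch_merger"]:
        return ".".join(parts[1:])
    return name
```

Both checkpoint naming schemes then yield the same module-local parameter name.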
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
…llm-project#32986)

Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
@mergify

mergify bot commented Jan 24, 2026

Documentation preview: https://vllm--33006.org.readthedocs.build/en/33006/

@mergify mergify bot added the documentation (Improvements or additions to documentation), cpu (Related to CPU backends), and v1 labels Jan 24, 2026
@ms1design ms1design closed this Jan 24, 2026
@ms1design ms1design deleted the bugfix/mistral-multimodal-checkpoints-loading-fix branch January 24, 2026 18:24
@ms1design
Author

Moved changes to #33008


Labels

bug: Something isn't working
cpu: Related to CPU backends
documentation: Improvements or additions to documentation
v1


Development

Successfully merging this pull request may close these issues.

[Bug]: KeyError: merging_layer.weight when loading Mistral/vision-enabled checkpoints after PR #32780 refactor

7 participants