[BugFix] KeyError when loading Mistral/vision-enabled checkpoints#33008

Closed
ms1design wants to merge 1 commit into vllm-project:main from ms1design:bugfix/fix-mistral-multimodal-checkpoint
Conversation

@ms1design ms1design commented Jan 24, 2026

Purpose

This PR fixes a bug introduced by PR #32780 where loading Mistral/vision-enabled checkpoints (e.g., mistralai/Devstral-Small-2-24B-Instruct-2512) fails with a KeyError: 'merging_layer.weight'.

The refactor in PR #32780 moved Mistral-specific code from llama.py to a new mistral.py file, which is correct. However, the pixtral.py file had unsafe dictionary accesses that caused KeyErrors when loading checkpoints containing multi_modal_projector.patch_merger weights.

Root Cause

In pixtral.py's load_weights method, the code used direct dictionary access without null checks:

param = patch_merger_dict[trimmed_name]  # Could raise KeyError

This caused failures when:

  1. The checkpoint contained multi_modal_projector.patch_merger.merging_layer.weight keys
  2. The model's patch_merger module didn't have the expected parameter (or was None)
  3. The is_patch_merger function didn't recognize multi_modal_projector.patch_merger prefix
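The failure mode described above can be reproduced in isolation. A minimal sketch (the dict contents here are hypothetical, but the access pattern mirrors the `load_weights` code in question):

```python
# Minimal sketch of the failure mode; the dict contents are hypothetical,
# but the access pattern mirrors the pre-fix load_weights code.
patch_merger_dict = {}  # module exposes none of the expected parameters

trimmed_name = "merging_layer.weight"

# Direct indexing (the pre-fix behavior) raises immediately:
try:
    param = patch_merger_dict[trimmed_name]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'merging_layer.weight'

# Safe access (the fixed behavior) returns None and lets loading continue:
param = patch_merger_dict.get(trimmed_name)
print(param)  # None
```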

Changes

1. Fixed unsafe dictionary access for patch_merger

Changed from direct access to safe access with null check:

param = patch_merger_dict.get(trimmed_name)
if param is not None:
    with torch.no_grad():
        default_weight_loader(param, w)

2. Fixed unsafe dictionary access for pre_mm_projector_norm

Applied the same fix to pre_mm_projector_norm_dict access.
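The same pattern, sketched without the torch dependency (the dict contents and the loader are simplified stand-ins, not the actual vLLM implementation):

```python
# Simplified stand-in for vLLM's default_weight_loader; real params are
# torch tensors copied under torch.no_grad().
def default_weight_loader(param, loaded_weight):
    param["data"] = loaded_weight

# Hypothetical dict of parameters actually present on the norm module.
pre_mm_projector_norm_dict = {"weight": {"data": None}}

def load_norm_weight(trimmed_name, w):
    # Safe access: skip checkpoint names the module does not expose
    # instead of raising KeyError.
    param = pre_mm_projector_norm_dict.get(trimmed_name)
    if param is None:
        return False
    default_weight_loader(param, w)
    return True
```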

3. Improved patch_merger weight detection

Updated is_patch_merger function to recognize both prefixes:

def is_patch_merger(weight: tuple[str, torch.Tensor]):
    return weight[0].startswith(
        ("patch_merger", "multi_modal_projector.patch_merger")
    )
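Since `str.startswith` accepts a tuple of prefixes, a single call covers both naming schemes. A quick self-contained check (tensor values replaced by `None` since only the name matters):

```python
def is_patch_merger(weight):
    # weight is a (name, tensor) tuple; only the name is inspected.
    return weight[0].startswith(
        ("patch_merger", "multi_modal_projector.patch_merger")
    )

print(is_patch_merger(("patch_merger.merging_layer.weight", None)))                        # True
print(is_patch_merger(("multi_modal_projector.patch_merger.merging_layer.weight", None)))  # True
print(is_patch_merger(("vision_tower.ln_pre.weight", None)))                               # False
```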

Test Plan

  1. Reproduce the original issue:

    vllm serve mistralai/Devstral-Small-2-24B-Instruct-2512 \
      --tokenizer-mode mistral \
      --config-format mistral \
      --load-format mistral \
      --dtype half

    This should fail with KeyError: 'merging_layer.weight' before the fix.

  2. Verify the fix:

    • The same command should now work without errors
    • Loading should complete successfully and the model should be ready for inference
    • Multimodal inference should work (e.g., with the mistralai/Devstral-Small-2-24B-Instruct-2512 model)
  3. Test with other Mistral models:

    • Test with regular Mistral models (without vision) to ensure no regression
    • Test with other multimodal Mistral models if available

Test Result

  • Before fix: the command fails with KeyError: 'merging_layer.weight'
  • After fix: the command completes successfully; the model loads and is ready for inference
  • Regression test: regular (text-only) Mistral models still load correctly
  • Multimodality: image inputs work (tested with mistralai/Devstral-Small-2-24B-Instruct-2512)

Related Issues


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a KeyError that occurred when loading certain Mistral vision-enabled checkpoints. The root cause was correctly identified as unsafe dictionary access in pixtral.py. The fix, which involves replacing direct dictionary access with the safer .get() method and adding None checks, is appropriate and well-implemented for both patch_merger and pre_mm_projector_norm weights. Additionally, the logic for handling different weight prefixes has been correctly updated. The changes are robust and directly address the reported bug without introducing any new issues. The code is now more resilient to variations in checkpoint weight names.

Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
@ms1design
Author

Resolved in recent commits.

@ms1design ms1design closed this Jan 27, 2026
@juliendenize
Contributor

juliendenize commented Jan 30, 2026

I can indeed reproduce @dbari's report for ML3. @ms1design, can you reopen your PR? Otherwise I can take care of it.

Edit: never mind, the loading error does not come from the weight names; it occurs when vision is disabled (image limit = 0) but the model still has vision weights. I'll push a fix.

@ms1design
Author

ms1design commented Jan 30, 2026

@juliendenize Hey, good catch. Feel free to take over this ticket if that works for you.

@ms1design ms1design closed this Jan 30, 2026

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug]: KeyError: merging_layer.weight when loading Mistral/vision-enabled checkpoints after PR #32780 refactor

2 participants