[BugFix] KeyError when loading Mistral/vision-enabled checkpoints#33008

Closed
ms1design wants to merge 1 commit into vllm-project:main from ms1design:bugfix/fix-mistral-multimodal-checkpoint
Conversation

@ms1design ms1design commented Jan 24, 2026

Purpose

This PR fixes a bug introduced by PR #32780 where loading Mistral/vision-enabled checkpoints (e.g., mistralai/Devstral-Small-2-24B-Instruct-2512) fails with a KeyError: 'merging_layer.weight'.

The refactor in PR #32780 moved Mistral-specific code from llama.py to a new mistral.py file, which is correct. However, the pixtral.py file had unsafe dictionary accesses that caused KeyErrors when loading checkpoints containing multi_modal_projector.patch_merger weights.

Root Cause

In pixtral.py's load_weights method, the code used direct dictionary access without null checks:

param = patch_merger_dict[trimmed_name]  # Could raise KeyError

This caused failures when:

  1. The checkpoint contained multi_modal_projector.patch_merger.merging_layer.weight keys
  2. The model's patch_merger module didn't have the expected parameter (or was None)
  3. The is_patch_merger function didn't recognize multi_modal_projector.patch_merger prefix
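The failure mode described above can be reproduced in isolation. A minimal sketch (the dict contents here are hypothetical, but the access pattern mirrors the `load_weights` code in question):

```python
# Minimal sketch of the failure mode; the dict contents are hypothetical,
# but the access pattern mirrors the pre-fix load_weights code.
patch_merger_dict = {}  # module exposes none of the expected parameters

trimmed_name = "merging_layer.weight"

# Direct indexing (the pre-fix behavior) raises immediately:
try:
    param = patch_merger_dict[trimmed_name]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'merging_layer.weight'

# Safe access (the fixed behavior) returns None and lets loading continue:
param = patch_merger_dict.get(trimmed_name)
print(param)  # None
```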

Changes

1. Fixed unsafe dictionary access for patch_merger

Changed from direct access to safe access with null check:

param = patch_merger_dict.get(trimmed_name)
if param is not None:
    with torch.no_grad():
        default_weight_loader(param, w)

2. Fixed unsafe dictionary access for pre_mm_projector_norm

Applied the same fix to pre_mm_projector_norm_dict access.
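The same pattern, sketched without the torch dependency (the dict contents and the loader are simplified stand-ins, not the actual vLLM implementation):

```python
# Simplified stand-in for vLLM's default_weight_loader; real params are
# torch tensors copied under torch.no_grad().
def default_weight_loader(param, loaded_weight):
    param["data"] = loaded_weight

# Hypothetical dict of parameters actually present on the norm module.
pre_mm_projector_norm_dict = {"weight": {"data": None}}

def load_norm_weight(trimmed_name, w):
    # Safe access: skip checkpoint names the module does not expose
    # instead of raising KeyError.
    param = pre_mm_projector_norm_dict.get(trimmed_name)
    if param is None:
        return False
    default_weight_loader(param, w)
    return True
```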

3. Improved patch_merger weight detection

Updated is_patch_merger function to recognize both prefixes:

def is_patch_merger(weight: tuple[str, torch.Tensor]):
    return weight[0].startswith(
        ("patch_merger", "multi_modal_projector.patch_merger")
    )
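Since `str.startswith` accepts a tuple of prefixes, a single call covers both naming schemes. A quick self-contained check (tensor values replaced by `None` since only the name matters):

```python
def is_patch_merger(weight):
    # weight is a (name, tensor) tuple; only the name is inspected.
    return weight[0].startswith(
        ("patch_merger", "multi_modal_projector.patch_merger")
    )

print(is_patch_merger(("patch_merger.merging_layer.weight", None)))                        # True
print(is_patch_merger(("multi_modal_projector.patch_merger.merging_layer.weight", None)))  # True
print(is_patch_merger(("vision_tower.ln_pre.weight", None)))                               # False
```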

Test Plan

  1. Reproduce the original issue:

    vllm serve mistralai/Devstral-Small-2-24B-Instruct-2512 \
      --tokenizer-mode mistral \
      --config-format mistral \
      --load-format mistral \
      --dtype half

    This should fail with KeyError: 'merging_layer.weight' before the fix.

  2. Verify the fix:

    • The same command should now work without errors
    • Loading should complete successfully and the model should be ready for inference
    • Multimodal inference should work (e.g., with the mistralai/Devstral-Small-2-24B-Instruct-2512 model)
  3. Test with other Mistral models:

    • Test with regular Mistral models (without vision) to ensure no regression
    • Test with other multimodal Mistral models if available

Test Result

  • Before fix: the command fails with KeyError: 'merging_layer.weight'
  • After fix: the command completes successfully; the model loads and is ready for inference
  • Regression test: regular (text-only) Mistral models still load correctly
  • Multimodality: image inputs work (tested with mistralai/Devstral-Small-2-24B-Instruct-2512)

Related Issues


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves a KeyError that occurred when loading certain Mistral vision-enabled checkpoints. The root cause was correctly identified as unsafe dictionary access in pixtral.py. The fix, which involves replacing direct dictionary access with the safer .get() method and adding None checks, is appropriate and well-implemented for both patch_merger and pre_mm_projector_norm weights. Additionally, the logic for handling different weight prefixes has been correctly updated. The changes are robust and directly address the reported bug without introducing any new issues. The code is now more resilient to variations in checkpoint weight names.

Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
@ms1design
Author

Resolved in recent commits.

@ms1design ms1design closed this Jan 27, 2026
@juliendenize
Contributor

juliendenize commented Jan 30, 2026

I can indeed reproduce @dbari's report for ML3. @ms1design, can you reopen your PR? Otherwise I can take care of it.

Edit: never mind, the loading error does not come from the weight names; it occurs when vision is disabled (image limit = 0) but the model still has vision weights. I'll push a fix.

@ms1design
Author

ms1design commented Jan 30, 2026

@juliendenize Hey, good catch. Feel free to take over this ticket if that works for you.

@ms1design ms1design closed this Jan 30, 2026

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug]: KeyError: merging_layer.weight when loading Mistral/vision-enabled checkpoints after PR #32780 refactor

2 participants