[BugFix] KeyError when loading Mistral/vision-enabled checkpoints after PR #32780#33006

Closed
ms1design wants to merge 7 commits into vllm-project:main from
ms1design:bugfix/mistral-multimodal-checkpoints-loading-fix

Conversation


@ms1design ms1design commented Jan 24, 2026

Purpose

This PR fixes a bug introduced by PR #32780 where loading Mistral/vision-enabled checkpoints (e.g., mistralai/Devstral-Small-2-24B-Instruct-2512) fails with a KeyError: 'merging_layer.weight'.

The refactor in PR #32780 moved Mistral-specific code from llama.py to a new mistral.py file, which is correct. However, the pixtral.py file had unsafe dictionary accesses that caused KeyErrors when loading checkpoints containing multi_modal_projector.patch_merger weights.

Root Cause

In pixtral.py's load_weights method, the code used direct dictionary access without checking whether the key was present:

param = patch_merger_dict[trimmed_name]  # Could raise KeyError

This caused failures when:

  1. The checkpoint contained multi_modal_projector.patch_merger.merging_layer.weight keys
  2. The model's patch_merger module didn't have the expected parameter (or was None)
  3. The is_patch_merger function didn't recognize multi_modal_projector.patch_merger prefix
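
The failure reduces to plain dict semantics. A minimal illustration (the dict and key names here are stand-ins, not the actual vLLM state):

```python
# Stand-in for the module's parameter dict when the expected
# parameter is missing (illustrative only).
patch_merger_dict = {}

try:
    patch_merger_dict["merging_layer.weight"]  # direct access
    raised = False
except KeyError:
    raised = True  # this is the crash the PR fixes

# Safe access returns None instead of raising.
param = patch_merger_dict.get("merging_layer.weight")
```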

Changes

1. Fixed unsafe dictionary access for patch_merger

Changed from direct access to safe access with a None check:

param = patch_merger_dict.get(trimmed_name)
if param is not None:
    with torch.no_grad():
        default_weight_loader(param, w)

2. Fixed unsafe dictionary access for pre_mm_projector_norm

Applied the same fix to pre_mm_projector_norm_dict access.

3. Improved patch_merger weight detection

Updated is_patch_merger function to recognize both prefixes:

def is_patch_merger(weight: tuple[str, torch.Tensor]):
    return weight[0].startswith(
        ("patch_merger", "multi_modal_projector.patch_merger")
    )
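
Taken together, the detection and the safe access look roughly like this in the loading loop. This is a minimal sketch: plain lists stand in for torch tensors, `patch_merger_dict` is a hypothetical parameter dict, and the prefix trimming is a simplified illustration rather than the exact vLLM code:

```python
def is_patch_merger(weight):
    # Recognize both naming schemes seen in checkpoints.
    return weight[0].startswith(
        ("patch_merger", "multi_modal_projector.patch_merger")
    )

# Hypothetical stand-in for the patch_merger module's parameter dict.
patch_merger_dict = {"merging_layer.weight": [0.0, 0.0]}

loaded = []
for name, w in [
    ("multi_modal_projector.patch_merger.merging_layer.weight", [1.0, 1.0]),
    ("patch_merger.not_a_real_param.weight", [2.0]),  # absent from the module
]:
    if not is_patch_merger((name, w)):
        continue
    # Strip the matched prefix to get the module-local parameter name.
    prefix = ("multi_modal_projector.patch_merger."
              if name.startswith("multi_modal_projector.") else "patch_merger.")
    trimmed_name = name[len(prefix):]
    param = patch_merger_dict.get(trimmed_name)  # safe access: no KeyError
    if param is not None:
        patch_merger_dict[trimmed_name] = w  # stand-in for default_weight_loader
        loaded.append(trimmed_name)
```

With the safe access, unknown keys are simply skipped instead of crashing the whole load.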

Test Plan

  1. Reproduce the original issue:

    vllm serve mistralai/Devstral-Small-2-24B-Instruct-2512 \
      --tokenizer-mode mistral \
      --config-format mistral \
      --load-format mistral \
      --dtype half

    This should fail with KeyError: 'merging_layer.weight' before the fix.

  2. Verify the fix:

    • The same command should now work without errors
    • Loading should complete successfully and the model should be ready for inference
    • Multimodal input should work during inference (e.g. with the mistralai/Devstral-Small-2-24B-Instruct-2512 model)
  3. Test with other Mistral models:

    • Test with regular Mistral models (without vision) to ensure no regression
    • Test with other multimodal Mistral models if available

Test Result

Before fix: Command fails with KeyError: 'merging_layer.weight'
After fix: Command completes successfully, model loads and is ready for inference
Regression test: Regular Mistral models still load correctly
Multimodality: works; images can be passed as model input (tested with mistralai/Devstral-Small-2-24B-Instruct-2512)

Related Issues


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix a KeyError when loading certain Mistral vision-enabled checkpoints. The changes involve using safe dictionary access (.get()) instead of direct key access and expanding the weight name detection for patch_merger. While these changes successfully prevent the KeyError crash, I've identified a critical issue where the updated logic will silently fail to load weights with the multi_modal_projector.patch_merger prefix due to incorrect name trimming. My review includes a detailed comment on this issue and how to resolve it.

@DarkLight1337
Member

DarkLight1337 commented Jan 24, 2026

Could you elaborate? Devstral-Small-2-24B-Instruct-2512 should be using Mistral3 instead of Pixtral, but it looks like you edited Pixtral modeling file.

@ms1design ms1design changed the title from "Fix KeyError when loading Mistral/vision-enabled checkpoints after PR #32780" to "[BugFix] KeyError when loading Mistral/vision-enabled checkpoints after PR #32780" on Jan 24, 2026
@mergify mergify bot added the bug (Something isn't working) label Jan 24, 2026
@ms1design
Author

Could you elaborate? Devstral-Small-2-24B-Instruct-2512 should be using Mistral3 instead of Pixtral, but it looks like you edited Pixtral modeling file.

Thanks for checking @DarkLight1337 , this is good question for @patrickvonplaten who authored the refactor in #32780 :)

From my understanding of the current Mistral implementation, looking at lines 91-106 in vllm/transformers_utils/configs/mistral.py, when a Mistral model has vision capabilities (detected via vision_encoder or multimodal.vision_encoder_args), the config is remapped to use Pixtral:

if is_vision:
    config_dict = _remap_mistral_vision_args(config_dict)

The _remap_mistral_vision_args function (lines 91-106) changes:

  • model_type from "mistral" to "pixtral"
  • architectures from ["MistralForCausalLM"] to ["PixtralForConditionalGeneration"]

def _remap_mistral_vision_args(config: dict) -> dict:
    if config.get("multimodal"):
        vision_config = config.pop("multimodal")
    else:
        vision_config = config.pop("vision_encoder")
    quant_config = config.get("quantization_config")
    config = {
        "model_type": "pixtral",
        "architectures": ["PixtralForConditionalGeneration"],
        "text_config": PretrainedConfig.from_dict(config),
        "vision_config": PretrainedConfig.from_dict(vision_config),
    }
    if quant_config:
        config["quantization_config"] = quant_config
    return config
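
The remap behavior can be sketched with plain dicts standing in for PretrainedConfig (a simplified re-creation under that assumption; `remap_vision_args` and the config values are hypothetical, not the exact vLLM code):

```python
# Simplified sketch: plain dicts stand in for transformers.PretrainedConfig.
def remap_vision_args(config):
    # Prefer the newer "multimodal" key, fall back to "vision_encoder".
    vision_config = config.pop("multimodal", None) or config.pop("vision_encoder")
    quant_config = config.get("quantization_config")
    out = {
        "model_type": "pixtral",
        "architectures": ["PixtralForConditionalGeneration"],
        "text_config": dict(config),
        "vision_config": dict(vision_config),
    }
    if quant_config:
        out["quantization_config"] = quant_config
    return out

cfg = {"model_type": "mistral", "vision_encoder": {"hidden_size": 1024}}
remapped = remap_vision_args(cfg)
```

This shows why a vision-enabled Mistral checkpoint ends up on the Pixtral code path even though its original model_type is "mistral".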

ms1design and others added 7 commits January 24, 2026 19:17
…llm-project#32780

The refactor in PR vllm-project#32780 moved Mistral-specific code to mistral.py,
but pixtral.py had unsafe dictionary accesses that caused KeyError
when loading checkpoints with multi_modal_projector.patch_merger weights.

Changes:
- Use .get() instead of direct access for patch_merger_dict
- Use .get() instead of direct access for pre_mm_projector_norm_dict
- Improve is_patch_merger() to recognize the multi_modal_projector.patch_merger prefix
- Add None checks to gracefully handle missing weights

Fixes vllm-project#32959

Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
…modes (vllm-project#32842)

Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
The logic for trimmed_name assumed a single-component prefix,
producing incorrect names like 'patch_merger.merging_layer.weight'
instead of 'merging_layer.weight', causing weights to not load.

Updated the trimming logic to handle multi-component prefixes:
- multi_modal_projector.patch_merger → skip 2 components
- patch_merger → skip 1 component

Fixes vllm-project#32959

Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
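
The component-skipping rule described in this commit can be sketched as follows (a hypothetical helper written for illustration, not the code actually merged):

```python
# Hypothetical helper illustrating multi-component prefix trimming.
def trim_patch_merger_prefix(name):
    parts = name.split(".")
    # multi_modal_projector.patch_merger → skip 2 components.
    if parts[:2] == ["multi_modal_projector", "patch_merger"]:
        return ".".join(parts[2:])
    # patch_merger → skip 1 component.
    if parts[:1] == ["patch_merger"]:
        return ".".join(parts[1:])
    return name
```

Both checkpoint naming schemes then yield the same module-local parameter name.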
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
…llm-project#32986)

Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
@mergify

mergify bot commented Jan 24, 2026

Documentation preview: https://vllm--33006.org.readthedocs.build/en/33006/

@mergify mergify bot added the documentation (Improvements or additions to documentation), cpu (Related to CPU backends), and v1 labels Jan 24, 2026
@ms1design ms1design closed this Jan 24, 2026
@ms1design ms1design deleted the bugfix/mistral-multimodal-checkpoints-loading-fix branch January 24, 2026 18:24
@ms1design
Author

Moved changes to #33008


Labels

bug: Something isn't working
cpu: Related to CPU backends
documentation: Improvements or additions to documentation
v1


Development

Successfully merging this pull request may close these issues.

[Bug]: KeyError: merging_layer.weight when loading Mistral/vision-enabled checkpoints after PR #32780 refactor

7 participants