
[BUGFIX] Fix Pixtral consolidated format vision weight loading#39916

Merged
vllm-bot merged 2 commits into vllm-project:main from juliendenize:fix/pixtral-consolidated-weight-loading
Apr 20, 2026

Conversation

@juliendenize
Contributor

@juliendenize juliendenize commented Apr 15, 2026

Purpose

#36963 replaced the Pixtral vision encoder's nn.Linear layers (wq/wk/wv/wo/w1/w2/w3) with QKVParallelLinear and MergedColumnParallelLinear (qkv_proj/o_proj/gate_up_proj/down_proj) to support LoRA. However, the weight-loading stacked_params mapping only covered HF-style names (q_proj, k_proj, etc.), not Mistral native names (wq, wk, etc.), so vision encoder weights were silently dropped when loading consolidated-format checkpoints.
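To illustrate the fix, here is a hypothetical sketch of a stacked-parameter mapping extended with the Mistral native names. The (merged_param, source_param, shard_id) tuple convention mirrors vLLM's style, but the list and the resolve helper below are illustrative, not the actual vLLM code.

```python
# Hypothetical sketch: each tuple is (merged_param, source_param, shard_id).
stacked_params_mapping = [
    # HF-style names (handled before this PR)
    (".qkv_proj", ".q_proj", "q"),
    (".qkv_proj", ".k_proj", "k"),
    (".qkv_proj", ".v_proj", "v"),
    (".gate_up_proj", ".gate_proj", 0),
    (".gate_up_proj", ".up_proj", 1),
    # Mistral native (consolidated) names added by the fix
    (".qkv_proj", ".wq", "q"),
    (".qkv_proj", ".wk", "k"),
    (".qkv_proj", ".wv", "v"),
    (".gate_up_proj", ".w1", 0),
    (".gate_up_proj", ".w3", 1),
]

def resolve(name: str):
    """Map a checkpoint key to its merged parameter, or None if unmapped."""
    for merged, src, shard_id in stacked_params_mapping:
        if src in name:
            return name.replace(src, merged), shard_id
    return None
```

Without the Mistral-native entries, a key like `layers.0.attention.wq.weight` would fall through to None and the weight would be dropped, which is exactly the bug this PR fixes.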

Test Plan

Added a Ministral test that runs on small GPUs, instead of relying on Pixtral.

Test Result

Passing.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds support for Mistral native (consolidated) weight formats to the Pixtral model and introduces a new test case for consolidated loading. Feedback indicates that the added test uses a text-only model, which fails to exercise the vision encoder weight loading logic. Additionally, the weight loading implementation may fail to match keys containing a '.weight' suffix, and the remapping logic for native parameter names is inefficiently located and may lead to dropped weights.
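The ".weight"-suffix concern flagged in the review can be illustrated with a tiny hypothetical check: real checkpoint keys carry a trailing .weight, so matching on an exact name suffix misses them while substring matching tolerates it (the key below is made up for illustration).

```python
# Hypothetical checkpoint key for a Mistral-native vision attention weight.
key = "vision_encoder.layers.0.attention.wq.weight"

# Exact-suffix matching misses the key because of the trailing ".weight".
exact_suffix_match = key.endswith(".wq")

# Substring matching on ".wq." tolerates the ".weight" suffix.
substring_match = ".wq." in key
```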

Comment thread tests/models/multimodal/generation/test_pixtral.py
Comment thread vllm/model_executor/models/pixtral.py
Comment thread vllm/model_executor/models/pixtral.py Outdated
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
@juliendenize juliendenize force-pushed the fix/pixtral-consolidated-weight-loading branch from e1ed624 to b22a56b on April 15, 2026 at 15:19
@mergify mergify Bot added multi-modality Related to multi-modality (#4194) bug Something isn't working labels Apr 15, 2026
Quoted diff context:

```python
(".qkv_proj", ".v_proj", "v"),
(".gate_up_proj", ".gate_proj", 0),
(".gate_up_proj", ".up_proj", 1),
# Mistral native (consolidated) format
```

The wo and w2 parameters are handled via _vision_encoder_name_remap rather than through _vision_encoder_stacked_params. Since they're not sharded across TP ranks like qkv/w1/w3, they don't appear in the stacked params list. Is there a reason they couldn't be added to the stacked params list with their shard_id, or is the remap approach more robust to variations in how these keys appear across different checkpoint formats?
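For illustration, the remap approach discussed here can be sketched as a plain rename table for the unsharded projections. The dict name echoes _vision_encoder_name_remap from the discussion, but the code is a hypothetical sketch, not the actual implementation.

```python
# Hypothetical rename table: wo/w2 are not sharded across TP ranks, so they
# can be renamed directly instead of going through the stacked-params path.
_vision_encoder_name_remap = {
    ".wo.": ".o_proj.",
    ".w2.": ".down_proj.",
}

def remap_native_name(name: str) -> str:
    """Rename Mistral-native unsharded keys; leave other keys untouched."""
    for old, new in _vision_encoder_name_remap.items():
        if old in name:
            return name.replace(old, new)
    return name
```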

Collaborator

@NickLucche NickLucche left a comment


does the ministral test you added actually exercise the fix? 🤔

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 16, 2026
@vllm-bot vllm-bot merged commit 6097afb into vllm-project:main Apr 20, 2026
58 of 60 checks passed
@juliendenize
Contributor Author

does the ministral test you added actually exercise the fix? 🤔

Hey, thanks for the merge! To answer your question @NickLucche: it does, by checking that the output is not garbage on a smaller GPU than Pixtral requires. I think I could lower the GPU size even further, but AFAIK the CI always provides 16 GB, right? So now we should catch it whenever a regression happens!

bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Apr 20, 2026
…project#39916)

Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
…project#39916)

Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>

Labels

bug Something isn't working multi-modality Related to multi-modality (#4194) ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet


5 participants