[BUGFIX] Pixtral cannot be loaded with --limit-mm-per-prompt 0 #33406
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Code Review
This pull request addresses a bug where the Pixtral model fails to load when multi-modal features are disabled via --limit-mm-per-prompt 0. The root cause is that vision-related modules are replaced by StageMissingLayer placeholders, but the weight loading logic only checked for None, leading to attempts to load weights into non-existent modules.
The fix introduces a helper function _is_layer_none_or_staged to correctly check if a layer is either None or a StageMissingLayer placeholder. This check is then applied in the load_weights method for all vision-related components (vision_encoder, patch_merger, pre_mm_projector_norm, vision_language_adapter), ensuring that weight loading is skipped for these components when they are not active.
The changes are correct, well-targeted, and effectively resolve the described bug. The use of a helper function for the check is good practice and keeps the code clean.
@juliendenize good catch, it works for me, thanks!
…project#33406) Signed-off-by: juliendenize <julien.denize@mistral.ai>
…project#33406) Signed-off-by: juliendenize <julien.denize@mistral.ai> Signed-off-by: Pai <416932041@qq.com>
…project#33406) (#28) Signed-off-by: juliendenize <julien.denize@mistral.ai> Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Purpose
This PR fixes loading the Pixtral model when --limit-mm-per-prompt is set to 0 for images. In that case, the vision modules are no longer None but StageMissingLayer placeholders. Because the checkpoint still contains the vision weights, the weight-loading functions still try to load them into modules that do not exist.
Fixes #32959, in place of #33006 or #33008.
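To make the failure mode concrete, here is a minimal sketch contrasting a None-only check with one that also recognises the placeholder. StageMissingLayer below is a hypothetical empty stand-in, not vLLM's actual class, and old_should_skip / new_should_skip are illustrative names rather than functions from this PR.

```python
class StageMissingLayer:
    """Hypothetical stand-in for the placeholder left for a disabled module."""


def old_should_skip(layer: object) -> bool:
    # Pre-fix behaviour: only None was treated as "no vision module".
    return layer is None


def new_should_skip(layer: object) -> bool:
    # Post-fix behaviour: a placeholder is treated as absent too.
    return layer is None or isinstance(layer, StageMissingLayer)


# Assumed state after disabling images: a placeholder instead of None.
vision_encoder = StageMissingLayer()
print(old_should_skip(vision_encoder))  # False: the loader would try to load into it
print(new_should_skip(vision_encoder))  # True: the vision weights are skipped
```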
Thanks @dbary for letting me know that the error was still present. I believe this PR should also address #33174. LMK if it works out for you 😄
Test Plan
I checked that it works for mistralai/Mistral-Large-3-675B-Instruct-2512 and mistralai/Devstral-Small-2-24B-Instruct-2512 with --limit-mm-per-prompt set to both 0 and 1.
Test Result
The model loads successfully.