[Llama.py -> mistral.py] Extract mistral-only relevant code into separate file#32780
Conversation
Force-pushed from 1141f58 to 9913155
    # This function is used to remap the mistral format as
    # used by Mistral and Llama <=2
    def maybe_remap_mistral(
This code comes from a very old PR (#8168), and I'm quite convinced that only Mistral checkpoints actually make use of this function, so I'm moving it out.
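For context, this kind of checkpoint-format remap usually boils down to a name-translation table. Below is a minimal standalone sketch; the mapping entries and the helper name are illustrative, not vLLM's actual `maybe_remap_mistral` implementation:

```python
# Illustrative sketch of a Mistral-to-HF weight-name remap. The mapping
# table below is hypothetical and only shows the general shape of the idea.
MISTRAL_TO_HF = {
    "wq": "q_proj",
    "wk": "k_proj",
    "wv": "v_proj",
    "wo": "o_proj",
}

def remap_mistral_name(name: str) -> str:
    """Translate a Mistral-format parameter name to an HF-style name."""
    parts = name.split(".")
    return ".".join(MISTRAL_TO_HF.get(p, p) for p in parts)

print(remap_mistral_name("layers.0.attention.wq.weight"))
# layers.0.attention.q_proj.weight
```

Names with no entry in the table pass through unchanged, so the helper is safe to apply to every weight in the checkpoint.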
        prefix=f"{prefix}.attn",
    )

    def _get_llama_4_attn_scale(self, positions: torch.Tensor) -> torch.Tensor:
Only Mistral makes use of the Llama 4 attention scaling.
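For readers unfamiliar with it, Llama 4 style "attention temperature tuning" scales queries by a term that grows logarithmically with token position. The sketch below shows the general formula for a single position; `floor_scale` and `attn_scale` are illustrative config values, not vLLM's actual defaults, and the real method operates elementwise on a `positions` tensor:

```python
import math

# Hedged sketch of Llama 4 style attention temperature tuning: the query
# scale grows logarithmically with token position. floor_scale and
# attn_scale below are illustrative, not the real config defaults.
def get_llama_4_attn_scale(
    position: int,
    floor_scale: float = 8192.0,
    attn_scale: float = 0.1,
) -> float:
    return math.log(math.floor((position + 1.0) / floor_scale) + 1.0) * attn_scale + 1.0

print(get_llama_4_attn_scale(0))  # 1.0: positions below floor_scale are unscaled
```

Because of the inner `floor`, the scale stays at exactly 1.0 until the position reaches `floor_scale`, then increases in log-spaced steps for very long contexts.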
        assert tp_size % self.total_num_kv_heads == 0
        self.num_kv_heads = max(1, self.total_num_kv_heads // tp_size)
        # MistralConfig has an optional head_dim introduced by Mistral-Nemo
        head_dim = getattr(config, "head_dim", None)
AFAIK only Mistral-Nemo ever used this.
Doesn't seem to be the case 😅
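The `getattr` fallback pattern under discussion looks roughly like this; the config classes and helper name are illustrative, but the point is that an explicit `head_dim` (as on Mistral-Nemo configs) wins over the `hidden_size // num_heads` derivation:

```python
# Sketch of the optional-head_dim fallback. Config classes are stand-ins;
# Mistral-Nemo is the known case where head_dim differs from the derived value.
class PlainConfig:
    hidden_size = 4096
    num_attention_heads = 32  # no explicit head_dim

class NemoLikeConfig:
    hidden_size = 5120
    num_attention_heads = 32
    head_dim = 128            # differs from 5120 // 32 == 160

def resolve_head_dim(config) -> int:
    head_dim = getattr(config, "head_dim", None)
    if head_dim is None:
        head_dim = config.hidden_size // config.num_attention_heads
    return head_dim

print(resolve_head_dim(PlainConfig()))     # 128
print(resolve_head_dim(NemoLikeConfig()))  # 128 (explicit value, not 160)
```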
Hi @patrickvonplaten, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then commit the changes and push to your branch.
Code Review
The pull request successfully extracts Mistral-specific model adaptations into a new file, mistral.py, and refactors llama.py to be more generic. This improves modularity and maintainability of the codebase. The changes in llama.py and registry.py are appropriate for this refactoring.
vllm/model_executor/mistral.py (214-233)
The logic within maybe_remap_mistral for handling wq and wk weights, especially with the conditional checks for qscale_weight and loaded_weight.numel() > 1, is quite complex and repetitive. This intricate logic increases the potential for errors and makes future modifications or debugging challenging. Consider refactoring this section to improve clarity and reduce duplication, perhaps by abstracting the common permutation and conditional checks into smaller, more focused helper functions.
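One way to follow that suggestion is to pull the shared "permute unless it's a scalar quantization scale" check into a small helper so the `wq`/`wk` branches stop duplicating it. The sketch below is hypothetical (the real permutation reshapes tensors for rotary embeddings; plain lists stand in for weights here):

```python
# Hypothetical refactor of the repetitive wq/wk handling: the shared
# permute-or-skip decision lives in one helper instead of being duplicated
# per projection. Names and the toy "permutation" are illustrative only.
def _needs_permute(name: str, numel: int) -> bool:
    # Scalar quantization scales (numel == 1) must not be permuted.
    return name.endswith((".wq.weight", ".wk.weight")) and numel > 1

def maybe_permute(name: str, weight: list) -> list:
    if _needs_permute(name, len(weight)):
        # Stand-in for the real rotary-embedding permutation:
        # interleave the two halves of the weight.
        half = len(weight) // 2
        return [x for pair in zip(weight[:half], weight[half:]) for x in pair]
    return weight

print(maybe_permute("layers.0.attention.wq.weight", [1, 2, 3, 4]))
# [1, 3, 2, 4]
```

With the condition centralized, adding a new projection type (or a new quantization format) means touching one predicate rather than every branch.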
DarkLight1337 left a comment:
Thanks for the cleanup!
Hmm, the Docker build is probably flaky, no @DarkLight1337?
…atrickvonplaten/vllm into move_mistral_into_its_own_file
I think the final failing tests are unrelated; good to merge, do you think, @DarkLight1337?
…rate file (vllm-project#32780) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
…llm-project#32780

The refactor in PR vllm-project#32780 moved Mistral-specific code to mistral.py, but pixtral.py had unsafe dictionary accesses that caused KeyError when loading checkpoints with multi_modal_projector.patch_merger weights.

Changes:
- Use .get() instead of direct access for patch_merger_dict
- Use .get() instead of direct access for pre_mm_projector_norm_dict
- Improved is_patch_merger() to recognize the multi_modal_projector.patch_merger prefix
- Added null checks to gracefully handle missing weights

Fixes vllm-project#32959
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
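The defensive pattern described in that fix is the standard `.get()`-plus-null-check idiom. A standalone illustration (the dictionary contents below are made up; only the access pattern mirrors the commit message):

```python
# Direct indexing raises KeyError for checkpoints that lack the optional
# patch_merger weights; .get() returns None so the caller can skip them.
weights = {"vision_tower.layer.0.weight": [0.1, 0.2]}

# Before (unsafe): raises KeyError when the key is missing.
try:
    w = weights["multi_modal_projector.patch_merger.weight"]
except KeyError:
    w = None

# After (safe): .get() with an explicit null check.
w = weights.get("multi_modal_projector.patch_merger.weight")
if w is not None:
    print("loading patch_merger weight")
else:
    print("patch_merger weight absent; skipping")
```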
…rate file (vllm-project#32780) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: 陈建华 <1647430658@qq.com>
…rate file (vllm-project#32780) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
We're adding more and more Mistral-only code to llama.py, which makes it harder to read and creates potential unwanted dependencies down the line. For example, if other models build on the llama.py class, one might assume that Mistral-only code is also relevant for those models, which would make vLLM too rigid.