[Llama.py -> mistral.py] Extract mistral-only relevant code into separate file#32780
Conversation
Force-pushed from 1141f58 to 9913155
    # This function is used to remap the mistral format as
    # used by Mistral and Llama <=2
    def maybe_remap_mistral(
This code comes from a very old PR (#8168), and I'm quite convinced that only Mistral checkpoints actually make use of this function, so I'm moving it out.
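For context, this kind of checkpoint-format remap usually boils down to a name-translation table. Below is a minimal standalone sketch; the mapping entries and the helper name are illustrative, not vLLM's actual `maybe_remap_mistral` implementation:

```python
# Illustrative sketch of a Mistral-to-HF weight-name remap. The mapping
# table below is hypothetical and only shows the general shape of the idea.
MISTRAL_TO_HF = {
    "wq": "q_proj",
    "wk": "k_proj",
    "wv": "v_proj",
    "wo": "o_proj",
}

def remap_mistral_name(name: str) -> str:
    """Translate a Mistral-format parameter name to an HF-style name."""
    parts = name.split(".")
    return ".".join(MISTRAL_TO_HF.get(p, p) for p in parts)

print(remap_mistral_name("layers.0.attention.wq.weight"))
# layers.0.attention.q_proj.weight
```

Names with no entry in the table pass through unchanged, so the helper is safe to apply to every weight in the checkpoint.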
        prefix=f"{prefix}.attn",
    )

    def _get_llama_4_attn_scale(self, positions: torch.Tensor) -> torch.Tensor:
Only Mistral makes use of the Llama 4 attention scaling.
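For readers unfamiliar with it, Llama 4 style "attention temperature tuning" scales queries by a term that grows logarithmically with token position. The sketch below shows the general formula for a single position; `floor_scale` and `attn_scale` are illustrative config values, not vLLM's actual defaults, and the real method operates elementwise on a `positions` tensor:

```python
import math

# Hedged sketch of Llama 4 style attention temperature tuning: the query
# scale grows logarithmically with token position. floor_scale and
# attn_scale below are illustrative, not the real config defaults.
def get_llama_4_attn_scale(
    position: int,
    floor_scale: float = 8192.0,
    attn_scale: float = 0.1,
) -> float:
    return math.log(math.floor((position + 1.0) / floor_scale) + 1.0) * attn_scale + 1.0

print(get_llama_4_attn_scale(0))  # 1.0: positions below floor_scale are unscaled
```

Because of the inner `floor`, the scale stays at exactly 1.0 until the position reaches `floor_scale`, then increases in log-spaced steps for very long contexts.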
        assert tp_size % self.total_num_kv_heads == 0
        self.num_kv_heads = max(1, self.total_num_kv_heads // tp_size)
        # MistralConfig has an optional head_dim introduced by Mistral-Nemo
        head_dim = getattr(config, "head_dim", None)
AFAIK only Mistral-Nemo ever used this.
Doesn't seem to be the case 😅
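The `getattr` fallback pattern under discussion looks roughly like this; the config classes and helper name are illustrative, but the point is that an explicit `head_dim` (as on Mistral-Nemo configs) wins over the `hidden_size // num_heads` derivation:

```python
# Sketch of the optional-head_dim fallback. Config classes are stand-ins;
# Mistral-Nemo is the known case where head_dim differs from the derived value.
class PlainConfig:
    hidden_size = 4096
    num_attention_heads = 32  # no explicit head_dim

class NemoLikeConfig:
    hidden_size = 5120
    num_attention_heads = 32
    head_dim = 128            # differs from 5120 // 32 == 160

def resolve_head_dim(config) -> int:
    head_dim = getattr(config, "head_dim", None)
    if head_dim is None:
        head_dim = config.hidden_size // config.num_attention_heads
    return head_dim

print(resolve_head_dim(PlainConfig()))     # 128
print(resolve_head_dim(NemoLikeConfig()))  # 128 (explicit value, not 160)
```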
Hi @patrickvonplaten, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then commit the changes and push to your branch.
Code Review
The pull request successfully extracts Mistral-specific model adaptations into a new file, mistral.py, and refactors llama.py to be more generic. This improves modularity and maintainability of the codebase. The changes in llama.py and registry.py are appropriate for this refactoring.
vllm/model_executor/mistral.py (214-233)
The logic within maybe_remap_mistral for handling wq and wk weights, especially with the conditional checks for qscale_weight and loaded_weight.numel() > 1, is quite complex and repetitive. This intricate logic increases the potential for errors and makes future modifications or debugging challenging. Consider refactoring this section to improve clarity and reduce duplication, perhaps by abstracting the common permutation and conditional checks into smaller, more focused helper functions.
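One way to follow that suggestion is to pull the shared "permute unless it's a scalar quantization scale" check into a small helper so the `wq`/`wk` branches stop duplicating it. The sketch below is hypothetical (the real permutation reshapes tensors for rotary embeddings; plain lists stand in for weights here):

```python
# Hypothetical refactor of the repetitive wq/wk handling: the shared
# permute-or-skip decision lives in one helper instead of being duplicated
# per projection. Names and the toy "permutation" are illustrative only.
def _needs_permute(name: str, numel: int) -> bool:
    # Scalar quantization scales (numel == 1) must not be permuted.
    return name.endswith((".wq.weight", ".wk.weight")) and numel > 1

def maybe_permute(name: str, weight: list) -> list:
    if _needs_permute(name, len(weight)):
        # Stand-in for the real rotary-embedding permutation:
        # interleave the two halves of the weight.
        half = len(weight) // 2
        return [x for pair in zip(weight[:half], weight[half:]) for x in pair]
    return weight

print(maybe_permute("layers.0.attention.wq.weight", [1, 2, 3, 4]))
# [1, 3, 2, 4]
```

With the condition centralized, adding a new projection type (or a new quantization format) means touching one predicate rather than every branch.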
DarkLight1337 left a comment:
Thanks for the cleanup!
Hmm, the Docker build is probably flaky, no @DarkLight1337?
…atrickvonplaten/vllm into move_mistral_into_its_own_file
I think the final failing tests are unrelated; good to merge, do you think, @DarkLight1337?
…rate file (vllm-project#32780) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
…llm-project#32780

The refactor in PR vllm-project#32780 moved Mistral-specific code to mistral.py, but pixtral.py had unsafe dictionary accesses that caused KeyError when loading checkpoints with multi_modal_projector.patch_merger weights.

Changes:
- Use .get() instead of direct access for patch_merger_dict
- Use .get() instead of direct access for pre_mm_projector_norm_dict
- Improved is_patch_merger() to recognize the multi_modal_projector.patch_merger prefix
- Added null checks to gracefully handle missing weights

Fixes vllm-project#32959
Signed-off-by: Mieszko Syty <mieszko@ms1design.pl>
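The defensive pattern described in that fix is the standard `.get()`-plus-null-check idiom. A standalone illustration (the dictionary contents below are made up; only the access pattern mirrors the commit message):

```python
# Direct indexing raises KeyError for checkpoints that lack the optional
# patch_merger weights; .get() returns None so the caller can skip them.
weights = {"vision_tower.layer.0.weight": [0.1, 0.2]}

# Before (unsafe): raises KeyError when the key is missing.
try:
    w = weights["multi_modal_projector.patch_merger.weight"]
except KeyError:
    w = None

# After (safe): .get() with an explicit null check.
w = weights.get("multi_modal_projector.patch_merger.weight")
if w is not None:
    print("loading patch_merger weight")
else:
    print("patch_merger weight absent; skipping")
```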
…rate file (vllm-project#32780) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: 陈建华 <1647430658@qq.com>
…rate file (vllm-project#32780) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
We're adding more and more Mistral-only code to llama.py, which makes it harder to read and creates potential unwanted dependencies down the line. For example, if other models build on the llama.py class, one might assume that Mistral-only code is also relevant for those models, which would make vLLM too rigid.