
Add Mistral Large 3 and Ministral 3 #29757

Merged
khluu merged 24 commits into vllm-project:main from juliendenize:add_mistral_large_3
Dec 2, 2025
Conversation

@juliendenize
Contributor

@juliendenize juliendenize commented Nov 30, 2025

Purpose

This PR adds support for Mistral-Large-3 and Ministral-3.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Julien Denize <julien.denize@mistral.ai>

@mergify mergify bot added the deepseek, new-model, speculative-decoding, and v1 labels Nov 30, 2025
Contributor
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds support for Mistral Large 3 and its Eagle variant by reusing the DeepseekV2 architecture. The changes are generally well-structured, including new model files, registry updates, and configuration adaptations. However, I've identified a few potential issues concerning robustness and possible regressions that should be addressed to ensure the stability and correctness of the implementation.


@staticmethod
def hf_config_override(hf_config: PretrainedConfig) -> PretrainedConfig:
initial_architecture = hf_config.architectures[0]
Contributor

critical

The code initial_architecture = hf_config.architectures[0] assumes that hf_config.architectures is a non-empty list. However, the architectures attribute in PretrainedConfig can be None or an empty list, which would cause a TypeError or IndexError respectively. This could lead to a crash when loading a model with a malformed or missing architectures field in its config. It's safer to check for the presence of architectures before accessing its elements.

Suggested change
initial_architecture = hf_config.architectures[0]
initial_architecture = hf_config.architectures[0] if hf_config.architectures else None
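The suggested guard is easy to exercise in isolation: `architectures` on a HF config can be `None`, an empty list, or populated. A minimal sketch (the helper name here is ours, for illustration):

```python
def first_architecture(architectures):
    """Return the first declared architecture, or None when the config
    carries no `architectures` list (None) or an empty one."""
    return architectures[0] if architectures else None

# All three shapes are handled without TypeError/IndexError:
print(first_architecture(None))                       # None
print(first_architecture([]))                         # None
print(first_architecture(["DeepseekV2ForCausalLM"]))  # DeepseekV2ForCausalLM
```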

del config.rope_theta
else:
# When Transformers v4 is installed, legacy rope_scaling may be present
if Version(version("transformers")) < Version("5.0.0.dev0"):
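The quoted guard compares the installed transformers version against the 5.0 pre-release line. A minimal sketch of that comparison using `packaging` (which orders `.dev` pre-releases correctly, unlike naive string comparison); the function name is ours:

```python
from packaging.version import Version

def legacy_rope_attrs_possible(installed: str) -> bool:
    """True when the installed transformers predates the 5.0 line, i.e.
    legacy attributes like rope_theta / old-style rope_scaling may exist."""
    return Version(installed) < Version("5.0.0.dev0")

print(legacy_rope_attrs_possible("4.57.1"))  # True: take the legacy path
print(legacy_rope_attrs_possible("5.0.0"))   # False
```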
Contributor

critical

The logic for patching RoPE parameters for transformers>=5.0.0 has been removed. This logic handled backward compatibility for models that use the legacy rope_theta attribute. Removing it could cause a regression, leading to incorrect RoPE configurations for certain models when used with transformers version 5 or higher. This might result in silent correctness issues. Was the removal of this block intentional? If so, the reasoning should be documented. Otherwise, it should be restored to prevent potential regressions.

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 30, 2025
Member
@mgoin mgoin left a comment

Comment on lines +163 to +164
if llama_4_scaling is not None:
q *= llama_4_scaling
Member

Why not just put this in a rotary embedding layer?

Contributor Author

Just a choice; no strong opinion about this that I can think of right now. We also did it this way in llama.py. Would it be necessary to refactor now, or could it be done in a later PR?

@TheLocalDrummer

Bro...

# LlamaForCausalLM -> Eagle3LlamaForCausalLM
# LlamaForCausalLMEagle3 -> LlamaForCausalLMEagle3
if method == "eagle":
if method is None:
Member
@DarkLight1337 DarkLight1337 Dec 1, 2025

Could you elaborate on why this change is needed?

Collaborator

Also curious why "method": None is being observed here.

Contributor

I ran into a bug when disabling --enforce-eager.

vLLM tries to compute a hash of the config, and this PretrainedConfig is constructed by transformers with no arguments. From what I understand, it was trying to compute the diff between the actual EAGLEConfig and an uninitialized one, which triggers the assert.

I believe the root cause is #26468.

Let me know if you need more investigation on my side.


Collaborator

I think we can put the use_diff=False back. Do you want to do this in this PR, or should I do it in a separate one?

Collaborator

Should we just revert this to make sure the PR gets in?

Collaborator

Might be too late to revert all of #26468, but I can put the use_diff=False back

Member

@zou3519 can you open a separate PR for this? Then we can update this PR accordingly.

Collaborator

We reverted the change here, should be good


return 0.1 * mscale * math.log(scale) + 1.0


def _get_llama_4_scaling(
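The first quoted line is the YaRN-style mscale correction used by the DeepseekV2-family attention. As a standalone helper (the `scale <= 1` guard follows the standard yarn_get_mscale shape; treat this as an illustrative sketch):

```python
import math

def yarn_get_mscale(scale: float = 1.0, mscale: float = 1.0) -> float:
    """Attention-magnitude correction for YaRN RoPE scaling: identity for
    scale <= 1, then growing logarithmically with the context-scale factor."""
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0

print(yarn_get_mscale(1.0))  # 1.0: no correction without context extension
print(yarn_get_mscale(8.0))  # ~1.208, i.e. 0.1 * ln(8) + 1
```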
Collaborator

Do we have a plan to move these helper functions into a utility file?

Contributor Author

I think this could be done, yes; we also have it in llama.py. The question would be: do we do it now? It is not exactly the same as the one in llama4.py (it uses a different offset).

Member

Let's address this in a follow-up PR

Signed-off-by: Julien Denize <julien.denize@mistral.ai>
@juliendenize juliendenize requested a review from ywang96 as a code owner December 1, 2025 16:36
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Comment on lines +778 to +779
# TODO: revert once Mistral-Large-3 and Ministral-3 are publicly available.
is_available_online=False,
Member

We should flip this before 0.12 goes out @khluu

Member

I think it's okay to leave this as is since we're cutting a branch today

continue_final_message: bool = False,
add_generation_prompt: bool = False,
) -> tuple[list["ChatCompletionMessageParam"], list[dict[str, Any]] | None]:
from mistral_common.protocol.instruct.tool_calls import Function, Tool
Collaborator

why dynamic import here?

Contributor Author

It's the same as what we do for all functions in this file: we import at the point of use because some people don't want to install mistral-common.
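The deferred-import pattern described here keeps mistral-common an optional dependency: the module is only resolved when a Mistral code path actually runs, and a clear error is raised otherwise. A generic sketch (helper name and message are illustrative, not vLLM's actual API):

```python
import importlib

def import_optional(module_name: str):
    """Resolve an optional dependency at call time instead of module load."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} is required for this code path; "
            f"please install it (e.g. `pip install {module_name}`)."
        ) from e

# Users who never hit the Mistral path never pay for the import;
# those who do but lack the package get an actionable message.
```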

juliendenize and others added 3 commits December 1, 2025 21:44
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Hashing the config can crash because the constructor is called with default arguments only.

Pass use_diff=False to avoid this behavior

Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
return x.to_json_string()
# using `use_diff=False` to avoid initializing object with
# default arguments only
return x.to_json_string(use_diff=False)
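Why `use_diff=False` matters for hashing: with `use_diff=True` (the transformers default), `to_json_string` serializes only the fields that differ from a freshly default-constructed config, so it must be able to build that default instance, which is exactly what crashed in the thread above. A dependency-free sketch of the difference (class and field names are ours; the real logic lives in transformers' `PretrainedConfig.to_json_string`):

```python
import json

class TinyConfig:
    """Dependency-free stand-in for a HF-style config that serializes
    either a diff against defaults or the full attribute dict."""

    def __init__(self, hidden_size=1024, num_layers=8):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

    def to_json_string(self, use_diff=True):
        full = {"hidden_size": self.hidden_size, "num_layers": self.num_layers}
        if not use_diff:
            # Full serialization: stable, never needs a default instance.
            return json.dumps(full, sort_keys=True)
        # Diff mode must default-construct an instance to compare against;
        # a config whose __init__ cannot run with zero arguments crashes here.
        default = TinyConfig().__dict__
        return json.dumps(
            {k: v for k, v in full.items() if v != default[k]}, sort_keys=True
        )

cfg = TinyConfig(hidden_size=2048)
print(cfg.to_json_string())                # {"hidden_size": 2048}
print(cfg.to_json_string(use_diff=False))  # full dict: stable hash input
```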
Collaborator

`use_diff=False` to make sure we don't have trouble with speculative decoding

Member
@ywang96 ywang96 left a comment

Amazing! Thanks for all the work, and looking forward to the release!

Approving as the rest of the comments can be addressed in follow-up PRs.

mickaelseznec and others added 8 commits December 1, 2025 23:07
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
only fix for eagle config, avoid larger impact on the codebase

Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
@DarkLight1337 DarkLight1337 added this to the v0.12.0 milestone Dec 2, 2025
@khluu khluu enabled auto-merge (squash) December 2, 2025 10:03
@khluu khluu merged commit d8c6210 into vllm-project:main Dec 2, 2025
61 checks passed
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request Dec 11, 2025
d8c6210

Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request Dec 11, 2025
…vllm-project#318)

d8c6210 plus enablement `ministral-large-3` and `ministral-large`
download in the registry


vllm-project#29757
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

Labels

deepseek, documentation, frontend, new-model, ready, speculative-decoding, tool-calling, v1

Projects

Status: Done
