
[Mistral Grammar] Support Grammar Factory #38150

Draft
juliendenize wants to merge 6 commits into vllm-project:main from juliendenize:improve_mistral_parsing

Conversation

@juliendenize
Contributor

@juliendenize juliendenize commented Mar 25, 2026

Purpose

This PR adds a Mistral grammar factory that creates a Lark grammar based on tools, tool_choice, and reasoning.
To do that, it adds the following:

  • The guidance backend now supports MistralTokenizer
  • MistralTokenizer can cache a mistral-common GrammarFactory
  • MistralTokenizer can create and cache an llguidance tokenizer
  • MistralToolParser adjusts the request to add the Lark grammar for tokenizers that support it (tokenizer version >= v11 and Tekkenizer)

For now, mistral-common must be installed from mistralai/mistral-common#202 until that PR is merged and a new mistral-common release is out.

Details on how tool choices and reasoning influence the grammar can be found in the mistral-common PR description.

This PR will break tool call parsing for some requests that expect a JSON grammar (such as tool_choice="required"). A follow-up PR will be submitted to fix this.

This is the first PR that aims to replicate some features from #37081 while ensuring better separation of concerns, with mistral-common hosting the grammar factory logic.

Test Plan

Added tests to:

  • ensure the connection between vLLM and the mistral-common factory works
  • ensure the guidance backend can use a tokenizer instantiated from a MistralTokenizer.

Test Result

They all pass.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request integrates llguidance and mistral-common's GrammarFactory into vLLM's Mistral tokenizer and tool parsing logic. The MistralTokenizer now exposes properties to check grammar support and to access GrammarFactory and llguidance.LLTokenizer instances. The MistralToolParser's adjust_request method is enhanced to dynamically generate Lark grammars for tool calling based on request parameters for supported Mistral tokenizers. Additionally, structured output validation in sampling_params.py is refined to allow Tekken Mistral tokenizers with the guidance backend, while explicitly disallowing non-Tekken Mistral tokenizers. New tests have been added to cover this functionality.

@mergify

mergify bot commented Mar 26, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @juliendenize.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 26, 2026
Contributor

@bbrowning bbrowning left a comment


I know this is a draft, but I took a first pass at this anyway and have a few inline comments. All of these items may already be on your radar as work progresses, so feel free to discard any that are.

# or includes some jsonschema feature(s) that
# are not supported in xgrammar.

skip_guidance = _is_non_tekken_mistral(tokenizer)
Contributor


This will pick the guidance backend by default on all Mistral models that use tekken tokenizers, right? That's likely intended, but just confirming this was expected to not be limited to only Mistral 4+ models.

Contributor Author

@juliendenize juliendenize Mar 26, 2026


Yeah, so to me there are two different things in this PR:

  • support guidance backend for mistral tokenizers
  • support grammar factory

The former only requires that Mistral tokenizers are Tekken; the latter also has the constraint that they are > v11 with respect to the Mistral tokenizer version.
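The two levels of support described here could be sketched as simple predicates (hypothetical helper names; in the PR the checks live on the tokenizer itself):

```python
# Illustrative predicates for the two support levels discussed above.
# These are hypothetical helpers, not vLLM's actual attributes.
def supports_guidance(is_tekken: bool) -> bool:
    # Guidance backend support only needs a Tekken tokenizer.
    return is_tekken


def supports_grammar_factory(is_tekken: bool, version: int) -> bool:
    # Grammar factory support has the stricter constraint:
    # Tekken AND tokenizer version v11 or newer.
    return is_tekken and version >= 11


print(supports_guidance(True), supports_grammar_factory(True, 7))  # True False
print(supports_grammar_factory(True, 11))                          # True
```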

Comment on lines +792 to +796
raise ValueError(
    "Non-tekken Mistral tokenizers are not supported for the 'guidance'"
    " structured output backend. Please use ['xgrammar', 'outlines'] "
    "backends or tokenizer_mode='hf' instead."
)
Contributor


From a user point-of-view, will I know what a non-tekken Mistral tokenizer is? The previous message just said Mistral tokenizer which was fairly easy to map to the tokenizer flag in vLLM and its value. Maybe this should mention specific Mistral model families or something else the user can easily understand so they know why this combination of flags doesn't work and what their options are?

Contributor Author


Tekken is rather "old" at this point. I don't think listing a bunch of models/families would simplify the messaging, as it would involve naming specifics. Maybe it's worth mentioning that there are fresher models than the one used that support it?

Contributor


That's a fair point about the number of newer models being quite large. I like your suggestion of mentioning that they'll need to use newer models if they want to use the guidance backend. It keeps it actionable for the user ("Oh, I either need to swap backends or find a newer Mistral model") while not giving a detailed list of specific models that will change over time.

Contributor Author


Done! Let me know if it is good now.

Contributor


Yes, the new message does a good job of letting the user know how to resolve this, either by using a newer model or changing their vLLM server args. Thank you!

"""

# Used to generate correct grammar in `adjust_request`
model_can_reason: bool = False
Contributor


Is this ever set to True somewhere? Perhaps there is an unfinished change to flip this to True for certain models?

Contributor Author


This should be allowed in a subsequent PR that fixes tool / reasoning parsing when the Mistral grammar is active.

from pathlib import Path
from typing import TYPE_CHECKING, Any, cast, overload

from mistral_common.guidance.grammar_factory import GrammarFactory
Contributor


Just a note that this import will fail without a minimum mistral_common version. So, this either has to be a conditional import only used and grammar factory only enabled when we have a newer mistral_common OR we need to bump the minimum mistral_common version required for vLLM. I assume that's what the plan was, once the new mistral_common gets released.
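A generic way to handle the conditional-import route mentioned above can be sketched with a guarded import and a feature flag. This is a common Python pattern, not the PR's actual code; the flag name is illustrative.

```python
# Guarded-import pattern: enable the grammar factory only when a
# sufficiently new mistral_common is installed. The flag name
# HAS_GRAMMAR_FACTORY is illustrative, not taken from the PR.
try:
    from mistral_common.guidance.grammar_factory import GrammarFactory
    HAS_GRAMMAR_FACTORY = True
except ImportError:
    GrammarFactory = None  # type: ignore[assignment]
    HAS_GRAMMAR_FACTORY = False
```

Call sites would then gate grammar-factory code paths on the flag instead of importing unconditionally at module load time.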

Contributor Author


Yes, the plan is to mark this PR ready as soon as mistral-common is updated, to be sure the grammar is available.

def adjust_request(self, request: ChatCompletionRequest) -> ChatCompletionRequest:
request = super().adjust_request(request)
if not is_mistral_tokenizer(self.model_tokenizer):
request = super().adjust_request(request)
Contributor


What happens for Mistral models using the mistral tokenizer, but either a non-tekken tokenizer or a model older than what we enable for use with the new grammar? Should this call super().adjust_request(request) for all models except the ones we specifically use the new grammar with?

Otherwise, I think we end up where previously we did this for all Mistral models and now we don't do that any time a Mistral tokenizer is in use. This leaves a gap of all Mistral tokenizer usage that doesn't use the new grammar bits where they miss the request adjustment from the abstract_tool_parser.py.

Perhaps this is what you meant about breaking tool_choice="required". This breaks that for older Mistral models. And, the new Lark grammar breaks that for newer Mistral models (because it expects a JSON schema today). So, just pointing out that we've broken tool_choice="required" in 2 different ways for different variants of Mistral models.

Contributor Author


Yes, you're right; I didn't have a strongly opinionated answer to this question. I think the easiest might indeed be to just check whether the grammar is available for this model, as you mentioned: previous models stay broken for model distribution but still output JSON, while new models use the grammar plus a later fix for parsing.

Contributor


I'll defer to the Mistral team here. From my point-of-view the ideal outcome would be that accuracy for user's existing workflows with older Mistral models doesn't degrade while for newer Mistral models we bring the new grammar. I think the current code leaves a gap there for anyone that was using previous models with required / named tool calls before (as adjust_request is how we wire them in, I think?).

Contributor Author


I just updated it to make sure old models now fall back to the old behavior.
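The fallback this thread converged on could be sketched like this (class names, field names, and grammar strings are illustrative only; the real dispatch happens in vLLM's tool parser):

```python
# Sketch of the fallback dispatch: tokenizers without grammar support
# keep the legacy request adjustment so tool_choice="required" behaves
# as before, while supported ones get the new Lark grammar path.
# All names and values here are illustrative stand-ins.
class BaseToolParser:
    def adjust_request(self, request: dict) -> dict:
        request["grammar"] = "legacy-json-schema"
        return request


class MistralToolParser(BaseToolParser):
    def __init__(self, tokenizer_supports_grammar: bool) -> None:
        self.supports_grammar = tokenizer_supports_grammar

    def adjust_request(self, request: dict) -> dict:
        if not self.supports_grammar:
            # Older tokenizers: unchanged legacy path.
            return super().adjust_request(request)
        request["grammar"] = "lark-grammar"
        return request


old = MistralToolParser(False).adjust_request({})
new = MistralToolParser(True).adjust_request({})
print(old["grammar"], new["grammar"])  # legacy-json-schema lark-grammar
```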

or request.structured_outputs is not None
or (
request.response_format is not None
and request.response_format.type != "text"
Contributor


This may be a bit broad. For example, if the response_format was set to json_object would we still want to enable the grammar-based guiding? Or does the model not output JSON for its response in that case?

Contributor Author


So in this case we just ignore it and leave the grammar to vLLM, but we're indeed considering injecting our Lark grammar there as well, since we can handle json_schema with or without tool choice set (the model chooses between calling a tool and creating a JSON).
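The guard being discussed reads as a small predicate: grammar injection is skipped whenever the caller already asked for structured output. A sketch under that reading (hypothetical signature, not vLLM's actual check):

```python
# Illustrative predicate for when to skip grammar injection: any
# caller-provided structured output or non-text response_format leaves
# constrained decoding to vLLM's own backends. Names are hypothetical.
def should_inject_grammar(structured_outputs, response_format_type) -> bool:
    if structured_outputs is not None:
        return False
    if response_format_type is not None and response_format_type != "text":
        return False
    return True


print(should_inject_grammar(None, None))           # True
print(should_inject_grammar(None, "json_object"))  # False
```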

Contributor


Sounds good - if it's scope creep, feel free to go with what you were originally going to do and we can iterate on this in future PRs.

Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
@juliendenize juliendenize force-pushed the improve_mistral_parsing branch from a9eaa24 to 1324904 on March 26, 2026 17:21
@mergify mergify bot removed the needs-rebase label Mar 26, 2026
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
@bbrowning
Contributor

I reviewed the latest updates, and they look good to me. Once this is ready to come out of draft state with the updated mistral_common dependency, I'm happy to take another look, but we'll also need an approver in this area ready to review to get this merged quickly.

