[Mistral Grammar] Support Grammar Factory #38150
juliendenize wants to merge 6 commits into vllm-project:main
Conversation
Code Review
This pull request integrates llguidance and mistral-common's GrammarFactory into vLLM's Mistral tokenizer and tool parsing logic. The MistralTokenizer now exposes properties to check grammar support and access GrammarFactory and llguidance.LLTokenizer instances. The MistralToolParser's adjust_request method is enhanced to dynamically generate Lark grammars for tool calling based on request parameters for supported Mistral tokenizers. Additionally, structured output validation in sampling_params.py is refined to allow Tekken Mistral tokenizers with the guidance backend, while explicitly disallowing non-Tekken Mistral tokenizers. New tests have been added to cover this new functionality. There is no feedback to provide on the review comments as none were given.
This pull request has merge conflicts that must be resolved before it can be merged.
bbrowning
left a comment
I know this is a draft, but I took a first pass at this anyway and have a few inline comments. All of these items may already be planned as the work progresses, so feel free to discard any that are already on your radar.
    # or includes some jsonschema feature(s) that
    # are not supported in xgrammar.

    skip_guidance = _is_non_tekken_mistral(tokenizer)
This will pick the guidance backend by default on all Mistral models that use tekken tokenizers, right? That's likely intended, but just confirming this was expected to not be limited to only Mistral 4+ models.
Yeah, so to me there are two different things in this PR:
- support the guidance backend for Mistral tokenizers
- support the grammar factory

The former only requires that Mistral tokenizers are Tekken; the latter additionally requires that they are > v11 with respect to the Mistral tokenizer version.
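To make the two conditions above concrete, here is a minimal sketch of the distinction; the helper names and tokenizer attributes are hypothetical and do not claim to match the PR's actual `MistralTokenizer` properties:

```python
# Hypothetical helpers illustrating the two separate requirements described
# above; attribute names (is_tekken, version) are assumptions for the sketch.
def supports_guidance(tokenizer) -> bool:
    # The guidance backend only requires a Tekken tokenizer.
    return getattr(tokenizer, "is_tekken", False)


def supports_grammar_factory(tokenizer) -> bool:
    # The grammar factory additionally requires a tokenizer version > v11.
    return supports_guidance(tokenizer) and getattr(tokenizer, "version", 0) > 11
```

Under this split, a Tekken tokenizer at, say, v7 would get the guidance backend but not the grammar factory.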
    raise ValueError(
        "Non-tekken Mistral tokenizers are not supported for the 'guidance'"
        " structured output backend. Please use ['xgrammar', 'outlines'] "
        "backends or tokenizer_mode='hf' instead."
    )
From a user point-of-view, will I know what a non-tekken Mistral tokenizer is? The previous message just said Mistral tokenizer which was fairly easy to map to the tokenizer flag in vLLM and its value. Maybe this should mention specific Mistral model families or something else the user can easily understand so they know why this combination of flags doesn't work and what their options are?
Tekken is rather "old" at this point. I don't think listing a bunch of models / families would simplify the messaging, since it would involve naming specifics. Maybe it's worth mentioning that there are fresher models than the one being used that support it?
That's a fair point about the number of newer models being quite large. Mentioning that they'll need to use newer models if they want the guidance backend is a reasonable way to go. It keeps it actionable for the user ("Oh, I either need to swap backends or find a newer Mistral model") while not giving a detailed list of specific models that will change over time.
Done! Let me know if it is good now.
Yes, the new message does a good job of letting the user know how to resolve this, either by using a newer model or changing their vllm server args. Thank you!
| """ | ||
|
|
||
| # Used to generate correct grammar in `adjust_request` | ||
| model_can_reason: bool = False |
Is this ever set to True somewhere? Perhaps some unfinished change to flip this to True for certain models?
This will be enabled in a subsequent PR that fixes tool / reasoning parsing when the Mistral grammar is active.
    from pathlib import Path
    from typing import TYPE_CHECKING, Any, cast, overload

    from mistral_common.guidance.grammar_factory import GrammarFactory
Just a note that this import will fail without a minimum mistral_common version. So, this either has to be a conditional import only used and grammar factory only enabled when we have a newer mistral_common OR we need to bump the minimum mistral_common version required for vLLM. I assume that's what the plan was, once the new mistral_common gets released.
Yes the plan is to mark this PR ready as soon as mistral-common is updated to be sure grammar is available
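If the conditional-import route were taken instead of the version bump, a standard guard would look something like the sketch below; this is only an illustration of the option discussed above, not code from the PR:

```python
# Sketch of a conditional import guard: enable the grammar factory only when
# the installed mistral_common is new enough to provide it. The import path
# is taken from the diff above; the flag name is an assumption.
try:
    from mistral_common.guidance.grammar_factory import GrammarFactory
    HAS_GRAMMAR_FACTORY = True
except ImportError:
    GrammarFactory = None  # type: ignore[assignment]
    HAS_GRAMMAR_FACTORY = False
```

Call sites would then check `HAS_GRAMMAR_FACTORY` before enabling any grammar-factory behavior.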
    def adjust_request(self, request: ChatCompletionRequest) -> ChatCompletionRequest:
   -    request = super().adjust_request(request)
   +    if not is_mistral_tokenizer(self.model_tokenizer):
   +        request = super().adjust_request(request)
What happens for Mistral models using the mistral tokenizer, but either a non-tekken tokenizer or a model older than what we enable for use with the new grammar? Should this call super().adjust_request(request) for all models except the ones we specifically use the new grammar with?
Otherwise, I think we end up where previously we did this for all Mistral models and now we don't do that any time a Mistral tokenizer is in use. This leaves a gap of all Mistral tokenizer usage that doesn't use the new grammar bits where they miss the request adjustment from the abstract_tool_parser.py.
Perhaps this is what you meant about breaking tool_choice="required". This breaks that for older Mistral models. And, the new Lark grammar breaks that for newer Mistral models (because it expects a JSON schema today). So, just pointing out that we've broken tool_choice="required" in 2 different ways for different variants of Mistral models.
Yes, you're right; I didn't have a strongly opinionated answer to this question. I think the easiest might indeed be to just check whether the grammar is available for this model, as you mentioned: previous models stay broken for tool_choice="required" but still output JSON, while new models use the grammar, with a later fix for parsing.
I'll defer to the Mistral team here. From my point-of-view the ideal outcome would be that accuracy for user's existing workflows with older Mistral models doesn't degrade while for newer Mistral models we bring the new grammar. I think the current code leaves a gap there for anyone that was using previous models with required / named tool calls before (as adjust_request is how we wire them in, I think?).
I just updated it to make sure old models now fall back to the old behavior.
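The fallback agreed on above can be summarized in a minimal self-contained sketch; the class and field names here are hypothetical and do not match vLLM's actual tool-parser classes:

```python
# Minimal sketch of the fallback: only skip the base-class request adjustment
# when the new grammar path actually applies. "guided" is a stand-in field,
# not a real vLLM request attribute.
class BaseToolParser:
    def adjust_request(self, request: dict) -> dict:
        # old behavior: guide decoding with a JSON schema
        request["guided"] = "json_schema"
        return request


class MistralToolParser(BaseToolParser):
    def __init__(self, grammar_available: bool) -> None:
        self.grammar_available = grammar_available

    def adjust_request(self, request: dict) -> dict:
        if not self.grammar_available:
            # older / non-tekken Mistral tokenizers keep the old behavior
            return super().adjust_request(request)
        # newer tokenizers get the Lark grammar from the grammar factory
        request["guided"] = "lark_grammar"
        return request
```

This closes the gap for older Mistral models, which otherwise would have lost the base-class adjustment entirely.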
    or request.structured_outputs is not None
    or (
        request.response_format is not None
        and request.response_format.type != "text"
This may be a bit broad. For example, if the response_format was set to json_object would we still want to enable the grammar-based guiding? Or does the model not output JSON for its response in that case?
So in this case we just ignore it and leave the grammar to vLLM, but we are indeed considering injecting our Lark grammar there as well, since we can handle json_schema with or without a tool choice set (the model chooses between calling a tool or producing a JSON object).
Sounds good - if it's scope creep, feel free to go with what you were originally going to do and we can iterate on this in future PRs.
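For reference, the condition under discussion can be restated as a standalone predicate. Everything here is a simplified stand-in: the dataclasses mimic only the fields quoted in the diff, and the real check operates on vLLM's `ChatCompletionRequest`:

```python
from dataclasses import dataclass
from typing import Optional


# Simplified stand-ins for the request fields quoted in the diff above.
@dataclass
class ResponseFormat:
    type: str = "text"


@dataclass
class Request:
    structured_outputs: Optional[dict] = None
    response_format: Optional[ResponseFormat] = None


def wants_guided_output(request: Request) -> bool:
    # Mirrors the quoted condition: any structured-output request, or any
    # response_format other than plain text, triggers the guided path.
    return (
        request.structured_outputs is not None
        or (request.response_format is not None
            and request.response_format.type != "text")
    )
```

Note that `response_format.type == "json_object"` satisfies this predicate, which is exactly the breadth the review comment flags.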
Signed-off-by: juliendenize <julien.denize@mistral.ai>
I reviewed the latest updates, and they look good to me. Once this is ready to come out of draft state with the updated mistral_common dependency I'm happy to take another look, but we'll need an approver in this area ready to review as well to get this merged quickly.
Purpose
This PR adds the Mistral grammar factory, which creates a Lark grammar based on tools, tool_choice and reasoning. To do that, it adds the following:
For this PR to work, mistral-common must for now be installed from mistralai/mistral-common#202, until that PR is merged and a new mistral-common release is out.
The Grammar factory details can be found in the mistral-common PR description regarding how tool choices and reasoning influence the grammar.
This PR will break tool call parsing for some requests that expect a JSON grammar (such as tool_choice="required"). A follow-up PR will be submitted to ensure this is fixed.

This is the first PR that aims to replicate some features from #37081 while ensuring better separation of concerns and that mistral-common hosts the grammar factory logic.
Test Plan
Added tests to:
Test Result
They all pass.