[Mistral Grammar] Support Grammar Factory #38150
juliendenize wants to merge 6 commits into vllm-project:main
Conversation
Code Review
This pull request integrates llguidance and mistral-common's GrammarFactory into vLLM's Mistral tokenizer and tool parsing logic. The MistralTokenizer now exposes properties to check grammar support and access GrammarFactory and llguidance.LLTokenizer instances. The MistralToolParser's adjust_request method is enhanced to dynamically generate Lark grammars for tool calling based on request parameters for supported Mistral tokenizers. Additionally, structured output validation in sampling_params.py is refined to allow Tekken Mistral tokenizers with the guidance backend, while explicitly disallowing non-Tekken Mistral tokenizers. New tests have been added to cover this new functionality. There is no feedback to provide on the review comments as none were given.
This pull request has merge conflicts that must be resolved before it can be merged.
bbrowning
left a comment
I know this is a draft, but I took a first pass at this anyway and have a few inline comments. All of these items may already be planned as the work progresses, so feel free to discard any that are already on your radar.
    # or includes some jsonschema feature(s) that
    # are not supported in xgrammar.

    skip_guidance = _is_non_tekken_mistral(tokenizer)
This will pick the guidance backend by default on all Mistral models that use tekken tokenizers, right? That's likely intended, but just confirming this was expected to not be limited to only Mistral 4+ models.
Yeah, so to me there are two different things in this PR:
- support the guidance backend for Mistral tokenizers
- support the grammar factory

The former only requires that Mistral tokenizers are Tekken; the latter additionally requires that they are > v11 with respect to the Mistral tokenizer version.
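To make the two conditions above concrete, here is a minimal sketch of the distinction; the helper names and tokenizer attributes are hypothetical and do not claim to match the PR's actual `MistralTokenizer` properties:

```python
# Hypothetical helpers illustrating the two separate requirements described
# above; attribute names (is_tekken, version) are assumptions for the sketch.
def supports_guidance(tokenizer) -> bool:
    # The guidance backend only requires a Tekken tokenizer.
    return getattr(tokenizer, "is_tekken", False)


def supports_grammar_factory(tokenizer) -> bool:
    # The grammar factory additionally requires a tokenizer version > v11.
    return supports_guidance(tokenizer) and getattr(tokenizer, "version", 0) > 11
```

Under this split, a Tekken tokenizer at, say, v7 would get the guidance backend but not the grammar factory.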
    raise ValueError(
        "Non-tekken Mistral tokenizers are not supported for the 'guidance'"
        " structured output backend. Please use ['xgrammar', 'outlines'] "
        "backends or tokenizer_mode='hf' instead."
    )
From a user point-of-view, will I know what a non-tekken Mistral tokenizer is? The previous message just said Mistral tokenizer which was fairly easy to map to the tokenizer flag in vLLM and its value. Maybe this should mention specific Mistral model families or something else the user can easily understand so they know why this combination of flags doesn't work and what their options are?
Tekken is rather "old" at this point. I don't think listing a bunch of models / families would simplify the messaging, since it would involve naming specifics. Maybe it's worth mentioning that there are fresher models than the one being used that support it?
That's a fair point about the number of newer models being quite large. Mentioning that they'll need to use newer models if they want the guidance backend is a reasonable way to go. It keeps it actionable for the user ("Oh, I either need to swap backends or find a newer Mistral model") while not giving a detailed list of specific models that will change over time.
Done! Let me know if it is good now.
Yes, the new message does a good job of letting the user know how to resolve this, either by using a newer model or changing their vllm server args. Thank you!
| """ | ||
|
|
||
| # Used to generate correct grammar in `adjust_request` | ||
| model_can_reason: bool = False |
Is this ever set to True somewhere? Perhaps some unfinished change to flip this to True for certain models?
This will be enabled in a subsequent PR that fixes tool / reasoning parsing when the Mistral grammar is active.
    from pathlib import Path
    from typing import TYPE_CHECKING, Any, cast, overload

    from mistral_common.guidance.grammar_factory import GrammarFactory
Just a note that this import will fail without a minimum mistral_common version. So, this either has to be a conditional import only used and grammar factory only enabled when we have a newer mistral_common OR we need to bump the minimum mistral_common version required for vLLM. I assume that's what the plan was, once the new mistral_common gets released.
Yes the plan is to mark this PR ready as soon as mistral-common is updated to be sure grammar is available
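If the conditional-import route were taken instead of the version bump, a standard guard would look something like the sketch below; this is only an illustration of the option discussed above, not code from the PR:

```python
# Sketch of a conditional import guard: enable the grammar factory only when
# the installed mistral_common is new enough to provide it. The import path
# is taken from the diff above; the flag name is an assumption.
try:
    from mistral_common.guidance.grammar_factory import GrammarFactory
    HAS_GRAMMAR_FACTORY = True
except ImportError:
    GrammarFactory = None  # type: ignore[assignment]
    HAS_GRAMMAR_FACTORY = False
```

Call sites would then check `HAS_GRAMMAR_FACTORY` before enabling any grammar-factory behavior.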
    def adjust_request(self, request: ChatCompletionRequest) -> ChatCompletionRequest:
   -    request = super().adjust_request(request)
   +    if not is_mistral_tokenizer(self.model_tokenizer):
   +        request = super().adjust_request(request)
What happens for Mistral models using the mistral tokenizer, but either a non-tekken tokenizer or a model older than what we enable for use with the new grammar? Should this call super().adjust_request(request) for all models except the ones we specifically use the new grammar with?
Otherwise, I think we end up where previously we did this for all Mistral models and now we don't do that any time a Mistral tokenizer is in use. This leaves a gap of all Mistral tokenizer usage that doesn't use the new grammar bits where they miss the request adjustment from the abstract_tool_parser.py.
Perhaps this is what you meant about breaking tool_choice="required". This breaks that for older Mistral models. And, the new Lark grammar breaks that for newer Mistral models (because it expects a JSON schema today). So, just pointing out that we've broken tool_choice="required" in 2 different ways for different variants of Mistral models.
Yes, you're right; I didn't have a strongly opinionated answer to this question. I think the easiest might indeed be to just check whether the grammar is available for this model, as you mentioned: previous models stay broken for tool_choice="required" but still output JSON, while new models use the grammar, with a later fix for parsing.
I'll defer to the Mistral team here. From my point-of-view the ideal outcome would be that accuracy for user's existing workflows with older Mistral models doesn't degrade while for newer Mistral models we bring the new grammar. I think the current code leaves a gap there for anyone that was using previous models with required / named tool calls before (as adjust_request is how we wire them in, I think?).
I just updated it to make sure old models now fall back to the old behavior.
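The fallback agreed on above can be summarized in a minimal self-contained sketch; the class and field names here are hypothetical and do not match vLLM's actual tool-parser classes:

```python
# Minimal sketch of the fallback: only skip the base-class request adjustment
# when the new grammar path actually applies. "guided" is a stand-in field,
# not a real vLLM request attribute.
class BaseToolParser:
    def adjust_request(self, request: dict) -> dict:
        # old behavior: guide decoding with a JSON schema
        request["guided"] = "json_schema"
        return request


class MistralToolParser(BaseToolParser):
    def __init__(self, grammar_available: bool) -> None:
        self.grammar_available = grammar_available

    def adjust_request(self, request: dict) -> dict:
        if not self.grammar_available:
            # older / non-tekken Mistral tokenizers keep the old behavior
            return super().adjust_request(request)
        # newer tokenizers get the Lark grammar from the grammar factory
        request["guided"] = "lark_grammar"
        return request
```

This closes the gap for older Mistral models, which otherwise would have lost the base-class adjustment entirely.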
    or request.structured_outputs is not None
    or (
        request.response_format is not None
        and request.response_format.type != "text"
This may be a bit broad. For example, if the response_format was set to json_object would we still want to enable the grammar-based guiding? Or does the model not output JSON for its response in that case?
So in this case we just ignore it and leave the grammar to vLLM, but we are indeed considering injecting our Lark grammar there as well, since we can handle json_schema with or without a tool choice set (the model chooses between calling a tool or producing a JSON object).
Sounds good - if it's scope creep, feel free to go with what you were originally going to do and we can iterate on this in future PRs.
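For reference, the condition under discussion can be restated as a standalone predicate. Everything here is a simplified stand-in: the dataclasses mimic only the fields quoted in the diff, and the real check operates on vLLM's `ChatCompletionRequest`:

```python
from dataclasses import dataclass
from typing import Optional


# Simplified stand-ins for the request fields quoted in the diff above.
@dataclass
class ResponseFormat:
    type: str = "text"


@dataclass
class Request:
    structured_outputs: Optional[dict] = None
    response_format: Optional[ResponseFormat] = None


def wants_guided_output(request: Request) -> bool:
    # Mirrors the quoted condition: any structured-output request, or any
    # response_format other than plain text, triggers the guided path.
    return (
        request.structured_outputs is not None
        or (request.response_format is not None
            and request.response_format.type != "text")
    )
```

Note that `response_format.type == "json_object"` satisfies this predicate, which is exactly the breadth the review comment flags.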
Signed-off-by: juliendenize <julien.denize@mistral.ai>
I reviewed the latest updates, and they look good to me. Once this is ready to come out of draft state with the updated mistral_common dependency I'm happy to take another look, but we'll need an approver in this area ready to review as well to get this merged quickly.
Purpose
This PR adds the Mistral grammar factory, which creates a Lark grammar based on tools, tool_choice and reasoning. To do that, it adds the following:
For this PR to work, mistral-common must for now be installed from mistralai/mistral-common#202, until that PR is merged and a new mistral-common release is out.
The Grammar factory details can be found in the mistral-common PR description regarding how tool choices and reasoning influence the grammar.
This PR will break tool call parsing for some requests that expect a JSON grammar (such as tool_choice="required"). A follow-up PR will be submitted to ensure this is fixed.

This is the first PR that aims to replicate some features from #37081 while ensuring better separation of concerns and that mistral-common hosts the grammar factory logic.
Test Plan
Added tests to:
Test Result
They all pass.