[Bugfix] Mistral tool parser streaming update#19425
DarkLight1337 merged 80 commits into vllm-project:main from
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; instead, only a subset of tests would run. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Summary of Changes
Hello @avigny, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses an issue related to streaming tool calls for Mistral models by replacing the previous partial_json_parser-based implementation with a more robust, custom stateful parser. This new approach aims to accurately extract tool call information, including names and arguments, as tokens are streamed, improving the reliability of tool use functionality in streaming mode. The changes include a significant rewrite of the parsing logic and the addition of comprehensive test coverage.
Highlights
- Refactor Streaming Parser: The core logic for parsing Mistral tool calls during streaming has been completely rewritten.
- Removed Dependency: The dependency on the `partial_json_parser` library for streaming tool call extraction has been removed.
- New Parsing Mechanism: Introduced a custom stateful parsing mechanism using regex and `json.JSONDecoder.raw_decode` to incrementally extract tool call names and arguments from the raw token stream.
- Comprehensive Tests: Added a new, extensive test file (`tests/tool_use/test_mistral_tool_parser.py`) with various test cases covering both streaming and non-streaming scenarios for Mistral tool calls, including single and multiple tool calls with different argument structures.
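The incremental extraction described in the highlights can be sketched as follows. This is a hedged, simplified illustration (the regex and function names below are mine, not the PR's): a name pattern locates a candidate function name, then `json.JSONDecoder.raw_decode` pulls exactly one complete JSON object off the front of the remaining buffer, tolerating trailing text such as the next tool call.

```python
import json
import re

# Illustrative sketch, not the PR's actual code: FN_NAME and
# parse_tool_calls are hypothetical names.
FN_NAME = re.compile(r"([a-zA-Z0-9_-]+)(?=\{)")
decoder = json.JSONDecoder()

def parse_tool_calls(buffer: str) -> list[dict]:
    calls = []
    pos = 0
    while True:
        m = FN_NAME.search(buffer, pos)
        if m is None:
            break
        try:
            # raw_decode consumes exactly one JSON object starting at
            # m.end(), and reports where it stopped.
            args, end = decoder.raw_decode(buffer, m.end())
        except json.JSONDecodeError:
            break  # arguments not complete yet; wait for more tokens
        calls.append({"name": m.group(1), "arguments": args})
        pos = end
    return calls

print(parse_tool_calls('get_weather{"city": "Paris"}get_time{"tz": "UTC"}'))
```

Because `raw_decode` stops after one complete object, a buffer that ends mid-arguments simply yields no call yet, which is what makes this usable for streaming.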
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩
Code Review
This pull request refactors the streaming tool call parsing logic for Mistral models and adds a comprehensive test suite. The core change involves replacing partial_json_parser with a custom regex and json.raw_decode-based approach for more fine-grained control over the streaming process. The new tests cover a variety of scenarios. The review includes stylistic suggestions for the tests and points for consideration regarding complexity and state management in the new parsing logic.
Tests are similar to the ones added for Jamba models in vllm-project#9154 Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
c468495 to d6d17c1 (Compare)
@hibukipanim I did run the test you provided in your issue description #17585 (comment) and got the following output: ChoiceDeltaToolCall(index=0, id='j6OY9szTS', function=ChoiceDeltaToolCallFunction(arguments=None, name='mcp_confluence'), type='function')
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='{"', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='query', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='":', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' "', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='co', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='ffee', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='",', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' "', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='limit', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='":', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' ', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='1', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='}', name=None), type=None)

It seems to fix your issue.
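For reference, a client consuming this stream has to stitch the argument fragments back together. Here is a minimal sketch of that reassembly, modeling the `ChoiceDeltaToolCall` objects as plain dicts (an assumption about the client shape, not vLLM or OpenAI-client code):

```python
import json

# Condensed version of the delta stream printed above.
deltas = [
    {"index": 0, "id": "j6OY9szTS", "name": "mcp_confluence", "arguments": None},
    {"index": 0, "id": None, "name": None, "arguments": '{"'},
    {"index": 0, "id": None, "name": None, "arguments": "query"},
    {"index": 0, "id": None, "name": None, "arguments": '": "coffee", "limit": 1}'},
]

calls: dict[int, dict] = {}
for d in deltas:
    # The first delta for an index carries id/name; later ones carry
    # argument fragments that must be concatenated in order.
    call = calls.setdefault(d["index"], {"id": None, "name": None, "arguments": ""})
    if d["id"]:
        call["id"] = d["id"]
    if d["name"]:
        call["name"] = d["name"]
    if d["arguments"]:
        call["arguments"] += d["arguments"]

print(calls[0]["name"], json.loads(calls[0]["arguments"]))
```

Only once the stream finishes (or the brace structure closes) is the concatenated `arguments` string valid JSON, which is why the parser has to emit raw fragments rather than parsed objects.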
@avigny hey! I've been trying to test your solution, but with no success. This is what I'm doing: Where template.jinja is this one and mistral_tool_parser.py is the one that you've created. I'm using this test request: When I set stream to false, I'm getting this response: And this error: When I set stream=true, I don't receive any errors, but the response does not have tool calls: Am I doing something wrong here?
Looks like this PR unfortunately doesn't fix the issue on Mistral Small 3.2. API call: {
"stream": false,
"temperature": 0.15,
"top_p": 1.0,
"tool_choice": "auto",
"model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
"messages": [
{
"role": "user",
"content": "Hi ! What's the result of 95478415 / 4571 ?"
}
],
"tools": [
{
"type":"function",
"function": {
"name":"calculator",
"description":"Perform a basic calculation using ruby syntax for arithmetic operations.",
"parameters": {
"type":"object",
"properties": {
"calculation": {
"type":"string",
"description":"A basic arithmetic calculation in python language (e.g., \"2+2\", \"10*3\", \"45/9\").",
"required":["calculation"]
}
},
"required":["calculation"]
}
}
}
]
}

Still have this error:

Here are some logs:

=== model_output ===
[TOOL_CALLS]calculator{"calculation": "95478415 / 4571"}
=== tool_content ===
calculator{"calculation": "95478415 / 4571"}

Please note that this issue is NOT happening when using
Yes, you're both right! Thanks for finding this!
Any update on getting this merged?
cc @aarnphm
So I did more complete testing and found this wasn't working that well after all; I was getting the same errors reported above. Not sure what happened on initial testing. But I've since taken it and have a working implementation, for streaming at least, at https://github.com/sjuxax/vllm/tree/Mistral3.2-tool-call-fix. I'm going to cherry-pick it onto #20471 in a sec. Then using that branch should work with quantized HF models and tool calling.
PedroMiolaSilva left a comment
I think replacing lines 127-139 with the code below will fix it for non-streaming:

# First, split on the tool call token and discard the first item,
# because it is empty (everything before the first [TOOL_CALLS]).
raw_tool_calls = model_output.split(self.bot_token)[1:]
function_call_arr = []
for raw_tool_call in raw_tool_calls:
    tool_name = raw_tool_call.split("{")[0]
    tool_arguments_begin = raw_tool_call.find("{")
    tool_arguments = raw_tool_call[tool_arguments_begin:]
    function_call_arr.append({
        "name": tool_name,
        "arguments": json.loads(tool_arguments),
    })
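As a sanity check, here is the suggested splitting approach run standalone against the model output shown in the logs above. This is a hedged, self-contained sketch: `bot_token` is assumed to be the literal `[TOOL_CALLS]` marker, and `self.` is dropped so it runs outside the parser class.

```python
import json

bot_token = "[TOOL_CALLS]"
model_output = '[TOOL_CALLS]calculator{"calculation": "95478415 / 4571"}'

# Split on the tool call token; the first item is everything before
# the first [TOOL_CALLS] (empty here), so discard it.
raw_tool_calls = model_output.split(bot_token)[1:]
function_call_arr = []
for raw_tool_call in raw_tool_calls:
    tool_name = raw_tool_call.split("{")[0]
    tool_arguments = raw_tool_call[raw_tool_call.find("{"):]
    function_call_arr.append(
        {"name": tool_name, "arguments": json.loads(tool_arguments)}
    )

print(function_call_arr)
# [{'name': 'calculator', 'arguments': {'calculation': '95478415 / 4571'}}]
```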
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
chaunceyjiang left a comment
I'm very sorry I missed this PR. Let's merge it quickly so we can promptly add support for Ministral-3.
@chaunceyjiang Thank you!
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
It's too late to include in v0.12.0, but it'll be in v0.12.1.
…enerator` Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
The unit tests should be fixed now
The new |
That's a shame; let's hope it won't be as complicated as this fix. Thanks for all your efforts and persistence!
I worked all day on the fork branch for this issue; it works well, but there seems to be one last problem. I'm facing an error in an agentic task; the error appears on long context (not totally sure)
@Erwandsn this looks like a tool call generated by a v13 tokenizer, which is not handled by this PR. (Edit: this indeed looks like a v11 tool call and not a v13 tool call. I'm sorry.)
@avigny Thanks for your answer
Hello, I'll soon open a PR with a fix for this that relies on mistral-common (with MistralTokenizer). There will be 2 commits:
If you want to test it in the meantime (I'd be glad to know it also works for others), here's the updated code.

This is how I mount this for local tests; I use `docker.io/vllm/vllm-openai:latest` (so v0.12.0 from yesterday):

--volume=/path/to/local/mistral_tool_parser.py:/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py

Here is the updated code for my upcoming PR.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Mistral tool call parser for v11+ models.
This implementation uses token-based parsing for streaming, leveraging the
atomic nature of special token IDs ([TOOL_CALLS], [ARGS], [CALL_ID]) to
reliably detect tool call boundaries.
Supported models: Mistral-Small-3.1+, Ministral-3+, and other v11+ models.
Note: Pre-v11 models (Mistral-7B-Instruct-v0.1/v0.2/v0.3) are not supported.
These older models have limited tool calling capabilities and require complex
text-based parsing with partial JSON handling. Users should upgrade to v11+
models for reliable tool calling support.
"""
import json
from collections.abc import Sequence
from enum import Enum, auto
from random import choices
from string import ascii_letters, digits
import regex as re
from pydantic import Field
from vllm.entrypoints.openai.protocol import (
ChatCompletionRequest,
DeltaFunctionCall,
DeltaMessage,
DeltaToolCall,
ExtractedToolCallInformation,
FunctionCall,
ToolCall,
)
from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
ToolParser,
)
from vllm.logger import init_logger
from vllm.tokenizers import MistralTokenizer, TokenizerLike
logger = init_logger(__name__)
ALPHANUMERIC = ascii_letters + digits
class MistralToolCall(ToolCall):
id: str = Field(default_factory=lambda: MistralToolCall.generate_random_id())
@staticmethod
def generate_random_id():
# Mistral Tool Call Ids must be alphanumeric with a length of 9.
# https://github.com/mistralai/mistral-common/blob/21ee9f6cee3441e9bb1e6ed2d10173f90bd9b94b/src/mistral_common/protocol/instruct/validator.py#L299
return "".join(choices(ALPHANUMERIC, k=9))
@staticmethod
def is_valid_id(id: str) -> bool:
return id.isalnum() and len(id) == 9
class StreamingState(Enum):
"""Streaming state for tool call parsing."""
CONTENT = auto() # Before any [TOOL_CALLS] token
PARSING_TOOL_NAME = auto() # After [TOOL_CALLS], parsing function name
PARSING_TOOL_ARGS = auto() # Parsing JSON arguments
COMPLETE = auto() # All tools parsed
class MistralToolParser(ToolParser):
"""
Tool call parser for Mistral v11+ models.
Supports the v11+ format: [TOOL_CALLS]name[ARGS]{...}
Optionally with call ID: [TOOL_CALLS]name[CALL_ID]id[ARGS]{...}
This parser requires MistralTokenizer (tokenizer_mode=mistral) and
models using tokenizer version 11 or higher.
"""
def __init__(self, tokenizer: TokenizerLike):
super().__init__(tokenizer)
if not isinstance(self.model_tokenizer, MistralTokenizer):
raise RuntimeError(
"MistralToolParser requires MistralTokenizer. "
"Please use tokenizer_mode='mistral' in your vLLM configuration. "
"Note: Only v11+ Mistral models are supported for tool calling."
)
self._mistral_base_tokenizer = self.model_tokenizer.tokenizer
self._version = self.model_tokenizer.version
if self._version < 11:
raise RuntimeError(
f"MistralToolParser requires tokenizer version 11 or higher, "
f"but got version {self._version}. Pre-v11 models "
"(Mistral-7B-Instruct-v0.1/v0.2/v0.3) are not supported for "
"tool calling. Please use a v11+ model such as "
"Mistral-Small-3.1 or Ministral-3."
)
# Get bot token info
self.bot_token = "[TOOL_CALLS]"
self.bot_token_id = self.vocab.get(self.bot_token)
if self.bot_token_id is None:
raise RuntimeError(
"Mistral Tool Parser could not locate the [TOOL_CALLS] token "
"in the tokenizer!"
)
# Get control tokens for v11+ format
try:
self._args_token_id = self._mistral_base_tokenizer.get_control_token(
"[ARGS]"
)
except Exception:
raise RuntimeError(
"Mistral Tool Parser could not locate the [ARGS] token. "
"This token is required for v11+ tool call parsing."
)
self._call_id_token_id: int | None = None
try:
self._call_id_token_id = self._mistral_base_tokenizer.get_control_token(
"[CALL_ID]"
)
except Exception:
# [CALL_ID] is optional - some models may not have it
pass
# Regex for non-streaming parsing: name{args}
self.fn_name_regex = re.compile(
r"([a-zA-Z0-9_-]+)(\{[\s\S]*?\}+)", re.DOTALL
)
# Streaming state
self._streaming_state = StreamingState.CONTENT
self._current_tool_index = -1
self._current_tool_id: str | None = None
self._current_tool_name: str = ""
self._current_tool_args: str = ""
self._brace_depth = 0
# For compatibility with serving_chat.py's finish_reason detection
self.prev_tool_call_arr: list[dict] = []
def extract_tool_calls(
self,
model_output: str,
request: ChatCompletionRequest,
) -> ExtractedToolCallInformation:
"""
Extract tool calls from a complete model response.
Parses the v11+ format: [TOOL_CALLS]name{args}[TOOL_CALLS]name{args}...
"""
# Fast path: no tool call token present
if self.bot_token not in model_output:
return ExtractedToolCallInformation(
tools_called=False, tool_calls=[], content=model_output
)
try:
# Get content before tool calls
content = model_output.split(self.bot_token)[0]
content = content if content.strip() else None
# Parse tool calls from each segment after [TOOL_CALLS]
function_call_arr = []
for segment in model_output.split(self.bot_token):
if not segment.strip():
continue
matches = self.fn_name_regex.findall(segment)
for match in matches:
fn_name = match[0]
args = match[1]
function_call_arr.append(
{"name": fn_name, "arguments": json.loads(args)}
)
# Convert to MistralToolCall objects
tool_calls: list[MistralToolCall] = [
MistralToolCall(
type="function",
function=FunctionCall(
name=raw_function_call["name"],
arguments=json.dumps(
raw_function_call["arguments"], ensure_ascii=False
),
),
)
for raw_function_call in function_call_arr
]
return ExtractedToolCallInformation(
tools_called=True,
tool_calls=tool_calls,
content=content,
)
except Exception:
logger.exception("Error in extracting tool call from response.")
return ExtractedToolCallInformation(
tools_called=False,
tool_calls=[],
content=model_output.replace(self.bot_token, "").strip(),
)
def extract_tool_calls_streaming(
self,
previous_text: str,
current_text: str,
delta_text: str,
previous_token_ids: Sequence[int],
current_token_ids: Sequence[int],
delta_token_ids: Sequence[int],
request: ChatCompletionRequest,
) -> DeltaMessage | None:
"""
Extract tool calls from streaming output using token-based parsing.
Token IDs are atomic - they cannot be split across chunks - which
eliminates a whole class of parsing bugs that affect text-based parsing.
"""
# If no tool call token seen yet, emit as content
if self.bot_token_id not in current_token_ids:
return DeltaMessage(content=delta_text)
return self._stream_tool_calls(delta_token_ids)
def _stream_tool_calls(
self, delta_token_ids: Sequence[int]
) -> DeltaMessage | None:
"""
Stream tool calls using token-based parsing.
Detects [TOOL_CALLS] and [ARGS] tokens to identify tool call boundaries,
then streams function names and arguments as they arrive.
"""
from mistral_common.tokens.tokenizers.base import SpecialTokenPolicy
delta_tool_calls: list[DeltaToolCall] = []
for token_id in delta_token_ids:
if token_id == self.bot_token_id:
# Starting a new tool call
self._current_tool_index += 1
self._current_tool_id = MistralToolCall.generate_random_id()
self._current_tool_name = ""
self._current_tool_args = ""
self._brace_depth = 0
self._streaming_state = StreamingState.PARSING_TOOL_NAME
# Set flag for finish_reason detection
if not self.prev_tool_call_arr:
self.prev_tool_call_arr = [{"arguments": {}}]
# Initialize streamed_args_for_tool for this tool index
while len(self.streamed_args_for_tool) <= self._current_tool_index:
self.streamed_args_for_tool.append("")
elif token_id == self._args_token_id:
# Transition from name to arguments
if self._streaming_state == StreamingState.PARSING_TOOL_NAME:
# Emit the complete function name
delta_tool_calls.append(
DeltaToolCall(
index=self._current_tool_index,
type="function",
id=self._current_tool_id,
function=DeltaFunctionCall(
name=self._current_tool_name.strip()
).model_dump(exclude_none=True),
)
)
self._streaming_state = StreamingState.PARSING_TOOL_ARGS
elif token_id == self._call_id_token_id:
# Skip call ID tokens (they come between name and [ARGS])
# We generate our own IDs
pass
elif self._streaming_state == StreamingState.CONTENT:
# Before any tool call - shouldn't happen if bot_token_id
# is in current_token_ids, but handle gracefully
pass
elif self._streaming_state == StreamingState.PARSING_TOOL_NAME:
# Accumulate name tokens
token_str = self._mistral_base_tokenizer.decode(
[token_id], special_token_policy=SpecialTokenPolicy.IGNORE
)
self._current_tool_name += token_str
elif self._streaming_state == StreamingState.PARSING_TOOL_ARGS:
# Stream argument tokens
token_str = self._mistral_base_tokenizer.decode(
[token_id], special_token_policy=SpecialTokenPolicy.IGNORE
)
# Track brace depth for nested JSON
for char in token_str:
if char == "{":
self._brace_depth += 1
elif char == "}":
self._brace_depth -= 1
self._current_tool_args += token_str
# Update streamed_args_for_tool for vLLM's finish handling
if self._current_tool_index < len(self.streamed_args_for_tool):
self.streamed_args_for_tool[self._current_tool_index] = (
self._current_tool_args
)
# Emit arguments delta
delta_tool_calls.append(
DeltaToolCall(
index=self._current_tool_index,
function=DeltaFunctionCall(
arguments=token_str
).model_dump(exclude_none=True),
)
)
# Build response
if delta_tool_calls:
return DeltaMessage(tool_calls=delta_tool_calls)
        return None

PS: I had already followed this approach in a PR on mlx-lm this summer. The
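The brace-depth bookkeeping used in `_stream_tool_calls` above can be exercised standalone. This is a hedged sketch, not the parser itself: it counts `{` and `}` across streamed chunks to decide when the argument object is complete. Note that braces inside JSON strings would fool this naive counter, a simplification the character loop above shares.

```python
def feed(chunks: list[str]) -> int:
    """Return the index of the chunk in which the argument object closes,
    or -1 if the stream ends before the object is complete."""
    depth = 0
    started = False
    for i, chunk in enumerate(chunks):
        for ch in chunk:
            if ch == "{":
                depth += 1
                started = True
            elif ch == "}":
                depth -= 1
        # Depth back to zero after at least one '{' means the top-level
        # JSON object has closed somewhere in this chunk.
        if started and depth == 0:
            return i
    return -1

chunks = ['{"a": {"b": ', "1}", ', "c": 2}', "[TOOL_CALLS]next_tool"]
print(feed(chunks))  # → 2: the object closes in the third chunk
```

Tracking depth rather than looking for a single `}` is what lets nested argument objects stream correctly.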
Hi, to summarize so far: this PR will fix the problem occurring for the older Mistral-Small-3.1-24B-Instruct-2503 and Devstral-Small-2505 (because they use tekken version v7), but not the new models, including Mistral Large 3 and Ministral 3, nor the not-so-new Mistral-Small-3.2-24B-Instruct-2506 and Devstral-Small-2507, because they use tekken v13? @avigny @graelo
@graelo Great work! I've tested your new mistral_tool_parser.py during the last two days and it works well, but I'm still facing an issue: one case that breaks the parser. Error output (the JSON parsing seems to crash, even though the JSON passes syntax validation):

Detail of the tool call producing the error:

Another response triggering the error:
${event.event_name}\nVenue: ${event.venue} \nDate: ${event.date} | Time: ${event.start_time} \n <div class="mt-4">\nMain Event: ${mainFight}\n <div class="flex flex-wrap gap-4 mt-2">\n ${fighter1 ?<div class=\"fighter-card\"><p><strong>${fighter1.nickname}:</strong> ${fighter1.record}</p></div> : ''}\n ${fighter2 ? <div class=\"fighter-card\"><p><strong>${fighter2.nickname}:</strong> ${fighter2.record}</p></div> : ''}\n \n ${oddsText ? <div class=\"odds-card mt-4\">${oddsText}</div> : ''}\n \n ;\n \n eventsSection.appendChild(eventCard);\n });\n }\n\n renderData();\n});\n```\n\n4. **events.json**: \n```json\n[\n {\n \"event_name\": \"UFC Fight Night: Royval vs. Kape\",\n \"venue\": \"UFC APEX, Las Vegas, NV, USA\",\n \"date\": \"2025-12-14\",\n \"start_time\": \"03:00 AM (UTC-8)\",\n \"main_card\": [\n {\n \"fight\": \"Brandon Royval vs. Manel Kape\",\n \"rounds\": 5,\n \"weight_class\": \"Lightweight\"\n }\n ]\n },\n {\n \"event_name\": \"UFC 323: Dvalishvili vs. Yan\",\n \"venue\": \"T-Mobile Arena, Las Vegas, NV, USA\",\n \"date\": \"2025-12-06\",\n \"start_time\": \"03:00 AM (UTC-8)\",\n \"main_card\": [\n {\n \"fight\": \"Merab Dvalishvili vs. Petr Yan\",\n \"rounds\": 5,\n \"weight_class\": \"Bantamweight\"\n }\n ]\n }\n]\n```\n\n5. **fighters.json**: \n```json\n[\n {\n \"nickname\": \"Raw Dawg\",\n \"fighter_name\": \"Brandon Royval\",\n \"record\": \"17-8-0 (UFC: 12-6-0)\",\n \"ranking\": \"Undisputed\"\n },\n {\n \"nickname\": \"Starboy\",\n \"fighter_name\": \"Manel Kape\",\n \"record\": \"21-7-0 (UFC: 13-7-0)\",\n \"ranking\": \"Undisputed\"\n },\n {\n \"nickname\": \"The Machine\",\n \"fighter_name\": \"Merab Dvalishvili\",\n \"record\": \"21-4-0 (UFC: 14-3-0)\",\n \"ranking\": \"UFC Bantamweight Champion\"\n },\n {\n \"nickname\": \"No Mercy\",\n \"fighter_name\": \"Petr Yan\",\n \"record\": \"19-5-0 (UFC: 10-4-0)\",\n \"ranking\": \"#2 in UFC Bantamweight\"\n }\n]\n```\n\n6. 
**odds.json**: \n```json\n{\n \"ufc_events\": [\n {\n \"event_name\": \"UFC Fight Night: Royval vs. Kape\",\n \"key_fights\": [\n {\n \"fight\": \"Brandon Royval vs. Manel Kape\",\n \"odds\": {\n \"brandon_royval\": 2.10,\n \"manel_kape\": 1.90,\n \"draw\": 12.00\n }\n }\n ]\n },\n {\n \"event_name\": \"UFC 323: Dvalishvili vs. Yan\",\n \"key_fights\": [\n {\n \"fight\": \"Merab Dvalishvili vs. Petr Yan\",\n \"odds\": {\n \"merab_dvalishvili\": 1.80,\n \"petr_yan\": 3.50,\n \"draw\": 12.00\n }\n }\n ]\n }\n ]\n}\n```\n\n### Instructions for Publishing\n\n1. **Directory Structure**: - Ensure the files are organized in the correct structure: ``` ufc-events-2025/ ├── index.html ├── css/ │ └── style.css ├── js/ │ └── script.js └── data/ ├── events.json ├── fighters.json └── odds.json ``` 2. **Publish the Website**: - Upload the entire ufc-events-2025folder to your preferred hosting platform. Platforms like Netlify, Vercel, or any static web hosting service should accommodate this setup. - Make sure the files are accessible from the root directory of the hosting service. - Test the website by openingindex.html in a browser to ensure all dynamic content loads correctly. 3. **Dependencies**: - No external dependencies (like npm packages) are required for this static site. Simply upload the files as-is. Publish this website and provide the URL once it's live.", "parent": "4a008c35-1487-4acf-a927-39a602e33e1a"}
Hope these issues can be resolved; the models are really good in my tests. Can't wait to have it fully working.
Hi @jayteaftw, if I'm not mistaken,
@Erwandsn I apologize, I was wrong when saying this call looked like a v13 call. This indeed looks like a v11 tool call, and this PR aimed at repairing tool calls like these. Looking at the failing tool calls:

This one should work. I've added this tool call locally as a test case and it passes; I'm not exactly sure why it fails for you 🤷

This second tool call is not valid (the error is expected). Looking closely, I've found some un-escaped also in the last line: I think these are causing the

My wild guess is that the model you are using is struggling to generate valid JSON when the tool argument gets very big. See the complete tool call from your comment:
@avigny No problem, this is really a weird issue. I also think this is related to the complexity of the JSON (which doesn't help for debugging). But I've had the case with smaller JSON too; I think the complexity of escaping when HTML is included in the JSON makes the model mess up (I'm on Ministral-3-8b-instruct). And my way of passing file contents inside JSON really sucks and is part of the problem. So I don't think this is an urgent issue. Thanks for your investigations!
Is this PR related? #30332
Yes. There seemed to be an issue with non-streaming complex tool calls like

From what I understand, the regex failed to match the complete tool arguments and stopped at the first
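The failure mode described here is easy to reproduce in isolation. A hedged illustration (the pattern and sample output below are mine, not the parser's exact regex): a non-greedy `\{[\s\S]*?\}` stops at the first closing brace, truncating nested arguments, while `json.JSONDecoder.raw_decode` consumes exactly one complete object.

```python
import json
import re

output = 'edit_file{"path": "a.py", "patch": {"start": 1, "end": 2}}'

# Non-greedy match: extends only as far as the FIRST '}', which here
# belongs to the nested "patch" object, so the outer object is cut off.
m = re.search(r"\{[\s\S]*?\}", output)
print(repr(m.group(0)))  # truncated, not valid JSON

# raw_decode consumes one complete JSON object, nesting included.
args, end = json.JSONDecoder().raw_decode(output, output.index("{"))
print(args)  # the full nested arguments dict
```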
Purpose
Fixes #13622
Fixes #17585
Fixes #20028
This PR is similar to #16096 (hermes tool parser)
In summary
Repairs tool calls in streaming mode for (older) models with tokenizer version <v11. The model output is incrementally parsed with `ijson`, emitting events used to know what part of the tool call is being streamed; for more details, see `_extract_tool_calls_streaming_pre_v11_tokenizer`. Quick unit tests added in `tests/tool_use/test_mistral_tool_parser.py`; see `test_extract_tool_calls_streaming_pre_v11_tokenizer`.

Adds support for tool calls in streaming mode for recent models (tokenizer version >=v11). See `_extract_tool_calls_streaming` for implementation details. A test for `mistralai/Mistral-Small-3.2-24B-Instruct-2506` was added in `tests/tool_use/test_mistral_tool_parser.py`. Quick unit tests added there as well; see `test_extract_tool_calls_streaming`.
tests/tool_use/test_mistral_tool_parser.pyseetest_extract_tool_calls_streamingTest Plan
I've added a test file, `tests/tool_use/test_mistral_tool_parser.py`, for easy and fast testing. It works similarly to the existing `tests/tool_use/test_jamba_tool_parser.py`: it tests the parsing functions against a mocked model output, which makes it easy to test edge cases. Run it with `pytest tests/tool_use/test_mistral_tool_parser.py`.

A test for `mistralai/Mistral-Small-3.2-24B-Instruct-2506` was added in `tests/tool_use/test_mistral_tool_parser.py`.

(Optional) Documentation Update
I believe no documentation update is needed
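The mocked-output streaming tests follow a simple pattern: split a full model output into small deltas, feed them through the parser, and check that the reassembled tool call matches expectations. Here is a hedged, self-contained sketch of that pattern; `reassemble` is a trivial stand-in (split on the first `{`), not the PR's parser or its actual test helpers.

```python
import json

BOT = "[TOOL_CALLS]"

def reassemble(model_output: str, chunk_size: int = 3) -> dict:
    """Simulate streaming: feed the output a few characters at a time,
    routing characters before the first '{' to the name and the rest
    to the arguments buffer."""
    name_buf, args_buf, in_args = "", "", False
    body = model_output.removeprefix(BOT)
    for i in range(0, len(body), chunk_size):
        for ch in body[i : i + chunk_size]:
            if ch == "{" and not in_args:
                in_args = True
            if in_args:
                args_buf += ch
            else:
                name_buf += ch
    return {"name": name_buf, "arguments": json.loads(args_buf)}

result = reassemble('[TOOL_CALLS]get_current_weather{"city": "Dallas"}')
print(result)  # {'name': 'get_current_weather', 'arguments': {'city': 'Dallas'}}
```

Varying `chunk_size` is the cheap way such tests cover chunk-boundary edge cases, since any character split must yield the same reassembled call.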