
[Bugfix] Mistral tool parser streaming update#19425

Merged
DarkLight1337 merged 80 commits into vllm-project:main from avigny:mistral-tool-parser-streaming-update on Dec 3, 2025
Conversation

@avigny
Contributor

@avigny avigny commented Jun 10, 2025

Purpose

Fixes #13622
Fixes #17585
Fixes #20028

This PR is similar to #16096 (hermes tool parser)

In summary

Repairs tool calls in streaming mode for (older) models with tokenizer version < v11

The model output is incrementally parsed with ijson, emitting events that indicate which part of the tool call is currently being streamed. For more details see _extract_tool_calls_streaming_pre_v11_tokenizer
Quick unit tests added in tests/tool_use/test_mistral_tool_parser.py; see test_extract_tool_calls_streaming_pre_v11_tokenizer

Adds support for tool calls in streaming mode for recent models (tokenizer version >= v11)

See _extract_tool_calls_streaming for implementation details
Test added for mistralai/Mistral-Small-3.2-24B-Instruct-2506 in tests/tool_use/test_mistral_tool_parser.py
Quick unit tests added in tests/tool_use/test_mistral_tool_parser.py; see test_extract_tool_calls_streaming

Test Plan

I've added a test file, tests/tool_use/test_mistral_tool_parser.py, for easy and fast testing. This file works similarly to the existing tests/tool_use/test_jamba_tool_parser.py.

This tests the parsing functions with a mocked model output and makes it easy to exercise edge cases.

Use pytest tests/tool_use/test_mistral_tool_parser.py to run this test file.

Test added for mistralai/Mistral-Small-3.2-24B-Instruct-2506 in tests/tool_use/test_mistral_tool_parser.py

(Optional) Documentation Update

I believe no documentation update is needed

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @avigny, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue related to streaming tool calls for Mistral models by replacing the previous partial_json_parser-based implementation with a more robust, custom stateful parser. This new approach aims to accurately extract tool call information, including names and arguments, as tokens are streamed, improving the reliability of tool use functionality in streaming mode. The changes include a significant rewrite of the parsing logic and the addition of comprehensive test coverage.

Highlights

  • Refactor Streaming Parser: The core logic for parsing Mistral tool calls during streaming has been completely rewritten.
  • Removed Dependency: The dependency on the partial_json_parser library for streaming tool call extraction has been removed.
  • New Parsing Mechanism: Introduced a custom stateful parsing mechanism using regex and json.JSONDecoder.raw_decode to incrementally extract tool call names and arguments from the raw token stream.
  • Comprehensive Tests: Added a new, extensive test file (tests/tool_use/test_mistral_tool_parser.py) with various test cases covering both streaming and non-streaming scenarios for Mistral tool calls, including single and multiple tool calls with different argument structures.
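The json.JSONDecoder.raw_decode mechanism mentioned above works roughly like this (a simplified illustration, not the parser's actual code): raw_decode returns the first complete JSON value in a buffer together with the index where it ended, and raises on incomplete input, so a streaming parser can retry once more text has arrived.

```python
import json

decoder = json.JSONDecoder()

# A complete JSON object followed by trailing text is decoded cleanly;
# `end` is the index just past the parsed value.
buffer = '{"location": "New York, NY"}[TOOL_CALLS]'
obj, end = decoder.raw_decode(buffer)
print(obj, buffer[end:])

# An incomplete buffer raises, signalling "wait for more tokens".
try:
    decoder.raw_decode('{"location": "New Yo')
except json.JSONDecodeError:
    print("incomplete - keep buffering")
```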

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the streaming tool call parsing logic for Mistral models and adds a comprehensive test suite. The core change involves replacing partial_json_parser with a custom regex and json.raw_decode-based approach for more fine-grained control over the streaming process. The new tests cover a variety of scenarios. The review includes stylistic suggestions for the tests and points for consideration regarding complexity and state management in the new parsing logic.

avigny added 4 commits June 11, 2025 10:12
Tests are similar to the ones added for Jamba models in vllm-project#9154

Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
@avigny avigny force-pushed the mistral-tool-parser-streaming-update branch from c468495 to d6d17c1 Compare June 11, 2025 08:13
@avigny avigny marked this pull request as ready for review June 11, 2025 09:25
@avigny avigny requested a review from aarnphm as a code owner June 11, 2025 09:25
@avigny
Contributor Author

avigny commented Jun 11, 2025

@hibukipanim I did run the test you provided in your issue description #17585 (comment) and got the following output:

ChoiceDeltaToolCall(index=0, id='j6OY9szTS', function=ChoiceDeltaToolCallFunction(arguments=None, name='mcp_confluence'), type='function')
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='{"', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='query', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='":', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' "', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='co', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='ffee', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='",', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' "', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='limit', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='":', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' ', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='1', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='}', name=None), type=None)

It seems to fix your issue.
Please let me know if I missed something.
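On the client side, these argument deltas are meant to be concatenated back into a JSON document; a quick illustration using the fragments from the log above:

```python
import json

# Argument fragments in the order they appear in the
# ChoiceDeltaToolCall stream above
fragments = ['{"', 'query', '":', ' "', 'co', 'ffee', '",', ' "',
             'limit', '":', ' ', '1', '}']

# Concatenating the deltas yields valid JSON arguments
arguments = json.loads("".join(fragments))
print(arguments)
```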

@avigny avigny changed the title Mistral tool parser streaming update [Bugfix] Mistral tool parser streaming update Jun 11, 2025
@PedroMiolaSilva

@avigny hey!

I've been trying to test your solution, but with no success. This is what I'm doing:

source ../.env
export MODEL_ID=unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8
export MODEL_ID_PORT=8000
export MODEL_ID_GPU=0

docker run \
--runtime nvidia \
-e VLLM_USE_V1=1 \
--gpus all \
--ipc=host \
-p "${MODEL_ID_PORT}:8000" \
--env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
--env "HF_HUB_OFFLINE=0" \
-v "${HF_HOME}:/root/.cache/huggingface" \
-v "./mistral_tool_parser.py:/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py" \
vllm/vllm-openai:latest \
-v "$(pwd):/app" \
--model ${MODEL_ID} \
--tool-call-parser mistral \
--chat-template /app/template.jinja \
--enable-auto-tool-choice \
--limit-mm-per-prompt 'image=1' \
--tokenizer_mode mistral \
--config_format mistral \
--load_format mistral \
--max-model-len 64000 \
--gpu-memory-utilization 0.8

Where template.jinja is this one and mistral_tool_parser.py is the one that you've created.

I'm using this test request:

curl -X POST \
   http://localhost:8000/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
   "model": "unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8",
   "messages": [
    {"role":"system","content":"You have access to the weather tool. You should call this tool when you think it makes sense"},
     {"role": "user", "content": "What'\''s the weather in New York?"}
   ],
   "tools": [
     {
       "type": "function",
       "function": {
         "name": "get_weather",
         "description": "Get the current weather in a given location",
         "parameters": {
           "type": "object",
           "properties": {
             "location": {
               "type": "string",
               "description": "The city and state, e.g. San Francisco, CA"
             }
           },
           "required": ["location"]
         }
       }
     }
   ]
 }'

When I set stream to false, I'm getting this response:

{"id":"chatcmpl-0dc2b75406114cbcb4f95735ccfdb094","object":"chat.completion","created":1751490167,"model":"unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"[TOOL_CALLS]get_weather{\"location\": \"New York, NY\"}","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":112,"total_tokens":127,"completion_tokens":15,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}

And this error:

ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] Error in extracting tool call from response.
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] Traceback (most recent call last):
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 131, in extract_tool_calls
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     function_call_arr = json.loads(tool_content)
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]                         ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     return _default_decoder.decode(s)
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/lib/python3.12/json/decoder.py", line 338, in decode
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/lib/python3.12/json/decoder.py", line 356, in raw_decode
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     raise JSONDecodeError("Expecting value", s, err.value) from None
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] 
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] During handling of the above exception, another exception occurred:
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] 
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] Traceback (most recent call last):
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 137, in extract_tool_calls
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     raw_tool_call = self.tool_call_regex.findall(tool_content)[0]
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] IndexError: list index out of range

When I set stream=true, I don't receive any errors, but the response does not have tool calls:

data: {"id":"chatcmpl-028934e8ee754938943457f631313546","object":"chat.completion.chunk","created":1751490269,"model":"unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

Am I doing something wrong here?

@rdlh

rdlh commented Jul 3, 2025

Looks like this PR unfortunately doesn't fix the issues on Mistral Small 3.2.

API call:

{
    "stream": false,
    "temperature": 0.15,
    "top_p": 1.0,
    "tool_choice": "auto",
    "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    "messages": [
        {
            "role": "user",
            "content": "Hi ! What's the result of 95478415 / 4571 ?"
        }
    ],
    "tools": [
        {
            "type":"function",
            "function": {
            "name":"calculator",
            "description":"Perform a basic calculation using ruby syntax for arithmetic operations.",
            "parameters": {
                "type":"object",
                "properties": {
                "calculation": {
                    "type":"string",
                    "description":"A basic arithmetic calculation in python language (e.g., \"2+2\", \"10*3\", \"45/9\").",
                    "required":["calculation"]
                }
                },
                "required":["calculation"]
            }
            }
        }
    ]
}

Still getting this error:

ERROR 07-03 01:55:20 [mistral_tool_parser.py:166] Error in extracting tool call from response.
ERROR 07-03 01:55:20 [mistral_tool_parser.py:166] Traceback (most recent call last):
ERROR 07-03 01:55:20 [mistral_tool_parser.py:166]     function_call_arr = json.loads(tool_content)

Here are some logs:

=== model_output ===
[TOOL_CALLS]calculator{"calculation": "95478415 / 4571"}
=== tool_content ===
calculator{"calculation": "95478415 / 4571"}

Please note that this issue is NOT happening when using "tool_choice": "required".

@avigny
Contributor Author

avigny commented Jul 3, 2025

Yes, you're both right!
I believe I branched out and started working on my fix before the changes introduced by #19193, which added the use of fn_name_regex from the model tokenizer.
I'll try to port this to the extract_tool_calls_streaming method.

Thanks for finding this!
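For reference, the fn_name_regex approach matches the name{json} shape of newer tool calls directly; a standalone sketch (the pattern mirrors the one used in the parser, but treat it as illustrative, and note the lazy match is aimed at flat argument objects):

```python
import json
import re

# name{json} shape of v11+ tool calls,
# e.g. the "calculator{...}" content from the logs above
fn_name_regex = re.compile(r"([a-zA-Z0-9_-]+)(\{[\s\S]*?\}+)")

tool_content = 'calculator{"calculation": "95478415 / 4571"}'
name, raw_args = fn_name_regex.findall(tool_content)[0]
args = json.loads(raw_args)
print(name, args)
```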

@gaby

gaby commented Jul 4, 2025

Any update on getting this merged?

@DarkLight1337
Member

cc @aarnphm

@sjuxax
Contributor

sjuxax commented Jul 4, 2025

So I did more complete testing and found this wasn't working that well after all -- I was getting the same errors reported above. Not sure what happened on initial testing. But, I've since taken it and have a working implementation, for streaming at least, at https://github.com/sjuxax/vllm/tree/Mistral3.2-tool-call-fix. I'm going to cherry-pick it onto #20471 in a sec. Then using that branch should work with quantized HF models and tool calling.


@PedroMiolaSilva PedroMiolaSilva left a comment


I think replacing lines 127:139 with this below will fix it for non-streaming:

            # First, split on the tool call token and discard the first item,
            # because it is the (possibly empty) content before any tool call
            raw_tool_calls = model_output.split(self.bot_token)[1:]
            function_call_arr = []
            for raw_tool_call in raw_tool_calls:
                tool_name = raw_tool_call.split("{")[0]
                tool_arguments_begin = raw_tool_call.find("{")
                tool_arguments = raw_tool_call[tool_arguments_begin:]
                function_call_arr.append({
                    "name": tool_name,
                    "arguments": json.loads(tool_arguments),
                })
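A self-contained version of the same idea (BOT_TOKEN and the sample output are stand-ins for illustration; inside the parser the split is done on self.bot_token):

```python
import json

BOT_TOKEN = "[TOOL_CALLS]"  # Mistral tool-call control token
model_output = '[TOOL_CALLS]get_weather{"location": "New York, NY"}'

function_call_arr = []
# Split on the control token; the first piece is the content
# before any tool call, so it is discarded here
for raw_tool_call in model_output.split(BOT_TOKEN)[1:]:
    # Everything before the first "{" is the function name,
    # the rest is the JSON arguments object
    tool_name, _, rest = raw_tool_call.partition("{")
    function_call_arr.append({
        "name": tool_name,
        "arguments": json.loads("{" + rest),
    })
print(function_call_arr)
```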

sjuxax pushed a commit to sjuxax/vllm that referenced this pull request Jul 4, 2025
avigny added 3 commits July 6, 2025 21:14
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Collaborator

@chaunceyjiang chaunceyjiang left a comment


I'm very sorry I missed this PR. Let's merge it quickly so we can promptly add support for Ministral-3.

@gaby

gaby commented Dec 3, 2025

@chaunceyjiang Thank you!

Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@DarkLight1337
Member

It's too late to include in v0.12.0 but it'll be in v0.12.1

DarkLight1337 and others added 3 commits December 3, 2025 12:40
…enerator`

Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
@avigny
Contributor Author

avigny commented Dec 3, 2025

The unit tests should be fixed now

@avigny
Contributor Author

avigny commented Dec 3, 2025

@avigny just tested the new mistralai/Ministral-3-14B-Instruct-2512 , tool calling with non streaming works. streaming tool calling still doesn't ... any chance this fix also solves that for the new model or is specific to Mistral3.2 Small?

The new mistralai/Ministral-3-14B-Instruct-2512, for example, uses a Mistral tokenizer v13 (see the tekken file), and this implies yet another format (from what I recall), different from the ones repaired by this PR.
I haven't tested these new models, but I believe this PR does not bring tool-call support for the new Mistral models.

@alew3

alew3 commented Dec 3, 2025

@avigny just tested the new mistralai/Ministral-3-14B-Instruct-2512 , tool calling with non streaming works. streaming tool calling still doesn't ... any chance this fix also solves that for the new model or is specific to Mistral3.2 Small?

The new mistralai/Ministral-3-14B-Instruct-2512, for example, uses a Mistral tokenizer v13 (see the tekken file), and this implies yet another format (from what I recall), different from the ones repaired by this PR. I haven't tested these new models, but I believe this PR does not bring tool-call support for the new Mistral models.

That's a shame, let's hope it won't be as complicated as this fix. Thanks for all your efforts and persistence!

@Erwandsn

Erwandsn commented Dec 3, 2025

(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202] Error in extracting tool call from response.
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202] Traceback (most recent call last):
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]   File "/app/vllm/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 166, in extract_tool_calls
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]     {"name": fn_name, "arguments": json.loads(args)}
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]                                    ^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]   File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]     return _default_decoder.decode(s)
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]   File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]   File "/usr/lib/python3.12/json/decoder.py", line 353, in raw_decode
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]     obj, end = self.scan_once(s, idx)
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]                ^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202] json.decoder.JSONDecodeError: Invalid control character at: line 1 column 4184 (char 4183)
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202] 
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202] During handling of the above exception, another exception occurred:
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202] 
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202] Traceback (most recent call last):
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]   File "/app/vllm/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 175, in extract_tool_calls
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]     raw_tool_call = self.tool_call_regex.findall(tool_content)[0]
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202]                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(APIServer pid=1) ERROR 12-03 21:10:40 [mistral_tool_parser.py:202] IndexError: list index out of range
(APIServer pid=1) INFO:     22.22.22.2:58266 - "POST /v1/chat/completions HTTP/1.1" 200 OK

I worked all day on the fork branch for this issue; it works well, but there seems to be one last issue.

I'm facing an error in an agentic task; the error appears on long contexts (not totally sure). The final output from the OpenAI endpoint looks like an unparsed message, so inference stops there:

[TOOL_CALLS]run_agent{"agent_id": "web_coder", "prompt": "Develop a **modern, responsive landing page** [...]"}

@avigny
Contributor Author

avigny commented Dec 4, 2025

@Erwandsn this looks like a tool call generated by a v13 tokenizer, which is not handled by this PR.
The more recent models have a tool call format that is a bit different from the previous models.
This issue is tracked in:
#23180
#29968

(edit: this does indeed look like a v11 tool call and not a v13 tool call. I'm sorry)

@Erwandsn

Erwandsn commented Dec 4, 2025

@avigny Thanks for your answer

@graelo

graelo commented Dec 4, 2025

Hello, I'll soon open a PR with a fix for this that relies on mistral-common (with MistralTokenizer). There will be 2 commits:

  • the first supports both pre-v11 and v11+ tokenizers,
  • the second commit removes support for pre-v11 models such as Mistral-7B, because quite frankly the tool call support for these older models is quite limited, and dropping it simplifies the code. I'll let the conversation decide.

If you want to test it in the meantime (I'd be glad to know it also works for others), here's the updated code (from main, after this PR was merged yesterday). I've been using mistralai/ministral-3-14b-instruct-2512 without an issue with this.

Details: this is how I mount it for local tests; I use `docker.io/vllm/vllm-openai:latest` (so v0.12.0 from yesterday):
--volume=/path/to/local/mistral_tool_parser.py:/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py

Here is the updated code for my upcoming PR.

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
Mistral tool call parser for v11+ models.

This implementation uses token-based parsing for streaming, leveraging the
atomic nature of special token IDs ([TOOL_CALLS], [ARGS], [CALL_ID]) to
reliably detect tool call boundaries.

Supported models: Mistral-Small-3.1+, Ministral-3+, and other v11+ models.

Note: Pre-v11 models (Mistral-7B-Instruct-v0.1/v0.2/v0.3) are not supported.
These older models have limited tool calling capabilities and require complex
text-based parsing with partial JSON handling. Users should upgrade to v11+
models for reliable tool calling support.
"""

import json
from collections.abc import Sequence
from enum import Enum, auto
from random import choices
from string import ascii_letters, digits

import regex as re
from pydantic import Field

from vllm.entrypoints.openai.protocol import (
    ChatCompletionRequest,
    DeltaFunctionCall,
    DeltaMessage,
    DeltaToolCall,
    ExtractedToolCallInformation,
    FunctionCall,
    ToolCall,
)
from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
    ToolParser,
)
from vllm.logger import init_logger
from vllm.tokenizers import MistralTokenizer, TokenizerLike

logger = init_logger(__name__)

ALPHANUMERIC = ascii_letters + digits


class MistralToolCall(ToolCall):
    id: str = Field(default_factory=lambda: MistralToolCall.generate_random_id())

    @staticmethod
    def generate_random_id():
        # Mistral Tool Call Ids must be alphanumeric with a length of 9.
        # https://github.com/mistralai/mistral-common/blob/21ee9f6cee3441e9bb1e6ed2d10173f90bd9b94b/src/mistral_common/protocol/instruct/validator.py#L299
        return "".join(choices(ALPHANUMERIC, k=9))

    @staticmethod
    def is_valid_id(id: str) -> bool:
        return id.isalnum() and len(id) == 9


class StreamingState(Enum):
    """Streaming state for tool call parsing."""

    CONTENT = auto()  # Before any [TOOL_CALLS] token
    PARSING_TOOL_NAME = auto()  # After [TOOL_CALLS], parsing function name
    PARSING_TOOL_ARGS = auto()  # Parsing JSON arguments
    COMPLETE = auto()  # All tools parsed


class MistralToolParser(ToolParser):
    """
    Tool call parser for Mistral v11+ models.

    Supports the v11+ format: [TOOL_CALLS]name[ARGS]{...}
    Optionally with call ID: [TOOL_CALLS]name[CALL_ID]id[ARGS]{...}

    This parser requires MistralTokenizer (tokenizer_mode=mistral) and
    models using tokenizer version 11 or higher.
    """

    def __init__(self, tokenizer: TokenizerLike):
        super().__init__(tokenizer)

        if not isinstance(self.model_tokenizer, MistralTokenizer):
            raise RuntimeError(
                "MistralToolParser requires MistralTokenizer. "
                "Please use tokenizer_mode='mistral' in your vLLM configuration. "
                "Note: Only v11+ Mistral models are supported for tool calling."
            )

        self._mistral_base_tokenizer = self.model_tokenizer.tokenizer
        self._version = self.model_tokenizer.version

        if self._version < 11:
            raise RuntimeError(
                f"MistralToolParser requires tokenizer version 11 or higher, "
                f"but got version {self._version}. Pre-v11 models "
                "(Mistral-7B-Instruct-v0.1/v0.2/v0.3) are not supported for "
                "tool calling. Please use a v11+ model such as "
                "Mistral-Small-3.1 or Ministral-3."
            )

        # Get bot token info
        self.bot_token = "[TOOL_CALLS]"
        self.bot_token_id = self.vocab.get(self.bot_token)

        if self.bot_token_id is None:
            raise RuntimeError(
                "Mistral Tool Parser could not locate the [TOOL_CALLS] token "
                "in the tokenizer!"
            )

        # Get control tokens for v11+ format
        try:
            self._args_token_id = self._mistral_base_tokenizer.get_control_token(
                "[ARGS]"
            )
        except Exception:
            raise RuntimeError(
                "Mistral Tool Parser could not locate the [ARGS] token. "
                "This token is required for v11+ tool call parsing."
            )

        self._call_id_token_id: int | None = None
        try:
            self._call_id_token_id = self._mistral_base_tokenizer.get_control_token(
                "[CALL_ID]"
            )
        except Exception:
            # [CALL_ID] is optional - some models may not have it
            pass

        # Regex for non-streaming parsing: name{args}
        self.fn_name_regex = re.compile(
            r"([a-zA-Z0-9_-]+)(\{[\s\S]*?\}+)", re.DOTALL
        )

        # Streaming state
        self._streaming_state = StreamingState.CONTENT
        self._current_tool_index = -1
        self._current_tool_id: str | None = None
        self._current_tool_name: str = ""
        self._current_tool_args: str = ""
        self._brace_depth = 0

        # For compatibility with serving_chat.py's finish_reason detection
        self.prev_tool_call_arr: list[dict] = []

    def extract_tool_calls(
        self,
        model_output: str,
        request: ChatCompletionRequest,
    ) -> ExtractedToolCallInformation:
        """
        Extract tool calls from a complete model response.

        Parses the v11+ format: [TOOL_CALLS]name{args}[TOOL_CALLS]name{args}...
        """
        # Fast path: no tool call token present
        if self.bot_token not in model_output:
            return ExtractedToolCallInformation(
                tools_called=False, tool_calls=[], content=model_output
            )

        try:
            # Get content before tool calls
            content = model_output.split(self.bot_token)[0]
            content = content if content.strip() else None

            # Parse tool calls from each segment after [TOOL_CALLS]
            function_call_arr = []
            for segment in model_output.split(self.bot_token)[1:]:
                if not segment.strip():
                    continue

                matches = self.fn_name_regex.findall(segment)
                for match in matches:
                    fn_name = match[0]
                    args = match[1]
                    function_call_arr.append(
                        {"name": fn_name, "arguments": json.loads(args)}
                    )

            # Convert to MistralToolCall objects
            tool_calls: list[MistralToolCall] = [
                MistralToolCall(
                    type="function",
                    function=FunctionCall(
                        name=raw_function_call["name"],
                        arguments=json.dumps(
                            raw_function_call["arguments"], ensure_ascii=False
                        ),
                    ),
                )
                for raw_function_call in function_call_arr
            ]

            return ExtractedToolCallInformation(
                tools_called=True,
                tool_calls=tool_calls,
                content=content,
            )

        except Exception:
            logger.exception("Error in extracting tool call from response.")
            return ExtractedToolCallInformation(
                tools_called=False,
                tool_calls=[],
                content=model_output.replace(self.bot_token, "").strip(),
            )

    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> DeltaMessage | None:
        """
        Extract tool calls from streaming output using token-based parsing.

        Token IDs are atomic - they cannot be split across chunks - which
        eliminates a whole class of parsing bugs that affect text-based parsing.
        """
        # If no tool call token seen yet, emit as content
        if self.bot_token_id not in current_token_ids:
            return DeltaMessage(content=delta_text)

        return self._stream_tool_calls(delta_token_ids)

    def _stream_tool_calls(
        self, delta_token_ids: Sequence[int]
    ) -> DeltaMessage | None:
        """
        Stream tool calls using token-based parsing.

        Detects [TOOL_CALLS] and [ARGS] tokens to identify tool call boundaries,
        then streams function names and arguments as they arrive.
        """
        from mistral_common.tokens.tokenizers.base import SpecialTokenPolicy

        delta_tool_calls: list[DeltaToolCall] = []

        for token_id in delta_token_ids:
            if token_id == self.bot_token_id:
                # Starting a new tool call
                self._current_tool_index += 1
                self._current_tool_id = MistralToolCall.generate_random_id()
                self._current_tool_name = ""
                self._current_tool_args = ""
                self._brace_depth = 0
                self._streaming_state = StreamingState.PARSING_TOOL_NAME

                # Set flag for finish_reason detection
                if not self.prev_tool_call_arr:
                    self.prev_tool_call_arr = [{"arguments": {}}]

                # Initialize streamed_args_for_tool for this tool index
                while len(self.streamed_args_for_tool) <= self._current_tool_index:
                    self.streamed_args_for_tool.append("")

            elif token_id == self._args_token_id:
                # Transition from name to arguments
                if self._streaming_state == StreamingState.PARSING_TOOL_NAME:
                    # Emit the complete function name
                    delta_tool_calls.append(
                        DeltaToolCall(
                            index=self._current_tool_index,
                            type="function",
                            id=self._current_tool_id,
                            function=DeltaFunctionCall(
                                name=self._current_tool_name.strip()
                            ).model_dump(exclude_none=True),
                        )
                    )
                    self._streaming_state = StreamingState.PARSING_TOOL_ARGS

            elif token_id == self._call_id_token_id:
                # Skip call ID tokens (they come between name and [ARGS])
                # We generate our own IDs
                pass

            elif self._streaming_state == StreamingState.CONTENT:
                # Before any tool call - shouldn't happen if bot_token_id
                # is in current_token_ids, but handle gracefully
                pass

            elif self._streaming_state == StreamingState.PARSING_TOOL_NAME:
                # Accumulate name tokens
                token_str = self._mistral_base_tokenizer.decode(
                    [token_id], special_token_policy=SpecialTokenPolicy.IGNORE
                )
                self._current_tool_name += token_str

            elif self._streaming_state == StreamingState.PARSING_TOOL_ARGS:
                # Stream argument tokens
                token_str = self._mistral_base_tokenizer.decode(
                    [token_id], special_token_policy=SpecialTokenPolicy.IGNORE
                )

                # Track brace depth for nested JSON
                for char in token_str:
                    if char == "{":
                        self._brace_depth += 1
                    elif char == "}":
                        self._brace_depth -= 1

                self._current_tool_args += token_str

                # Update streamed_args_for_tool for vLLM's finish handling
                if self._current_tool_index < len(self.streamed_args_for_tool):
                    self.streamed_args_for_tool[self._current_tool_index] = (
                        self._current_tool_args
                    )

                # Emit arguments delta
                delta_tool_calls.append(
                    DeltaToolCall(
                        index=self._current_tool_index,
                        function=DeltaFunctionCall(
                            arguments=token_str
                        ).model_dump(exclude_none=True),
                    )
                )

        # Build response
        if delta_tool_calls:
            return DeltaMessage(tool_calls=delta_tool_calls)

        return None
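For reference, the non-streaming path boils down to splitting on `[TOOL_CALLS]` and applying `fn_name_regex` to each following segment. A minimal standalone sketch (the `get_weather` call is a made-up example, not real model output):

```python
import json
import re

# Same pattern as fn_name_regex above: a function name immediately followed
# by a JSON object; the lazy body plus the trailing `}+` lets the match
# swallow the closing braces of nested objects.
fn_name_regex = re.compile(r"([a-zA-Z0-9_-]+)(\{[\s\S]*?\}+)", re.DOTALL)

# Hypothetical v11-style model output with a nested argument object.
output = '[TOOL_CALLS]get_weather{"city": "Paris", "units": {"temp": "C"}}'

calls = []
# Skip the first split segment: it is content before any tool call.
for segment in output.split("[TOOL_CALLS]")[1:]:
    for name, args in fn_name_regex.findall(segment):
        calls.append({"name": name, "arguments": json.loads(args)})

print(calls)
# → [{'name': 'get_weather', 'arguments': {'city': 'Paris', 'units': {'temp': 'C'}}}]
```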

PS: I had already followed this approach in a PR on mlx-lm this summer. The mistral-common library is underrated.

@graelo graelo mentioned this pull request Dec 4, 2025
5 tasks
@jayteaftw
Copy link

Hi, to summarize so far: this PR will fix the problem for the older Mistral-Small-3.1-24B-Instruct-2503 and Devstral-Small-2505, because they use tekken v7. However, it won't fix the newer models (Mistral Large 3 and Ministral 3) or the not-so-new Mistral-Small-3.2-24B-Instruct-2506 and Devstral-Small-2507, because they use tekken v13. Is that right? @avigny @graelo

@Erwandsn
Copy link

Erwandsn commented Dec 6, 2025

@graelo Great work! I've tested your new mistral_tool_parser.py class over the last two days and it works well, but I'm still facing one case that breaks the parser:

Error output (the JSON parse crashes, even though the JSON seems to pass syntax validation):

(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205] Error in extracting tool call from response.
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205] Traceback (most recent call last):
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]   File "/app/.venv/lib/python3.12/site-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 181, in extract_tool_calls
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]     {"name": fn_name, "arguments": json.loads(args)}
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]     ^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]   File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]     return _default_decoder.decode(s)
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]     ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]   File "/usr/lib/python3.12/json/decoder.py", line 337, in decode
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]   File "/usr/lib/python3.12/json/decoder.py", line 353, in raw_decode
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]     obj, end = self.scan_once(s, idx)
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205]     ^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) ERROR 12-06 18:56:22 [mistral_tool_parser.py:205] json.decoder.JSONDecodeError: Invalid control character at: line 1 column 245 (char 244)
(APIServer pid=1) INFO: 22.22.22.2:47738 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Detail of the tool call producing the error:

[TOOL_CALLS]run_agent{"agent_id": "context_builder", "prompt": "You are a context builder agent. Your task is to prepare a comprehensive context for the files_publisher agent to ensure all necessary details are included in the prompt for publishing the web project. The files to be published are as follows: 1. **index.html**: The main HTML file for the UFC Events December 2025 website. It includes a responsive layout, UFC branding, and dynamic content sections. 2. **style.css**: The CSS file containing styling rules for the website, including themes, hover effects, and responsive adjustments. 3. **script.js**: The JavaScript file responsible for dynamically rendering UFC event data, fighter details, and odds based on JSON files (events.json, fighters.json, and odds.json). 4. **events.json**: A JSON file containing details about UFC events, including event names, venues, dates, times, and main card fights. 5. **fighters.json**: A JSON file containing fighter information such as nicknames, names, records, and rankings. 6. **odds.json**: A JSON file containing betting odds for key fights in UFC events. ### Key Instructions for files_publisher: - The website is a **dynamic web project** that relies on JSON data files to populate event cards, fighter details, and odds. - The files must be published in a structured manner to ensure the website functions correctly. The relative paths for the files should be maintained as follows: - All files (HTML, CSS, JS, and JSON) should be placed in the root directory (./). - The script.js file dynamically fetches data from the JSON files (events.json, fighters.json, and odds.json) and renders it on the page. - Ensure that the files_publisheragent includes all files in its prompt, specifying their roles (e.g., HTML template, stylesheet, JavaScript logic, and data files). - Verify that thefiles_publisheragent understands the dependencies between the files (e.g.,script.jsdepends on the JSON data files). 
### Goal: - Provide a clear and detailed prompt for thefiles_publisher agent to publish this complete web project successfully. The published website should reflect the dynamic content as intended by the provided files.", "parent": "33a197ea-4e26-47c1-9393-ccf01413353e"}

An other response triggering the error:

[TOOL_CALLS]run_agent{"agent_id": "files_publisher", "prompt": "Please publish the UFC landing page website using the following files and structure:\n\n### Files and Contents\nThe website consists of the following files organized in the specified directory structure:\n\n```\nufc-events-2025/\n├── index.html\n├── css/\n│ └── style.css\n├── js/\n│ └── script.js\n├── data/\n│ ├── events.json\n│ ├── fighters.json\n│ └── odds.json\n```\n\nHere are the details for each file:\n\n1. **index.html**: \n```html\n<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n <title>UFC Events December 2025</title>\n <link href=\"https://cdn.jsdelivr.net/npm/tailwindcss@2.2.19/dist/tailwind.min.css\" rel=\"stylesheet\">\n <link href=\"https://fonts.googleapis.com/css2?family=Roboto:wght=300;400;500;700&display=swap\" rel=\"stylesheet\">\n <link rel=\"stylesheet\" href=\"css/style.css\">\n</head>\n<body class=\"font-sans bg-midnight-slate text-white\">\n <header class=\"bg-midnight-depth py-6 px-4\">\n <div class=\"container mx-auto flex justify-center\">\n <img src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/UFC_logo.svg/1200px-UFC_logo.svg.png\" alt=\"UFC Logo\" class=\"h-16\">\n </div>\n </header>\n\n <main class=\"container mx-auto py-12 px-4\">\n <section class=\"text-center mb-16\">\n <h1 class=\"text-5xl font-bold mb-6\">UFC Events December 2025</h1>\n <p class=\"text-xl text-light-grey max-w-2xl mx-auto\">\n Exciting fights, legendary champions, and unforgettable moments. Stay updated with the latest UFC events.\n </p>\n </section>\n\n <section id=\"events-section\" class=\"grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-8\">\n <!-- Event cards will be dynamically populated here -->\n </section>\n </main>\n\n <footer class=\"bg-midnight-depth py-6 px-4\">\n <div class=\"container mx-auto text-center\">\n <p class=\"text-light-grey\">© 2025 UFC. 
All rights reserved.</p>\n </div>\n </footer>\n\n <script src=\"js/script.js\"></script>\n</body>\n</html>\n```\n\n2. **style.css**: \n```css\n:root {\n --base-font-size: 16px;\n --default-box-shadow: rgba(118, 146, 255, 0.25) 0px 13px 27px -5px, rgba(118, 146, 255, 0.25) 0px 8px 16px -8px;\n --red: #FF3F3F;\n --deep-current: #00435B;\n --midnight-slate: #1F2937;\n --light-grey: #EDEFF1;\n --lazer-purple: #7692FF;\n --black: #000000;\n --transition-base: all 0.3s ease;\n}\n\nbody {\n font-family: 'Roboto', sans-serif;\n background-color: var(--midnight-slate);\n color: white;\n margin: 0;\n padding: 0;\n}\n\n.header {\n background-color: var(--deep-current);\n}\n\n.event-card {\n background: var(--midnight-slate);\n border: 1px solid rgba(255, 255, 255, 0.1);\n border-radius: 0.5rem;\n padding: 1.5rem;\n transition: var(--transition-base);\n box-shadow: var(--default-box-shadow);\n background-image: linear-gradient(135deg, var(--deep-current) 0%, var(--midnight-slate) 100%);\n}\n\n.event-card:hover {\n transform: translateY(-5px);\n box-shadow: 0 25px 50px -12px rgba(118, 146, 255, 0.3);\n}\n\n.event-card h3 {\n color: var(--lazer-purple);\n font-weight: 600;\n margin-bottom: 0.75rem;\n}\n\n.event-card p {\n color: var(--light-grey);\n margin-bottom: 1rem;\n}\n\n.fighter-card {\n background: rgba(255, 255, 255, 0.05);\n border: 1px solid rgba(255, 255, 255, 0.1);\n border-radius: 0.5rem;\n padding: 1rem;\n transition: var(--transition-base);\n}\n\n.fighter-card:hover {\n background: rgba(255, 255, 255, 0.1);\n}\n\n.odds-card {\n background: rgba(118, 146, 255, 0.1);\n border: 1px solid var(--lazer-purple);\n border-radius: 0.5rem;\n padding: 1rem;\n transition: var(--transition-base);\n}\n\n.odds-card:hover {\n background: rgba(118, 146, 255, 0.2);\n}\n\n.hero-section {\n background: linear-gradient(135deg, var(--deep-current) 0%, var(--midnight-slate) 100%);\n}\n\n/* Responsive Adjustments */\n@media (max-width: 768px) {\n .event-card {\n padding: 1rem;\n 
}\n}\n```\n\n3. **script.js**: \n```javascript\ndocument.addEventListener('DOMContentLoaded', function() {\n // Fetch data from JSON files\n async function renderData() {\n try {\n const eventsResponse = await fetch('data/events.json');\n const eventsData = await eventsResponse.json();\n\n const fightersResponse = await fetch('data/fighters.json');\n const fightersData = await fightersResponse.json();\n\n const oddsResponse = await fetch('data/odds.json');\n const oddsData = await oddsResponse.json();\n\n renderEventCards(eventsData, fightersData, oddsData);\n } catch (error) {\n console.error('Error:', error);\n }\n }\n \n // Function to render event cards\n function renderEventCards(eventsData, fightersData, oddsData) {\n const eventsSection = document.getElementById('events-section');\n eventsData.forEach(event => {\n const eventCard = document.createElement('div');\n eventCard.className = 'event-card';\n \n // Extract main card fight\n const mainFight = event.main_card[0].fight;\n const fighter1Name = mainFight.split(' vs. ')[0].trim();\n const fighter2Name = mainFight.split(' vs. ')[1].trim();\n \n // Find fighter details\n const fighter1 = fightersData.find(fighter => fighter.fighter_name === fighter1Name);\n const fighter2 = fightersData.find(fighter => fighter.fighter_name === fighter2Name);\n \n // Find odds for this fight\n let oddsText = '';\n oddsData.ufc_events.forEach(ufcEvent => {\n const eventFight = ufcEvent.key_fights.find(fight => fight.fight === mainFight);\n if (eventFight) {\n oddsText = Odds: ${fighter1Name} ${eventFight.odds[fighter1Name.toLowerCase().replace(/ /g, '')]} | ${fighter2Name} ${eventFight.odds[fighter2Name.toLowerCase().replace(/ /g, '')]};\n }\n });\n \n eventCard.innerHTML = \n

${event.event_name}

\n

Venue: ${event.venue}

\n

Date: ${event.date} | Time: ${event.start_time}

\n <div class="mt-4">\n

Main Event: ${mainFight}

\n <div class="flex flex-wrap gap-4 mt-2">\n ${fighter1 ? <div class=\"fighter-card\"><p><strong>${fighter1.nickname}:</strong> ${fighter1.record}</p></div> : ''}\n ${fighter2 ? <div class=\"fighter-card\"><p><strong>${fighter2.nickname}:</strong> ${fighter2.record}</p></div> : ''}\n \n ${oddsText ? <div class=\"odds-card mt-4\">${oddsText}</div> : ''}\n \n ;\n \n eventsSection.appendChild(eventCard);\n });\n }\n\n renderData();\n});\n```\n\n4. **events.json**: \n```json\n[\n {\n \"event_name\": \"UFC Fight Night: Royval vs. Kape\",\n \"venue\": \"UFC APEX, Las Vegas, NV, USA\",\n \"date\": \"2025-12-14\",\n \"start_time\": \"03:00 AM (UTC-8)\",\n \"main_card\": [\n {\n \"fight\": \"Brandon Royval vs. Manel Kape\",\n \"rounds\": 5,\n \"weight_class\": \"Lightweight\"\n }\n ]\n },\n {\n \"event_name\": \"UFC 323: Dvalishvili vs. Yan\",\n \"venue\": \"T-Mobile Arena, Las Vegas, NV, USA\",\n \"date\": \"2025-12-06\",\n \"start_time\": \"03:00 AM (UTC-8)\",\n \"main_card\": [\n {\n \"fight\": \"Merab Dvalishvili vs. Petr Yan\",\n \"rounds\": 5,\n \"weight_class\": \"Bantamweight\"\n }\n ]\n }\n]\n```\n\n5. **fighters.json**: \n```json\n[\n {\n \"nickname\": \"Raw Dawg\",\n \"fighter_name\": \"Brandon Royval\",\n \"record\": \"17-8-0 (UFC: 12-6-0)\",\n \"ranking\": \"Undisputed\"\n },\n {\n \"nickname\": \"Starboy\",\n \"fighter_name\": \"Manel Kape\",\n \"record\": \"21-7-0 (UFC: 13-7-0)\",\n \"ranking\": \"Undisputed\"\n },\n {\n \"nickname\": \"The Machine\",\n \"fighter_name\": \"Merab Dvalishvili\",\n \"record\": \"21-4-0 (UFC: 14-3-0)\",\n \"ranking\": \"UFC Bantamweight Champion\"\n },\n {\n \"nickname\": \"No Mercy\",\n \"fighter_name\": \"Petr Yan\",\n \"record\": \"19-5-0 (UFC: 10-4-0)\",\n \"ranking\": \"#2 in UFC Bantamweight\"\n }\n]\n```\n\n6. **odds.json**: \n```json\n{\n \"ufc_events\": [\n {\n \"event_name\": \"UFC Fight Night: Royval vs. Kape\",\n \"key_fights\": [\n {\n \"fight\": \"Brandon Royval vs. 
Manel Kape\",\n \"odds\": {\n \"brandon_royval\": 2.10,\n \"manel_kape\": 1.90,\n \"draw\": 12.00\n }\n }\n ]\n },\n {\n \"event_name\": \"UFC 323: Dvalishvili vs. Yan\",\n \"key_fights\": [\n {\n \"fight\": \"Merab Dvalishvili vs. Petr Yan\",\n \"odds\": {\n \"merab_dvalishvili\": 1.80,\n \"petr_yan\": 3.50,\n \"draw\": 12.00\n }\n }\n ]\n }\n ]\n}\n```\n\n### Instructions for Publishing\n\n1. **Directory Structure**: - Ensure the files are organized in the correct structure: ``` ufc-events-2025/ ├── index.html ├── css/ │ └── style.css ├── js/ │ └── script.js └── data/ ├── events.json ├── fighters.json └── odds.json ``` 2. **Publish the Website**: - Upload the entire ufc-events-2025folder to your preferred hosting platform. Platforms like Netlify, Vercel, or any static web hosting service should accommodate this setup. - Make sure the files are accessible from the root directory of the hosting service. - Test the website by openingindex.html in a browser to ensure all dynamic content loads correctly. 3. **Dependencies**: - No external dependencies (like npm packages) are required for this static site. Simply upload the files as-is. Publish this website and provide the URL once it's live.", "parent": "4a008c35-1487-4acf-a927-39a602e33e1a"}

Hope these issues can be resolved, the models are really good in my tests. Can't wait to have it fully working

@graelo
Copy link

graelo commented Dec 6, 2025

@Erwandsn thanks for the detailed report. Can you try with the version from my new PR #30063 and report there? I've already hijacked this thread more than I should have. 😅

@avigny
Copy link
Contributor Author

avigny commented Dec 7, 2025

Hi, to summarize so far: this PR will fix the problem for the older Mistral-Small-3.1-24B-Instruct-2503 and Devstral-Small-2505, because they use tekken v7. However, it won't fix the newer models (Mistral Large 3 and Ministral 3) or the not-so-new Mistral-Small-3.2-24B-Instruct-2506 and Devstral-Small-2507, because they use tekken v13. Is that right? @avigny @graelo

Hi @jayteaftw, if I'm not mistaken,

  • Mistral-Small-3.2-24B-Instruct-2506 does use tekken v11 so tool calls with this model should work.
  • Devstral-Small-2507 uses tekken v13 and this PR does not address this.
  • mistralai/Devstral-Small-2505 uses tekken v7 so tool calls with this model should work.
  • mistralai/Mistral-Small-3.1-24B-Instruct-2503 uses tekken v7 so tool calls with this model should work.
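One way to check which tekken version a checkpoint uses is to read the `version` field from its `tekken.json`. A sketch, using a fabricated minimal config (the field layout is an assumption based on tekken tokenizer files, and the dict here is a stand-in, not a real checkpoint's file):

```python
import json

# Fabricated stand-in for the relevant part of a checkpoint's tekken.json.
tekken_json = '{"config": {"version": "v7"}}'

version = json.loads(tekken_json)["config"]["version"]
# The parser branches on this: <v11 uses the ijson-based incremental path,
# >=v11 uses the token-based [TOOL_CALLS]/[ARGS] path.
print(version)  # → v7
```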

@avigny
Copy link
Contributor Author

avigny commented Dec 7, 2025

@Erwandsn I apologize, I was wrong when saying this call looked like a v13 call.

This is indeed a v11 tool call, and this PR aims to repair tool calls like these.

Looking at the failing tool calls:

This one should work

I've added this tool call locally as a test case and it passes.

[TOOL_CALLS]run_agent{"agent_id": "context_builder", "prompt": "You are a context builder agent. Your task is to prepare a comprehensive context for the files_publisher agent to ensure all necessary details are included in the prompt for publishing the web project. The files to be published are as follows: 1. **`index.html`**: The main HTML file for the UFC Events December 2025 website. It includes a responsive layout, UFC branding, and dynamic content sections. 2. **`style.css`**: The CSS file containing styling rules for the website, including themes, hover effects, and responsive adjustments. 3. **`script.js`**: The JavaScript file responsible for dynamically rendering UFC event data, fighter details, and odds based on JSON files (`events.json`, `fighters.json`, and `odds.json`). 4. **`events.json`**: A JSON file containing details about UFC events, including event names, venues, dates, times, and main card fights. 5. **`fighters.json`**: A JSON file containing fighter information such as nicknames, names, records, and rankings. 6. **`odds.json`**: A JSON file containing betting odds for key fights in UFC events. ### Key Instructions for `files_publisher`: - The website is a **dynamic web project** that relies on JSON data files to populate event cards, fighter details, and odds. - The files must be published in a structured manner to ensure the website functions correctly. The relative paths for the files should be maintained as follows: - All files (HTML, CSS, JS, and JSON) should be placed in the root directory (`./`). - The `script.js` file dynamically fetches data from the JSON files (`events.json`, `fighters.json`, and `odds.json`) and renders it on the page. - Ensure that the `files_publisher`agent includes all files in its prompt, specifying their roles (e.g., HTML template, stylesheet, JavaScript logic, and data files). 
- Verify that the`files_publisher`agent understands the dependencies between the files (e.g.,`script.js`depends on the JSON data files). ### Goal: - Provide a clear and detailed prompt for the`files_publisher` agent to publish this complete web project successfully. The published website should reflect the dynamic content as intended by the provided files.", "parent": "33a197ea-4e26-47c1-9393-ccf01413353e"}

I'm not exactly sure why this one fails 🤷

This second tool call is not valid (error is expected)

Looking closely, I've found some unescaped `"` characters near the end of the tool call.

\n <div class="mt-4">\n

also in the last line:

\n <div class="flex flex-wrap gap-4 mt-2">\n

I think these are causing the JSONDecodeError you've been getting.
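A tiny repro shows how both failure modes surface as `JSONDecodeError` (the strings below are illustrative, not the model's exact output):

```python
import json

# Unescaped double quotes inside a string value break the JSON grammar...
bad_quotes = '{"prompt": "<div class="mt-4">"}'
# ...and a raw newline is the "Invalid control character" from the traceback.
bad_ctrl = '{"prompt": "line one\nline two"}'

for s in (bad_quotes, bad_ctrl):
    try:
        json.loads(s)
    except json.JSONDecodeError as e:
        print(type(e).__name__, "-", e.msg)

# Properly escaped, the same content parses fine.
ok = '{"prompt": "<div class=\\"mt-4\\">\\nline two"}'
print(json.loads(ok)["prompt"])
```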

My wild guess is that the model you are using struggles to generate valid JSON when the tool arguments get very large.

See complete tool call from your comment:
[TOOL_CALLS]run_agent{"agent_id": "files_publisher", "prompt": "Please publish the UFC landing page website using the following files and structure:\n\n### Files and Contents\nThe website consists of the following files organized in the specified directory structure:\n\n```\nufc-events-2025/\n├── index.html\n├── css/\n│ └── style.css\n├── js/\n│ └── script.js\n├── data/\n│ ├── events.json\n│ ├── fighters.json\n│ └── odds.json\n```\n\nHere are the details for each file:\n\n1. **index.html**: \n```html\n<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n <title>UFC Events December 2025</title>\n <link href=\"https://cdn.jsdelivr.net/npm/tailwindcss@2.2.19/dist/tailwind.min.css\" rel=\"stylesheet\">\n <link href=\"https://fonts.googleapis.com/css2?family=Roboto:wght=300;400;500;700&display=swap\" rel=\"stylesheet\">\n <link rel=\"stylesheet\" href=\"css/style.css\">\n</head>\n<body class=\"font-sans bg-midnight-slate text-white\">\n <header class=\"bg-midnight-depth py-6 px-4\">\n <div class=\"container mx-auto flex justify-center\">\n <img src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/UFC_logo.svg/1200px-UFC_logo.svg.png\" alt=\"UFC Logo\" class=\"h-16\">\n </div>\n </header>\n\n <main class=\"container mx-auto py-12 px-4\">\n <section class=\"text-center mb-16\">\n <h1 class=\"text-5xl font-bold mb-6\">UFC Events December 2025</h1>\n <p class=\"text-xl text-light-grey max-w-2xl mx-auto\">\n Exciting fights, legendary champions, and unforgettable moments. Stay updated with the latest UFC events.\n </p>\n </section>\n\n <section id=\"events-section\" class=\"grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-8\">\n <!-- Event cards will be dynamically populated here -->\n </section>\n </main>\n\n <footer class=\"bg-midnight-depth py-6 px-4\">\n <div class=\"container mx-auto text-center\">\n <p class=\"text-light-grey\">© 2025 UFC. 
All rights reserved.</p>\n </div>\n </footer>\n\n <script src=\"js/script.js\"></script>\n</body>\n</html>\n```\n\n2. **style.css**: \n```css\n:root {\n --base-font-size: 16px;\n --default-box-shadow: rgba(118, 146, 255, 0.25) 0px 13px 27px -5px, rgba(118, 146, 255, 0.25) 0px 8px 16px -8px;\n --red: #FF3F3F;\n --deep-current: #00435B;\n --midnight-slate: #1F2937;\n --light-grey: #EDEFF1;\n --lazer-purple: #7692FF;\n --black: #000000;\n --transition-base: all 0.3s ease;\n}\n\nbody {\n font-family: 'Roboto', sans-serif;\n background-color: var(--midnight-slate);\n color: white;\n margin: 0;\n padding: 0;\n}\n\n.header {\n background-color: var(--deep-current);\n}\n\n.event-card {\n background: var(--midnight-slate);\n border: 1px solid rgba(255, 255, 255, 0.1);\n border-radius: 0.5rem;\n padding: 1.5rem;\n transition: var(--transition-base);\n box-shadow: var(--default-box-shadow);\n background-image: linear-gradient(135deg, var(--deep-current) 0%, var(--midnight-slate) 100%);\n}\n\n.event-card:hover {\n transform: translateY(-5px);\n box-shadow: 0 25px 50px -12px rgba(118, 146, 255, 0.3);\n}\n\n.event-card h3 {\n color: var(--lazer-purple);\n font-weight: 600;\n margin-bottom: 0.75rem;\n}\n\n.event-card p {\n color: var(--light-grey);\n margin-bottom: 1rem;\n}\n\n.fighter-card {\n background: rgba(255, 255, 255, 0.05);\n border: 1px solid rgba(255, 255, 255, 0.1);\n border-radius: 0.5rem;\n padding: 1rem;\n transition: var(--transition-base);\n}\n\n.fighter-card:hover {\n background: rgba(255, 255, 255, 0.1);\n}\n\n.odds-card {\n background: rgba(118, 146, 255, 0.1);\n border: 1px solid var(--lazer-purple);\n border-radius: 0.5rem;\n padding: 1rem;\n transition: var(--transition-base);\n}\n\n.odds-card:hover {\n background: rgba(118, 146, 255, 0.2);\n}\n\n.hero-section {\n background: linear-gradient(135deg, var(--deep-current) 0%, var(--midnight-slate) 100%);\n}\n\n/* Responsive Adjustments */\n@media (max-width: 768px) {\n .event-card {\n padding: 1rem;\n 
}\n}\n```\n\n3. **script.js**: \n```javascript\ndocument.addEventListener('DOMContentLoaded', function() {\n // Fetch data from JSON files\n async function renderData() {\n try {\n const eventsResponse = await fetch('data/events.json');\n const eventsData = await eventsResponse.json();\n\n const fightersResponse = await fetch('data/fighters.json');\n const fightersData = await fightersResponse.json();\n\n const oddsResponse = await fetch('data/odds.json');\n const oddsData = await oddsResponse.json();\n\n renderEventCards(eventsData, fightersData, oddsData);\n } catch (error) {\n console.error('Error:', error);\n }\n }\n \n // Function to render event cards\n function renderEventCards(eventsData, fightersData, oddsData) {\n const eventsSection = document.getElementById('events-section');\n eventsData.forEach(event => {\n const eventCard = document.createElement('div');\n eventCard.className = 'event-card';\n \n // Extract main card fight\n const mainFight = event.main_card[0].fight;\n const fighter1Name = mainFight.split(' vs. ')[0].trim();\n const fighter2Name = mainFight.split(' vs. ')[1].trim();\n \n // Find fighter details\n const fighter1 = fightersData.find(fighter => fighter.fighter_name === fighter1Name);\n const fighter2 = fightersData.find(fighter => fighter.fighter_name === fighter2Name);\n \n // Find odds for this fight\n let oddsText = '';\n oddsData.ufc_events.forEach(ufcEvent => {\n const eventFight = ufcEvent.key_fights.find(fight => fight.fight === mainFight);\n if (eventFight) {\n oddsText = ``Odds: ${fighter1Name} ${eventFight.odds[fighter1Name.toLowerCase().replace(/ /g, '_')]} | ${fighter2Name} ${eventFight.odds[fighter2Name.toLowerCase().replace(/ /g, '_')]}`;\n }\n });\n \n eventCard.innerHTML = `\n

### ${event.event_name}
\n
**Venue:** ${event.venue}

\n
**Date:** ${event.date} | **Time:** ${event.start_time}

\n <div class="mt-4">\n
#### Main Event: ${mainFight}
\n <div class="flex flex-wrap gap-4 mt-2">\n ${fighter1 ? `<div class=\"fighter-card\"><p><strong>${fighter1.nickname}:</strong> ${fighter1.record}</p></div>` : ''}\n ${fighter2 ? `<div class=\"fighter-card\"><p><strong>${fighter2.nickname}:</strong> ${fighter2.record}</p></div>` : ''}\n \n ${oddsText ? `<div class=\"odds-card mt-4\">${oddsText}</div>` : ''}\n \n `` ;\n \n eventsSection.appendChild(eventCard);\n });\n }\n\n renderData();\n});\n```\n\n4. **events.json**: \n```json\n[\n {\n \"event_name\": \"UFC Fight Night: Royval vs. Kape\",\n \"venue\": \"UFC APEX, Las Vegas, NV, USA\",\n \"date\": \"2025-12-14\",\n \"start_time\": \"03:00 AM (UTC-8)\",\n \"main_card\": [\n {\n \"fight\": \"Brandon Royval vs. Manel Kape\",\n \"rounds\": 5,\n \"weight_class\": \"Lightweight\"\n }\n ]\n },\n {\n \"event_name\": \"UFC 323: Dvalishvili vs. Yan\",\n \"venue\": \"T-Mobile Arena, Las Vegas, NV, USA\",\n \"date\": \"2025-12-06\",\n \"start_time\": \"03:00 AM (UTC-8)\",\n \"main_card\": [\n {\n \"fight\": \"Merab Dvalishvili vs. Petr Yan\",\n \"rounds\": 5,\n \"weight_class\": \"Bantamweight\"\n }\n ]\n }\n]\n```\n\n5. **fighters.json**: \n```json\n[\n {\n \"nickname\": \"Raw Dawg\",\n \"fighter_name\": \"Brandon Royval\",\n \"record\": \"17-8-0 (UFC: 12-6-0)\",\n \"ranking\": \"Undisputed\"\n },\n {\n \"nickname\": \"Starboy\",\n \"fighter_name\": \"Manel Kape\",\n \"record\": \"21-7-0 (UFC: 13-7-0)\",\n \"ranking\": \"Undisputed\"\n },\n {\n \"nickname\": \"The Machine\",\n \"fighter_name\": \"Merab Dvalishvili\",\n \"record\": \"21-4-0 (UFC: 14-3-0)\",\n \"ranking\": \"UFC Bantamweight Champion\"\n },\n {\n \"nickname\": \"No Mercy\",\n \"fighter_name\": \"Petr Yan\",\n \"record\": \"19-5-0 (UFC: 10-4-0)\",\n \"ranking\": \"#2 in UFC Bantamweight\"\n }\n]\n```\n\n6. **odds.json**: \n```json\n{\n \"ufc_events\": [\n {\n \"event_name\": \"UFC Fight Night: Royval vs. Kape\",\n \"key_fights\": [\n {\n \"fight\": \"Brandon Royval vs. 
Manel Kape\",\n \"odds\": {\n \"brandon_royval\": 2.10,\n \"manel_kape\": 1.90,\n \"draw\": 12.00\n }\n }\n ]\n },\n {\n \"event_name\": \"UFC 323: Dvalishvili vs. Yan\",\n \"key_fights\": [\n {\n \"fight\": \"Merab Dvalishvili vs. Petr Yan\",\n \"odds\": {\n \"merab_dvalishvili\": 1.80,\n \"petr_yan\": 3.50,\n \"draw\": 12.00\n }\n }\n ]\n }\n ]\n}\n```\n\n### Instructions for Publishing\n\n1. **Directory Structure**: - Ensure the files are organized in the correct structure: ``` ufc-events-2025/ ├── index.html ├── css/ │ └── style.css ├── js/ │ └── script.js └── data/ ├── events.json ├── fighters.json └── odds.json ``` 2. **Publish the Website**: - Upload the entire ``ufc-events-2025`folder to your preferred hosting platform. Platforms like Netlify, Vercel, or any static web hosting service should accommodate this setup. - Make sure the files are accessible from the root directory of the hosting service. - Test the website by opening`index.html` in a browser to ensure all dynamic content loads correctly. 3. **Dependencies**: - No external dependencies (like npm packages) are required for this static site. Simply upload the files as-is. Publish this website and provide the URL once it's live.", "parent": "4a008c35-1487-4acf-a927-39a602e33e1a"}

@Erwandsn

Erwandsn commented Dec 7, 2025

@avigny No problem, this is really a weird issue. I also think this is related to the complexity of the JSON (which doesn't help with debugging).

But I've hit the same problem with smaller JSON; I think the escaping complexity when HTML is included in the JSON makes the model mess up (I'm on Ministral-3-8b-instruct).

And my way of passing file contents as JSON is really poor and is part of the problem, so I don't think this is an urgent issue.

Thanks for your investigation!
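As a side note, the escaping blow-up described above is easy to quantify: each time HTML/JS content is wrapped in another JSON layer, every quote and backslash is escaped again. A minimal sketch (the snippet and field names are made up for illustration, not taken from the actual payload):

```python
import json

# A fragment of HTML/JS like the one in the reproduction above
# (snippet and field names are illustrative, not from the actual payload).
snippet = '<div class="card">${fighter ? `<p>${fighter.record}</p>` : ""}</div>'

# First JSON layer: every double quote becomes \"
once = json.dumps({"content": snippet})

# Second layer (file contents passed as a string value inside another JSON
# payload): the backslashes introduced above are escaped again, so \" grows
# into \\\" and the text the model must reproduce keeps expanding.
twice = json.dumps({"file": once})

print(len(snippet), len(once), len(twice))
```

Each extra layer multiplies the escape characters the model has to emit verbatim, which is plausibly where it starts to slip.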

@jayteaftw

Is this PR related? #30332

@avigny
Contributor Author

avigny commented Dec 10, 2025

@jayteaftw

Is this PR related? #30332

Yes. There seemed to be an issue with non-streaming parsing of complex tool calls like

[TOOL_CALLS]bash{"command": "print(\\"hello world!\\")\\nre.compile(r\'{}\')"}

From what I understand, the regex failed to match the complete tool arguments and stopped at the first `}`.
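That failure mode is easy to reproduce with a minimal sketch. The pattern below is hypothetical (not the exact one vLLM used), but it shows the same behavior: a non-greedy capture stops at the first `}`, which here sits inside the argument string, while a real JSON decoder recovers the full object because it tracks string escaping and brace nesting.

```python
import json
import re

# The reported model output: tool-call arguments whose JSON string value
# itself contains a literal "}" (inside re.compile(r'{}')).
raw = '[TOOL_CALLS]bash{"command": "print(\\"hello world!\\")\\nre.compile(r\'{}\')"}'

# Hypothetical non-greedy pattern in the spirit of the buggy extraction:
# ".*?" stops at the first "}", truncating the arguments mid-string.
match = re.search(r'\[TOOL_CALLS\](\w+)(\{.*?\})', raw)
name, args = match.group(1), match.group(2)

# The truncated capture is not valid JSON.
try:
    json.loads(args)
    args_ok = True
except json.JSONDecodeError:
    args_ok = False

# A real JSON decoder recovers the complete arguments object from the
# same text, because it understands escapes and nesting.
full_args, _ = json.JSONDecoder().raw_decode(raw, raw.index('{'))
print(name, args_ok, full_args["command"])
```

This is why replacing the regex with an incremental/stateful JSON parse fixes the truncation.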


Labels

ci/build frontend ready ONLY add when PR is ready to merge/full CI is needed tool-calling

Projects

Status: Done