[BUGFIX] Mistral tool call parser v11+#30332
Conversation
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
There was a problem hiding this comment.
Code Review
This pull request fixes an edge case in the Mistral tool parser for v11+ tokenizers by replacing a brittle regex with a simpler string splitting logic. This is a good change that correctly handles curly braces within tool argument string literals, and it's great that a test case was added to cover this. However, I've identified a critical issue in the new implementation. The argument parsing is not robust against trailing text after the JSON arguments, and the error handling for this case is broken, which can lead to a crash. I've provided a detailed comment with a suggested code change to make the parsing more robust and handle errors gracefully.
| end_name = single_tool_content.find("{") | ||
| fn_name, args = ( | ||
| single_tool_content[:end_name], | ||
| single_tool_content[end_name:], | ||
| ) | ||
|
|
||
| # fn_name is encoded outside serialized json dump | ||
| # only arguments are serialized | ||
| function_call_arr.append( | ||
| {"name": fn_name, "arguments": json.loads(args)} | ||
| ) |
There was a problem hiding this comment.
This parsing logic is not robust against model outputs that include trailing text after the JSON arguments. args is assigned the rest of the string from the first {, so if there's any extra text, json.loads(args) will fail.
When json.loads fails, the except json.JSONDecodeError block at line 166 is triggered. However, this fallback logic is only compatible with the pre-v11 tokenizer format and will raise an IndexError for the v11+ format, causing the entire request to fail.
A more robust approach is to parse the JSON arguments non-greedily and handle decoding errors locally for each tool call. Using json.JSONDecoder().raw_decode() will correctly handle trailing text, and a local try...except block will prevent a single malformed tool call from failing the entire extraction process.
| end_name = single_tool_content.find("{") | |
| fn_name, args = ( | |
| single_tool_content[:end_name], | |
| single_tool_content[end_name:], | |
| ) | |
| # fn_name is encoded outside serialized json dump | |
| # only arguments are serialized | |
| function_call_arr.append( | |
| {"name": fn_name, "arguments": json.loads(args)} | |
| ) | |
| end_name = single_tool_content.find("{") | |
| fn_name, args_str = ( | |
| single_tool_content[:end_name], | |
| single_tool_content[end_name:], | |
| ) | |
| # fn_name is encoded outside serialized json dump | |
| # only arguments are serialized | |
| try: | |
| # Use raw_decode to robustly parse JSON and ignore trailing text | |
| decoder = json.JSONDecoder() | |
| args_json, _ = decoder.raw_decode(args_str) | |
| function_call_arr.append( | |
| {"name": fn_name, "arguments": args_json} | |
| ) | |
| except json.JSONDecodeError: | |
| # Log and skip malformed tool calls | |
| logger.warning("Failed to decode tool arguments for tool %s: %s", fn_name, args_str) | |
| continue |
chaunceyjiang
left a comment
There was a problem hiding this comment.
Thanks @juliendenize lgtm
|
@juliendenize the issue seems to appear only in non streaming, but the test was added in a streaming test case. Shouldn't we add the test also for non streaming in |
Signed-off-by: juliendenize <julien.denize@mistral.ai> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
This PR fixes an edge case for the mistral tool parser for tool edge cases
Test Plan
A test has been added.
Test Result
All previous and new tests pass.
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.