[Tool Parser] Qwen3Coder: stop string streaming at next <parameter=> boundary#37829
[Tool Parser] Qwen3Coder: stop string streaming at next <parameter=> boundary#37829ec-jt wants to merge 1 commit into
Conversation
Signed-off-by: ec-jt <james.trappett@elementalcompute.com>
There was a problem hiding this comment.
Code Review
This pull request refactors the streaming tool call parsing for Qwen3CoderToolParser to correctly handle malformed streams where a parameter's closing tag is missing. The changes introduce a new state for streaming string values and add logic to treat the start of a new parameter as a boundary, which resolves the described bug. The implementation is comprehensive, adding several helper methods and improving state management for streaming. However, I've identified a critical regression in the non-streaming parsing logic caused by a change in a regular expression. Please see my detailed comment.
| # This allows </parameter> to appear in parameter content | ||
| self.tool_call_parameter_regex = re.compile( | ||
| r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)", | ||
| r"<parameter=(.*?)(?:(?=<parameter=)|(?=</function>)|$)", |
There was a problem hiding this comment.
The change to tool_call_parameter_regex seems to introduce a regression in parsing non-streaming tool calls. By removing </parameter> as a delimiter, the regex will now incorrectly include any text between a valid </parameter> tag and the next structural boundary (<parameter=, </function>, or end of string) as part of the parameter's value.
For example, with the model output ...<parameter=p1>value</parameter> garbage<parameter=p2>...:
- The old regex would correctly parse the value of
p1as"value". - The new regex will parse the value as
"value</parameter> garbage". The subsequentre.subin_parse_xml_function_callwill not remove this, as it only handles</parameter>at the very end of the string.
This could lead to corrupted parameter values. While the intent is to allow </parameter> within string content, this change has a significant side effect. Consider reverting this regex and handling the case of </parameter> inside string values specifically within the streaming logic, to avoid affecting the non-streaming path.
| r"<parameter=(.*?)(?:(?=<parameter=)|(?=</function>)|$)", | |
| r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)", |
|
This pull request has merge conflicts that must be resolved before it can be |
Summary
This PR fixes a malformed-stream boundary bug in
Qwen3CoderToolParserforstream:truetool calls.When a string parameter is missing
</parameter>and the next parameter starts immediately, the parser could treat the next<parameter=...>token as part of the previous string value.The fix makes the streaming string path treat the next parameter start token as a hard boundary.
Root cause
In the string-streaming branch, boundary detection handled:
</parameter></function></tool_call>but not
<parameter=.So malformed output like:
could leak
"<parameter=state>..."intocity.Fix
In
vllm/tool_parsers/qwen3coder_tool_parser.py:self.parameter_prefix(<parameter=) in boundary candidate detection for streaming string values.self.parameter_prefixin partial-boundary holdback logic so partial<parameter=fragments are not emitted as string content.Scope (intentionally minimal)
vllm/tool_parsers/qwen3coder_tool_parser.pymainValidation
Notes