Skip to content

[Tool Parser] Qwen3Coder: stop string streaming at next <parameter=> boundary#37829

Open
ec-jt wants to merge 1 commit into
vllm-project:mainfrom
ec-jt:pr/qwen3coder-next-param-boundary-clean
Open

[Tool Parser] Qwen3Coder: stop string streaming at next <parameter=> boundary#37829
ec-jt wants to merge 1 commit into
vllm-project:mainfrom
ec-jt:pr/qwen3coder-next-param-boundary-clean

Conversation

@ec-jt
Copy link
Copy Markdown

@ec-jt ec-jt commented Mar 22, 2026

Summary

This PR fixes a malformed-stream boundary bug in Qwen3CoderToolParser for stream:true tool calls.

When a string parameter is missing </parameter> and the next parameter starts immediately, the parser could treat the next <parameter=...> token as part of the previous string value.

The fix makes the streaming string path treat the next parameter start token as a hard boundary.

Root cause

In the string-streaming branch, boundary detection handled:

  • </parameter>
  • </function>
  • </tool_call>

but not <parameter=.

So malformed output like:

<parameter=city>
Dallas
<parameter=state>
TX

could leak "<parameter=state>..." into city.

Fix

In vllm/tool_parsers/qwen3coder_tool_parser.py:

  • Include self.parameter_prefix (<parameter=) in boundary candidate detection for streaming string values.
  • Include self.parameter_prefix in partial-boundary holdback logic so partial <parameter= fragments are not emitted as string content.

Scope (intentionally minimal)

  • Single file: vllm/tool_parsers/qwen3coder_tool_parser.py
  • Single commit on top of main

Validation

  • Full pre-commit hook chain passed on commit (ruff, mypy-local, repo checks).

Notes

  • Previous PR with broader history was closed to keep this one clean and reviewable.

Signed-off-by: ec-jt <james.trappett@elementalcompute.com>
@mergify mergify Bot added qwen Related to Qwen models tool-calling labels Mar 22, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the streaming tool call parsing for Qwen3CoderToolParser to correctly handle malformed streams where a parameter's closing tag is missing. The changes introduce a new state for streaming string values and add logic to treat the start of a new parameter as a boundary, which resolves the described bug. The implementation is comprehensive, adding several helper methods and improving state management for streaming. However, I've identified a critical regression in the non-streaming parsing logic caused by a change in a regular expression. Please see my detailed comment.

# This allows </parameter> to appear in parameter content
self.tool_call_parameter_regex = re.compile(
r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)",
r"<parameter=(.*?)(?:(?=<parameter=)|(?=</function>)|$)",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The change to tool_call_parameter_regex seems to introduce a regression in parsing non-streaming tool calls. By removing </parameter> as a delimiter, the regex will now incorrectly include any text between a valid </parameter> tag and the next structural boundary (<parameter=, </function>, or end of string) as part of the parameter's value.

For example, with the model output ...<parameter=p1>value</parameter> garbage<parameter=p2>...:

  • The old regex would correctly parse the value of p1 as "value".
  • The new regex will parse the value as "value</parameter> garbage". The subsequent re.sub in _parse_xml_function_call will not remove this, as it only handles </parameter> at the very end of the string.

This could lead to corrupted parameter values. While the intent is to allow </parameter> within string content, this change has a significant side effect. Consider reverting this regex and handling the case of </parameter> inside string values specifically within the streaming logic, to avoid affecting the non-streaming path.

Suggested change
r"<parameter=(.*?)(?:(?=<parameter=)|(?=</function>)|$)",
r"<parameter=(.*?)(?:</parameter>|(?=<parameter=)|(?=</function>)|$)",

@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented May 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ec-jt.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant