[Bugfix][Tool Parser] Fix Qwen3 Coder parser to stream tool call arguments #32536
karanb192 wants to merge 1 commit into
When using Qwen3 Coder for tool calls, the parser previously waited until the entire parameter value was complete before sending any stream chunks. This caused the stream to halt for long parameters like code blocks, with no indication of progress.

This change enables incremental streaming of tool call arguments by:
- Setting `self.in_param = True` when a parameter starts but isn't complete
- Streaming the parameter key header and partial values as they arrive
- Properly handling multiple end markers (`</parameter>`, `<parameter=`, `</function>`)
- JSON-escaping streamed values correctly

Fixes vllm-project#30439

Signed-off-by: Karan Bansal <karanb192@gmail.com>
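The incremental approach described in this PR can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: `escape_fragment` and `stream_string_argument` are hypothetical helper names, and the sketch assumes a single string-typed parameter. It relies on `json.dumps` escaping each character independently, so escaping chunks separately and concatenating them gives the same text as escaping the whole value at once.

```python
import json
from typing import Iterable, Iterator


def escape_fragment(fragment: str) -> str:
    """JSON-escape a piece of a string value, without the surrounding quotes."""
    return json.dumps(fragment, ensure_ascii=False)[1:-1]


def stream_string_argument(name: str, chunks: Iterable[str]) -> Iterator[str]:
    """Yield JSON fragments for one string argument as its chunks arrive.

    Hypothetical helper: emits the key header first, then each escaped
    chunk, then the closing quote and brace.
    """
    yield "{" + json.dumps(name) + ': "'  # key header, e.g. {"code": "
    for chunk in chunks:
        if chunk:
            yield escape_fragment(chunk)
    yield '"}'  # close the string and the arguments object


if __name__ == "__main__":
    parts = list(stream_string_argument("code", ['print("hi")\n', "x = 1"]))
    print(parts)
    print(json.loads("".join(parts)))
```

A client that concatenates the fragments receives valid JSON once the final fragment arrives, while still seeing progress on long values.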
Code Review
This pull request successfully addresses a bug where the Qwen3 Coder tool parser would halt when streaming tool calls with long parameter values. The changes enable incremental streaming of tool call arguments, which is a significant improvement. The implementation correctly handles the complexities of parsing the XML-like format and generating JSON argument chunks on the fly. However, I have identified a critical performance issue in the logic for streaming parameter values. The current approach has quadratic time complexity, which will negatively impact performance for the very long parameter values this change is intended to support. My review comment provides details on this issue.
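The review does not quote the code it considers quadratic, so the following is only an assumed illustration of how such a cost can arise and how it can be avoided. Both function names are hypothetical. The first variant re-escapes the entire accumulated value on every delta and slices off what was already sent; the second escapes only the new delta, so total work grows linearly with the value length.

```python
import json
from typing import Iterable, Iterator


def stream_reescaping_everything(deltas: Iterable[str]) -> Iterator[str]:
    """Assumed quadratic pattern: each delta re-escapes the whole value so far."""
    accumulated = ""
    sent = 0  # number of escaped characters already emitted
    for delta in deltas:
        accumulated += delta
        escaped = json.dumps(accumulated, ensure_ascii=False)[1:-1]
        yield escaped[sent:]
        sent = len(escaped)


def stream_escaping_delta_only(deltas: Iterable[str]) -> Iterator[str]:
    """Linear alternative: escape just the newly arrived text."""
    for delta in deltas:
        yield json.dumps(delta, ensure_ascii=False)[1:-1]


if __name__ == "__main__":
    deltas = ['a"', "b\n", "c"]
    # Both variants emit the same fragments; only the cost differs.
    print(list(stream_reescaping_everything(deltas)))
    print(list(stream_escaping_delta_only(deltas)))
```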
Cursor Bugbot has reviewed your changes and found 3 potential issues.
    else:
        # Still streaming, wait for more content
        return None
    # Parameter is incomplete - start streaming
Missing string close when function end detected during streaming
Medium Severity
When </function> is detected in the accumulated tool_text while in streaming mode (in_param = True), the code outputs } to close the JSON but never closes the open string value with a closing quote. This produces malformed JSON like {"param": "partial_value} because in_param is not reset and no " is emitted before the closing brace. The new streaming code path at lines 636-680 can now trigger this scenario when the function end marker is split across tokens.
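One way to address the issue described above is to let the function-end handler check the streaming flag before closing the object. The sketch below is a simplified stand-in, not the parser's real class: `ArgumentStreamState`, `open_string_param`, and `close_function` are hypothetical names, and only the `in_param` flag comes from the PR.

```python
import json


class ArgumentStreamState:
    """Tracks whether a string value is still open in the streamed JSON."""

    def __init__(self) -> None:
        self.in_param = False

    def open_string_param(self, name: str, first: bool = True) -> str:
        """Emit the key header and mark the string value as open."""
        self.in_param = True
        prefix = "" if first else ", "
        return f'{prefix}{json.dumps(name)}: "'

    def close_function(self) -> str:
        """Emit the closing fragment, closing an open string value first."""
        fragment = '"}' if self.in_param else "}"
        self.in_param = False  # reset so later calls start clean
        return fragment


if __name__ == "__main__":
    state = ArgumentStreamState()
    streamed = "{" + state.open_string_param("param") + "partial_value"
    streamed += state.close_function()
    print(json.loads(streamed))
```

Resetting `in_param` inside the same method keeps the closing quote and the state change together, so a later tool call cannot inherit a stale flag.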
    idx = delta_text.find(marker)
    if idx != -1 and (end_idx == -1 or idx < end_idx):
        end_idx = idx
        end_marker = marker
Split XML markers cause corrupted parameter values in stream
Medium Severity
When in streaming mode, end markers (</parameter>, <parameter=, </function>) are only detected if complete within a single delta_text. If a marker is tokenized across multiple deltas (e.g., </para then meter>), the partial marker content is streamed as part of the parameter value. This corrupts the output: for example, a value hello could become hello</parameter> in the streamed JSON. The code at lines 800-821 unconditionally adds the entire delta_text to current_param_value when no complete marker is found.
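A common remedy for markers split across deltas is to hold back any tail of the buffered text that could still turn out to be the start of a marker, and to emit only the part that is certainly value content. The sketch below illustrates that idea under simplifying assumptions; `split_held_back` and `stream_value` are hypothetical names, and only the three marker strings come from the PR.

```python
from typing import Iterable, Iterator, Tuple

END_MARKERS = ("</parameter>", "<parameter=", "</function>")


def split_held_back(buffer: str) -> Tuple[str, str]:
    """Split buffer into (safe_to_emit, held_back).

    held_back is the longest suffix that is a proper prefix of some marker.
    """
    max_hold = min(len(buffer), max(len(m) for m in END_MARKERS) - 1)
    for size in range(max_hold, 0, -1):
        tail = buffer[-size:]
        if any(marker.startswith(tail) for marker in END_MARKERS):
            return buffer[:-size], tail
    return buffer, ""


def stream_value(deltas: Iterable[str]) -> Iterator[str]:
    """Yield value text, never leaking partial end markers."""
    pending = ""
    for delta in deltas:
        pending += delta
        hits = [i for i in (pending.find(m) for m in END_MARKERS) if i != -1]
        if hits:
            end = min(hits)  # earliest complete marker ends the value
            if end:
                yield pending[:end]
            return
        emit, pending = split_held_back(pending)
        if emit:
            yield emit
    if pending:
        # Stream ended without a marker: the held text was value content.
        yield pending


if __name__ == "__main__":
    print("".join(stream_value(["hel", "lo</para", "meter>"])))
```

The held-back text is at most one character shorter than the longest marker, so the added latency is bounded and small.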
    if self.param_count == 0:
        key_fragment = f'"{self.current_param_name}": "'
    else:
        key_fragment = f', "{self.current_param_name}": "'
Streamed parameters always output as strings ignoring type
Low Severity
When entering streaming mode, key_fragment hardcodes an opening quote for a string value (e.g., "param": "), committing the output to JSON string type. The non-streaming path at lines 700-711 calls _convert_param_value() for proper type conversion (integers, booleans, objects). This creates inconsistent output: the same parameter value could be "count": 42 (number) if it completes quickly or "count": "42" (string) if streaming is triggered.
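One possible way to keep the two paths consistent is to stream only parameters whose declared type is a string, and to buffer everything else until it is complete so it can be converted. The sketch below illustrates that policy only. `render_argument` is a hypothetical function, the type names are assumed to follow JSON Schema conventions, and the conversion shown here is a plain `json.loads` fallback rather than the parser's own `_convert_param_value()`.

```python
import json


def should_stream(param_type: str) -> bool:
    """Only string parameters can be committed to an opening quote early."""
    return param_type == "string"


def render_argument(name: str, raw_value: str, param_type: str) -> str:
    """Render one completed `"name": value` fragment with the declared type."""
    if should_stream(param_type):
        return f"{json.dumps(name)}: {json.dumps(raw_value)}"
    try:
        value = json.loads(raw_value)  # numbers, booleans, objects, arrays
    except json.JSONDecodeError:
        value = raw_value  # fall back to a string if the text is not valid JSON
    return f"{json.dumps(name)}: {json.dumps(value)}"


if __name__ == "__main__":
    print(render_argument("count", "42", "integer"))
    print(render_argument("count", "42", "string"))
```

The trade-off is that non-string values are not streamed incrementally, which is usually acceptable because they tend to be short.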
Hi @DarkLight1337 - could you please review this tool parser fix? It addresses issue #30439 where Qwen3 Coder parser wasn't streaming arguments correctly. Thanks!
Please stop pinging me, I'm not able to review tool parsers. You can ask @chaunceyjiang instead.
This pull request has merge conflicts that must be resolved before it can be
Summary
Changes
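- Enter streaming mode (`self.in_param = True`) when a parameter starts but isn't complete, instead of waiting
- Stream the parameter key header (e.g., `"code": "`) immediately
- Handle the end markers `</parameter>`, `<parameter=`, and `</function>`
Test Plan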
Fixes #30439