[Bugfix] Qwen3Coder streaming tool call JSON missing opening brace in arguments#35716
[Bugfix] Qwen3Coder streaming tool call JSON missing opening brace in arguments#35716KrxGu wants to merge 3 commits intovllm-project:mainfrom
Conversation
When the first <parameter=> tag arrived in the same streaming delta as
the function header, the opening brace was skipped but json_started was
still flipped True. This caused arguments like `"cmd": "..."}` instead
of `{"cmd": "..."}`, breaking any client that calls json.loads() on
function.arguments. Fixes vllm-project#35266.
- Remove the `parameter_prefix not in delta_text` gate that suppressed
`{` emission
- Remove the premature json_started=True fallback that ran without emitting `{`
- Append closing `}` to the last param fragment when `</function>` is
already present in tool_text so the stream is never left unclosed
- Add regression test reproducing the exact chunk-boundary failure mode
(no GPU or real tokenizer required)
Signed-off-by: KrxGu <krishom70@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request fixes a bug in Qwen3CoderToolParser where streaming tool calls could produce invalid JSON by omitting the opening brace for arguments. The fix correctly ensures the opening brace is always emitted and handles edge cases around chunk boundaries. A comprehensive regression test has been added to prevent this issue from recurring.
My review focuses on the implementation of the fix. While the logic is correct, I've identified a significant amount of duplicated code that was introduced. I've suggested refactoring this into a helper method to improve maintainability. I also pointed out that silently catching all exceptions is risky and should be replaced with logging to provide better visibility into potential parsing errors.
There was a problem hiding this comment.
Pull request overview
Fixes malformed JSON in streamed tool-call arguments produced by Qwen3CoderToolParser (missing { / unclosed }) so downstream clients can reliably json.loads() the concatenated function.arguments.
Changes:
- Make the opening
{emission unconditional on first entry to the function body (remove the prior gating + fallback state corruption). - Reorder/gate function-end handling so parameters aren’t skipped when
</function>arrives in the same accumulated text. - Add a regression test intended to reproduce the missing-opening-brace failure mode (#35266).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
vllm/tool_parsers/qwen3coder_tool_parser.py |
Adjusts the streaming state machine to always emit { once, avoid premature JSON-start state, and ensure } is emitted even in “last-chunk” boundary cases. |
tests/tool_parsers/test_qwen3coder_tool_parser.py |
Adds a regression test for the missing-opening-brace streaming bug. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…nto helper Addresses review feedback: the duplicated logic for updating prev_tool_call_arr with final parsed arguments was extracted into _update_final_tool_arguments(), called from both the deferred-close path and the inline-close path. Also replaced bare except-pass with logger.warning(exc_info=True) so failures are visible in logs without interrupting the stream. Signed-off-by: KrxGu <krishom70@gmail.com>
…ct#35266 bug The 4-chunk split had <tool_call><function=bash> as one delta. When tool_call_start_token is in delta_text the parser returns None immediately, so the header was emitted on the next chunk whose delta was <parameter=...>. The opening-brace logic then fired on </function> which has no <parameter=, meaning the old gate (parameter_prefix not in delta_text) would pass and the test gave a false negative on the unfixed code. Fix: split into 5 chunks so <tool_call> and <function=bash> are separate deltas. The header is now emitted when delta_text is <function=bash>, and the very next delta is <parameter=command>... which contains <parameter= and is the exact condition the old gate suppressed. Adds detailed inline comment explaining the state machine transitions. Signed-off-by: KrxGu <krishom70@gmail.com>
|
This PR is ready for review. CI looks good, Good To Merge!! |
|
Sure, Tested #35615 locally (GPU-free, mock tokenizer).
Burst failure details: Actual fragments emitted: Root cause: On the burst delta, Also can you please lmk how are we planning to move ahead with my PR(35716)? |
|
This pull request has merge conflicts that must be resolved before it can be |
|
@chaunceyjiang am i supposed to close this PR as #35615 is already merged? |
|
Tool calls still fail for me (nvfp4) |
|
@chaunceyjiang closing this PR. |
Fixes #35266
Problem
When streaming tool calls with
Qwen3CoderToolParser, thefunction.argumentsfield was sometimes missing the opening{, producing invalid JSON like:"command": "echo hi"}
instead of:
{"command": "echo hi"}
This caused
json.loads(arguments)to fail in any client that parses arguments on completion (opencode, claudecode, etc.).Root Cause
Two bugs in
extract_tool_calls_streaming:{emission was gated onself.parameter_prefix not in delta_text, so when the first<parameter=>tag arrived in the same delta as the function header (a valid chunk boundary), the opening brace was silently skipped.json_started = Trueregardless, so the state machine believed JSON had started even though{was never emitted.</function>arrived after a parameter was processed, the closing}had no subsequent call to live in, leaving the stream permanently unclosed.Fix
parameter_prefix not in delta_textgate{is now always emitted once, unconditionally, on first entry into function body.json_started = Truefallback.</function>is already present intool_text, append}to the same delta fragment instead of deferring it to a call that may never come.Testing
Added a regression test in
tests/tool_parsers/test_qwen3coder_tool_parser.pythat directly callsextract_tool_calls_streamingwith hand-crafted chunk boundaries reproducing the exact failure mode (function header and first<parameter=>in the same delta). No GPU or model download required.Verified locally with a mock-tokenizer script against the fixed parser, both the bug-trigger split and normal split produce valid JSON.