
[Bugfix] Fix Qwen3Coder prev_tool_call_arr double-emission on parse failure#41466

Draft
ToastyTheBot wants to merge 1 commit into vllm-project:main from ToastyTheBot:fix/qwen3coder-prev-tool-call-arr-fallback

ToastyTheBot commented May 1, 2026

Summary

When _parse_xml_function_call fails during streaming (returns None or raises an exception), prev_tool_call_arr still holds the "{}" placeholder from the header-sent step. The serving layer's remaining-args check then sees "{}" as the expected arguments and computes a wrong remainder, causing {"arguments": "{}"} to be emitted twice.

Context

We run Qwen3.6-27B-FP8 with MTP=3 in production and observed a set of tool-calling issues with speculative decoding. After applying several open PRs together, tool call reliability improved dramatically:

We also opened two sibling draft PRs in an attempt to fully fix the issue:

This PR addresses a remaining edge case where XML parsing fails mid-stream and the placeholder "{}" leaks through to the client.

Root Cause Analysis

During streaming tool call parsing, the Qwen3CoderToolParser sends a tool call header with "{}" as the placeholder arguments in prev_tool_call_arr. After the header is sent, incremental text is parsed via _parse_xml_function_call to extract the actual arguments.

When _parse_xml_function_call fails (returns None or raises an exception), prev_tool_call_arr still contains the "{}" placeholder. The serving layer's _should_check_for_unstreamed_tool_arg_tokens check then treats "{}" as the expected arguments and computes the remainder against it, even though streamed_args_for_tool holds the arguments that were actually streamed. Because the streamed text does not match "{}", the remainder is not empty, and {"arguments": "{}"} is emitted as an extra delta, double-emitting the empty arguments.
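To make the mismatch concrete, here is a simplified, hypothetical model of the remainder check. It assumes the check strips the text of the latest argument delta from the streamed arguments and then removes what is left from the expected arguments with a single str.replace, comparing both as raw strings. The function name remaining_args and the sample arguments are illustrative only, not vLLM's actual code.

```python
def remaining_args(expected: str, streamed: str, latest_delta: str) -> str:
    """Hypothetical, simplified model of the serving layer's remainder check.

    expected:     arguments according to prev_tool_call_arr
    streamed:     arguments according to streamed_args_for_tool
    latest_delta: argument text carried by the final delta message
    """
    # Drop the text of the final delta, which the remainder will replace.
    # (Slicing by length avoids the streamed[:-0] == "" pitfall.)
    actual = streamed[: len(streamed) - len(latest_delta)]
    # Whatever part of `expected` is not covered by `actual` is "left to stream".
    return expected.replace(actual, "", 1)


# Parse failed: prev_tool_call_arr still holds the header placeholder.
placeholder = "{}"
streamed = '{"city": "Paris"}'  # the closing "}" arrived in the final delta
print(remaining_args(placeholder, streamed, latest_delta="}"))  # prints: {}
# The streamed text never occurs inside "{}", so the whole placeholder comes
# back as the "remainder" and is sent to the client as an extra delta.
```

Under this model, the only way to get a correct remainder is for the expected value to agree with what was actually streamed, which is what the fallback below targets.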

Changes

  • Adds a parse_succeeded flag to track whether _parse_xml_function_call returned a valid result
  • After the try/except block, if parsing failed, copies the incrementally streamed arguments from streamed_args_for_tool, plus a closing }, into prev_tool_call_arr. The } is needed because line 1335 appends } to streamed_args_for_tool only after the fallback runs; without it, the serving layer's remainder check loses the brace and produces broken JSON
  • Uses an explicit flag to avoid false positives where "{}" is a legitimate empty-parameter result
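The change list above can be sketched roughly as follows. This is a minimal, hypothetical reconstruction, not the PR's diff: apply_prev_args_fallback and the parse_xml_function_call stand-in are invented for illustration, and it assumes prev_tool_call_arr entries store "arguments" as a JSON string and that the fallback runs before the closing } is appended to streamed_args_for_tool.

```python
from typing import Any, Optional


def apply_prev_args_fallback(
    prev_tool_call_arr: list[dict[str, Any]],
    streamed_args_for_tool: list[str],
    current_tool_index: int,
    parse_succeeded: bool,
) -> None:
    """Hypothetical sketch: replace the "{}" placeholder only when parsing failed."""
    if parse_succeeded:
        # Success path: keep whatever the parser stored, even a legitimate "{}".
        return
    # Bounds checks: never index past the end of either list.
    if not (0 <= current_tool_index < len(prev_tool_call_arr)):
        return
    if current_tool_index >= len(streamed_args_for_tool):
        return
    # Use what was actually streamed, plus the closing brace that is only
    # appended to streamed_args_for_tool after this point.
    streamed = streamed_args_for_tool[current_tool_index]
    prev_tool_call_arr[current_tool_index]["arguments"] = streamed + "}"


def parse_xml_function_call(text: str) -> Optional[dict[str, Any]]:
    """Hypothetical stand-in for _parse_xml_function_call; always fails here."""
    raise ValueError("malformed XML")


prev = [{"name": "get_weather", "arguments": "{}"}]  # placeholder from the header step
streamed_args = ['{"city": "Paris"']  # streamed so far, closing brace not yet added

parse_succeeded = False
try:
    parse_succeeded = parse_xml_function_call("<function=get_weather>") is not None
except Exception:
    pass  # parse_succeeded stays False
apply_prev_args_fallback(prev, streamed_args, 0, parse_succeeded)
print(prev[0]["arguments"])  # prints: {"city": "Paris"}
```

Keying the fallback on parse_succeeded rather than on the stored value being "{}" is what keeps a genuine empty-parameter call from being overwritten.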

Why This Is Safe

  • Only activates on the failure path (_parse_xml_function_call returns None or raises)
  • The success path (normal parsing) is completely unchanged
  • The fallback produces the same result a successful parse would have, because the streamed arguments are copied directly
  • Bounds checks on current_tool_index prevent out-of-range indexing
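The "same result" and closing-brace claims can be checked against a simplified model. This sketch assumes, hypothetically, that the remainder check strips the final delta's text (here "}") from the streamed arguments and removes the rest from the expected arguments with str.replace, and that the parser streams arguments in the same serialization json.dumps produces. remaining_args and the sample values are illustrative, not vLLM code.

```python
import json


def remaining_args(expected: str, streamed: str, latest_delta: str) -> str:
    # Hypothetical model: drop the final delta's text, then remove what was
    # already streamed from the expected arguments.
    actual = streamed[: len(streamed) - len(latest_delta)]
    return expected.replace(actual, "", 1)


streamed_before_close = '{"city": "Paris"'  # state when the fallback runs
streamed_final = streamed_before_close + "}"  # after the closing brace is appended

# Success path: the parsed arguments, assumed to serialize like the streamed text.
expected_success = json.dumps({"city": "Paris"})
# Failure path with the fallback: streamed arguments plus the closing brace.
expected_fallback = streamed_before_close + "}"
# Fallback without the brace, for comparison.
expected_no_brace = streamed_before_close

for label, expected in [
    ("success", expected_success),
    ("fallback", expected_fallback),
    ("fallback without brace", expected_no_brace),
]:
    print(label, repr(remaining_args(expected, streamed_final, latest_delta="}")))
# prints:
# success '}'
# fallback '}'
# fallback without brace ''
```

In this model the fallback and the success path both re-emit exactly the final "}", while dropping the brace turns that last delta into an empty string, which is the broken-JSON case described under Changes.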

Reproduction

Stream a tool-calling request with the qwen3_coder parser where the model's XML output is malformed enough to cause _parse_xml_function_call to fail. The double-emission of "{}" appears in the streamed deltas.

Test Plan

  • Verify streaming tool calls with qwen3_coder parser still work normally when parsing succeeds
  • Verify that when _parse_xml_function_call fails, the streamed arguments are used as fallback instead of "{}"
  • Run pytest tests/tool_parsers/test_qwen3coder_tool_parser.py

claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

github-actions Bot commented May 1, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

gemini-code-assist Bot left a comment

Code Review

This pull request introduces a fallback mechanism in the qwen3coder_tool_parser to handle failures in XML function call parsing by utilizing incrementally streamed arguments. Feedback indicates that the fallback value should include a closing brace to ensure the serving layer correctly identifies the remainder and maintains JSON integrity.

Outdated comment thread on vllm/tool_parsers/qwen3coder_tool_parser.py
ToastyTheBot marked this pull request as draft May 1, 2026 19:36
ToastyTheBot force-pushed the fix/qwen3coder-prev-tool-call-arr-fallback branch from 1cd5cbe to 63cbb1b May 3, 2026 13:18
mergify Bot added the v1 label May 3, 2026
…ilure

When _parse_xml_function_call fails during streaming, prev_tool_call_arr
still holds the "{}" placeholder from the header-sent step. The serving
layer's remaining-args check then sees "{}" as the expected argument and
computes a wrong remainder, causing {"arguments": "{}"} to be emitted
twice.

Add a parse_succeeded flag to track whether parsing succeeded, and
fall back to the incrementally streamed arguments from
streamed_args_for_tool when parsing fails. The closing brace is
appended to match the serving layer's remainder check expectations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Labels

bug (Something isn't working), qwen (Related to Qwen models), tool-calling, v1

Projects

Status: No status


2 participants