
[Bugfix] Qwen3 XML parser: interleaved text emission and streaming ID management#40787

Closed
ExtReMLapin wants to merge 6 commits into
vllm-project:mainfrom
ExtReMLapin:qwen3_xml_content_toolcall_order

@ExtReMLapin
Contributor

@ExtReMLapin ExtReMLapin commented Apr 24, 2026

Purpose

This PR fixes critical ordering and buffering issues in the qwen3_xml tool parser during streaming. It ensures that free text appearing before or between tool calls is emitted immediately rather than being delayed until the end of the generation. It also corrects how tool call IDs are handled in the OpenAI-compatible stream.

Key Changes

  • Immediate Content Flushing: Modified the parser to flush the text_content_buffer as soon as a new <tool_call> is detected. This allows for correct interleaving of text and tool calls in the output.
  • Single ID Emission: Introduced id_emitted state to ensure that the tool call id is only sent in the first delta of a call. Subsequent deltas for the same call will have id=None, following the OpenAI streaming protocol and preventing client-side issues.
  • Robust Delta Merging: Updated _merge_new_deltas_to_single_response to merge tool call fragments based on their index rather than their id. This is necessary because IDs are now only present in the initial fragment.
  • Fixed Systemic Streaming Issues: Removed several xfail markers from the test suite as this refactor resolves the underlying streaming bugs that were previously causing failures.
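The single-ID emission and index-based merging described in the list above can be sketched roughly as follows. This is an illustrative approximation, not the PR's code: `ToolCallDelta`, `IdOnceEmitter`, and `merge_by_index` are hypothetical names, and the real `_merge_new_deltas_to_single_response` operates on vLLM's own message types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCallDelta:
    """Hypothetical, simplified stand-in for one streamed tool-call fragment."""

    index: int
    id: Optional[str] = None
    name: Optional[str] = None
    arguments: str = ""


class IdOnceEmitter:
    """Attach the call id only to the first fragment of each tool call."""

    def __init__(self) -> None:
        # Indices whose id has already been sent (plays the role of `id_emitted`).
        self._id_emitted: set[int] = set()

    def make_delta(
        self,
        index: int,
        call_id: str,
        name: Optional[str] = None,
        arguments: str = "",
    ) -> ToolCallDelta:
        first = index not in self._id_emitted
        self._id_emitted.add(index)
        return ToolCallDelta(
            index=index,
            id=call_id if first else None,  # later fragments carry id=None
            name=name,
            arguments=arguments,
        )


def merge_by_index(deltas: list[ToolCallDelta]) -> list[ToolCallDelta]:
    """Merge fragments on `index`, since only the first fragment has an id."""
    merged: dict[int, ToolCallDelta] = {}
    for delta in deltas:
        call = merged.setdefault(delta.index, ToolCallDelta(index=delta.index))
        call.id = call.id or delta.id
        call.name = call.name or delta.name
        call.arguments += delta.arguments
    return [merged[i] for i in sorted(merged)]


emitter = IdOnceEmitter()
stream = [
    emitter.make_delta(0, "call_a", name="get_weather"),
    emitter.make_delta(0, "call_a", arguments='{"city": '),
    emitter.make_delta(0, "call_a", arguments='"Paris"}'),
    emitter.make_delta(1, "call_b", name="get_time"),
    emitter.make_delta(1, "call_b", arguments="{}"),
]
calls = merge_by_index(stream)
```

Merging on `id` would fail here because every fragment after the first has `id=None`; keying on `index` lets each fragment find its call regardless.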

Test Plan

  • Added test_qwen3xml_async_streaming_free_text to verify that text between tool calls is emitted in the correct order.
  • Added test_qwen3xml_streaming_text_after_tool_call to ensure trailing text is not lost.
  • Verified that all existing qwen3_xml tests now pass without xfail.

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added the qwen (Related to Qwen models), tool-calling, and bug (Something isn't working) labels Apr 24, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the Qwen3XMLToolParser to ensure that text content appearing between multiple tool calls is correctly emitted by removing the tool_call_index == 0 constraint. A new test case is added to verify this behavior. Feedback suggests that the fix is incomplete as a similar check exists elsewhere that might block text after the final tool call. Furthermore, the test should be refactored to use the provided tokenizer fixture instead of downloading one from the hub, and the unnecessary async markers should be removed.

Comment thread vllm/tool_parsers/qwen3xml_tool_parser.py
Comment thread tests/tool_parsers/test_qwen3xml_tool_parser.py Outdated
CNE Pierre FICHEPOIL added 2 commits April 24, 2026 10:48
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
…er tool calls

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin
Contributor Author

Step 3.5 seems to have the same issue @csy0225

@ExtReMLapin ExtReMLapin changed the title from "[Bugfix] Fix delayed text emission between tool calls in Qwen3XML" to "[Bugfix] Fix delayed text emission between tool calls in qwen3_xml" Apr 24, 2026
CNE Pierre FICHEPOIL added 2 commits April 24, 2026 15:24
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin ExtReMLapin changed the title from "[Bugfix] Fix delayed text emission between tool calls in qwen3_xml" to "[Bugfix] Qwen3 XML parser: interleaved text emission and streaming ID management" Apr 24, 2026
@ExtReMLapin
Contributor Author

CC @bbrowning: re-enabled your xfail tests!

… fallback

Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
@ExtReMLapin ExtReMLapin marked this pull request as draft April 24, 2026 22:16
@ExtReMLapin
Contributor Author

Superseded by #40861.

Labels

bug (Something isn't working), qwen (Related to Qwen models), tool-calling

Projects

Status: Done

1 participant