fix: Responses API streaming tool call support for non-harmony models#36445
fix: Responses API streaming tool call support for non-harmony models#36445herve-ves wants to merge 1 commit intovllm-project:mainfrom
Conversation
|
Hi @herve-ves, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
There was a problem hiding this comment.
Code Review
This pull request effectively addresses a bug in the Responses API where streaming tool calls for non-harmony models were not working correctly. The changes correctly handle models with both reasoning and tool parsers by processing them sequentially and introduce the necessary logic to emit proper function call events. My main feedback concerns a potentially fragile mechanism for detecting the end of tool call arguments, which could be improved for better robustness and modularity.
Note: Security Review did not run due to the size of the PR.
| if args_delta == "}": | ||
| tc_idx = tc.index if tc.index is not None else 0 | ||
| if tc_idx < len( | ||
| tool_parser.prev_tool_call_arr | ||
| ): | ||
| tc_info = ( | ||
| tool_parser.prev_tool_call_arr[ | ||
| tc_idx | ||
| ] | ||
| ) | ||
| for event in ( | ||
| emit_function_call_done_events( | ||
| tc_info.get("name", fn_name), | ||
| tc_info.get( | ||
| "arguments", "{}" | ||
| ), | ||
| tool_streaming_state, | ||
| ) | ||
| ): | ||
| yield ( | ||
| _increment_sequence_number_and_return( | ||
| event | ||
| ) | ||
| ) |
There was a problem hiding this comment.
The logic to detect the end of a function call's arguments by checking if args_delta == "}" is quite fragile. It creates a tight coupling between this serving logic and the specific implementation details of the tool parser.
This approach assumes that a tool parser will always emit the final closing brace of arguments as a standalone delta. This contract is implicit and might be violated by other current or future tool parsers, which could lead to ResponseFunctionCallArgumentsDoneEvent not being emitted. For instance, a parser might stream "{}" for an empty object, or the final } might be part of a larger token.
A more robust design would be to have the parser explicitly signal the completion of a tool call's arguments within the DeltaMessage. For example, the ToolCall object in DeltaMessage could have a done: bool flag. This would decouple the serving logic from the parser's output tokenization and create a clearer interface.
There was a problem hiding this comment.
Good observation. The args_delta == "}" check mirrors how the Chat Completions streaming path handles tool call completion — it's the existing contract between Qwen3CoderToolParser and the serving layer (the parser explicitly emits "}" as a standalone delta at qwen3coder_tool_parser.py:689-696).
I agree that an explicit done signal in DeltaMessage / DeltaToolCall would be a cleaner interface, but that would require changes to the tool parser protocol and all existing parsers — a larger refactor better suited for a follow-up PR. This PR focuses on bringing the Responses API streaming path to parity with the existing Chat Completions streaming behavior.
Add tool parser integration to _process_simple_streaming_events so that tool call XML is parsed and emitted as proper function_call_arguments delta/done events instead of leaking into output_text events. - Initialize tool_parser from self.parser.tool_parser_cls - Call extract_tool_calls_streaming for each delta - Emit ResponseFunctionCallArgumentsDeltaEvent / DoneEvent via existing emit_function_call_delta_events / emit_function_call_done_events helpers - Suppress text events once tool calls are detected (tools_streamed flag) - Close message output item before function call events when content precedes tool calls - Sync tool_streaming_state.current_output_index with main output index
776d948 to
aaa917d
Compare
|
Hi @herve-ves, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
|
see #29947 |
| if args_delta == "}" and tc_idx < len( | ||
| tool_parser.prev_tool_call_arr | ||
| ): | ||
| tc_info = tool_parser.prev_tool_call_arr[tc_idx] |
There was a problem hiding this comment.
|
This pull request has merge conflicts that must be resolved before it can be |
Purpose
Fix Responses API streaming tool call support for non-harmony models.
Fixes #36435
_process_simple_streaming_eventscurrently does not emit properresponse.function_call_arguments.delta/response.function_call_arguments.doneevents during streaming. Instead, raw tool call XML (e.g.<tool_call><function=name>...) leaks intoresponse.output_text.deltaevents.Two root causes:
reasoning_parserandtool_parserare in a mutually exclusiveif/elifchain. When a model has both (e.g. Qwen3.5 with<think>tags), theelif tool_parser:branch is never reached — after reasoning ends, tool call XML falls through as plain content.Even without a reasoning parser, there was no code to convert
DeltaMessage.tool_callsinto Responses API function call events. The original code had# todo(kebe7jun) tool call support.This PR modifies
_process_simple_streaming_eventsto:is_reasoning_end(), matching the Chat Completions streaming behavior)ResponseFunctionCallArgumentsDeltaEvent/ResponseFunctionCallArgumentsDoneEventvia existingemit_function_call_delta_events/emit_function_call_done_eventshelperstool_streaming_state.current_output_indexwith the main output indexResponseTextDeltaEventonce tool calls are detectedTest Plan
Test Result
Before (broken): tool call XML in `response.output_text.delta`
After (fixed): proper function call events
ResponseCompletedEventalso correctly containsResponseFunctionToolCalloutput items (via the existing non-streaming_make_response_output_itemspath).Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.