[Frontend] Add streaming tool-call support to Responses API (non-Harmony) by sumitaryal · Pull Request #29726 · vllm-project/vllm

sumitaryal · 2025-11-29T12:18:12Z

Purpose

Fixes #29725.

Summary

This pull request fixes an issue where non-Harmony models using the Responses API with streaming and tools emit only ResponseTextDeltaEvent events, instead of ResponseFunctionCallArgumentsDeltaEvent events, when a tool call is selected. As a result, clients cannot reliably detect and parse tool call arguments from the stream.

Fix

This change updates the streaming path for non-Harmony models so that:

When the model selects a tool call, the arguments are surfaced as ResponseFunctionCallArgumentsDeltaEvent instead of plain text deltas.
The event structure is consistent with Harmony models and with the non-streaming Responses API behavior.
Clients can treat Harmony and non-Harmony models uniformly when handling tool calls during streaming.

Test Plan

Added an e2e test.

Test Result:

For the example mentioned in the issue, the emitted events are:

ResponseOutputItemAddedEvent(item=ResponseFunctionToolCall(arguments='', call_id='call_89716b1d95f08274', name='get_weather', type='function_call', id='chatcmpl-tool-a36680e5d0778655', status='in_progress'), output_index=0, sequence_number=2, type='response.output_item.added')
ResponseFunctionCallArgumentsDeltaEvent(delta='{"location": "', item_id='chatcmpl-tool-a36680e5d0778655', output_index=0, sequence_number=3, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='Paris', item_id='chatcmpl-tool-a36680e5d0778655', output_index=0, sequence_number=4, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDeltaEvent(delta='"}', item_id='chatcmpl-tool-a36680e5d0778655', output_index=0, sequence_number=5, type='response.function_call_arguments.delta')
ResponseFunctionCallArgumentsDoneEvent(arguments='{"location": "Paris"}', item_id='chatcmpl-tool-a36680e5d0778655', name='get_weather', output_index=0, sequence_number=6, type='response.function_call_arguments.done')
ResponseOutputItemDoneEvent(item=ResponseFunctionToolCall(arguments='{"location": "Paris"}', call_id='call_89716b1d95f08274', name='get_weather', type='function_call', id='chatcmpl-tool-a36680e5d0778655', status='completed'), output_index=0, sequence_number=7, type='response.output_item.done')
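On the client side, a sequence like the one above can be consumed by accumulating argument deltas per item id. The following is a minimal sketch, not code from this PR: it uses plain dicts shaped like the events shown rather than the real client event classes, and the helper name `collect_tool_calls` is hypothetical.

```python
import json


def collect_tool_calls(events):
    """Accumulate function-call argument deltas keyed by item id.

    `events` is an iterable of dicts mirroring the event fields shown above.
    Returns {item_id: {"name": str, "arguments": dict}} for finished calls.
    """
    buffers = {}  # item_id -> list of argument fragments
    names = {}    # item_id -> tool name
    calls = {}
    for ev in events:
        etype = ev["type"]
        if etype == "response.output_item.added":
            item = ev["item"]
            if item["type"] == "function_call":
                buffers[item["id"]] = []
                names[item["id"]] = item["name"]
        elif etype == "response.function_call_arguments.delta":
            buffers.setdefault(ev["item_id"], []).append(ev["delta"])
        elif etype == "response.function_call_arguments.done":
            item_id = ev["item_id"]
            # Prefer the final arguments string; fall back to the joined deltas.
            raw = ev.get("arguments") or "".join(buffers.get(item_id, []))
            calls[item_id] = {
                "name": names.get(item_id, ev.get("name")),
                "arguments": json.loads(raw),
            }
    return calls


ITEM_ID = "chatcmpl-tool-a36680e5d0778655"
events = [
    {"type": "response.output_item.added",
     "item": {"type": "function_call", "id": ITEM_ID,
              "name": "get_weather", "arguments": ""}},
    {"type": "response.function_call_arguments.delta",
     "item_id": ITEM_ID, "delta": '{"location": "'},
    {"type": "response.function_call_arguments.delta",
     "item_id": ITEM_ID, "delta": "Paris"},
    {"type": "response.function_call_arguments.delta",
     "item_id": ITEM_ID, "delta": '"}'},
    {"type": "response.function_call_arguments.done",
     "item_id": ITEM_ID, "name": "get_weather",
     "arguments": '{"location": "Paris"}'},
]

calls = collect_tool_calls(events)
print(calls)
```

Because text deltas and function-call argument deltas carry different `type` values, a client written this way does not need to guess whether streamed text is actually a tool call.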

chatgpt-codex-connector · 2025-11-29T12:18:20Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

gemini-code-assist

Code Review

This pull request adds support for streaming tool calls in the Responses API for non-Harmony models, which is a great enhancement. The changes introduce new logic to parse tool calls from the streaming response and emit the correct events, aligning the behavior with Harmony models.

My review focuses on the new parsing logic. I've identified a critical issue in the manual JSON parsing implementation that could lead to incorrect behavior with certain tool call arguments. I've also pointed out an area where error handling could be improved to make debugging easier.

The addition of an end-to-end test for streaming tool calls is a good step towards ensuring correctness.

gemini-code-assist · 2025-11-29T12:20:29Z

vllm/entrypoints/openai/serving_responses.py

+    def _bracket_level(s: str, opening: str = "{", closing: str = "}") -> int:
+        """Calculate the current level of nested brackets in a given string."""
+        level = 0
+        for char in s:
+            if char == opening:
+                level += 1
+            elif char == closing:
+                level -= 1
+        return level


The current implementation of _bracket_level does not account for JSON string contents. If a string within the JSON contains the opening ({) or closing (}) characters, the bracket level count will be incorrect. This will cause _filter_delta_text to fail, and ultimately _extract_tool_call_required_streaming will produce incorrect output or fail when streaming tool calls.

For example, a tool call with an argument like {"code": "function foo() { return 1; }"} would be misparsed.

This is a critical issue as it can lead to incorrect parsing of tool calls. For robustness, the parsing logic should be made aware of JSON string boundaries. Consider using partial_json_parser more extensively to handle JSON parsing instead of manual bracket counting, as it is already a dependency and is designed to handle such cases robustly.
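To illustrate the string-boundary problem, here is a minimal sketch that compares naive bracket counting with a variant that skips characters inside JSON strings. It is not the fix proposed for this PR (the suggestion above is to rely on partial_json_parser); the function names are hypothetical and only show why string awareness matters.

```python
def naive_bracket_level(s, opening="{", closing="}"):
    """Count nesting depth without regard to JSON string contents."""
    level = 0
    for char in s:
        if char == opening:
            level += 1
        elif char == closing:
            level -= 1
    return level


def string_aware_bracket_level(s, opening="{", closing="}"):
    """Count nesting depth, ignoring brackets inside double-quoted strings."""
    level = 0
    in_string = False
    escaped = False
    for char in s:
        if in_string:
            if escaped:
                escaped = False      # this character was escaped; skip it
            elif char == "\\":
                escaped = True       # next character is escaped
            elif char == '"':
                in_string = False    # closing quote
        elif char == '"':
            in_string = True         # opening quote
        elif char == opening:
            level += 1
        elif char == closing:
            level -= 1
    return level


# A streamed prefix of {"code": "function foo() { return 1; }"}
prefix = '{"code": "function foo() {'
print(naive_bracket_level(prefix))         # counts the brace inside the string
print(string_aware_bracket_level(prefix))  # counts only the structural brace

# A prefix whose string value contains a closing brace
tricky = '{"note": "}'
print(naive_bracket_level(tricky))
print(string_aware_bracket_level(tricky))
```

For the second prefix, the naive counter reports that the object is already closed while the string is still open, which is the kind of misparse described above.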

gemini-code-assist · 2025-11-29T12:20:30Z

vllm/entrypoints/openai/serving_responses.py

+            try:
+                tool_parser = self.tool_parser(tokenizer)
+            except Exception:
+                logger.exception("Error in tool parser creation.")
+                tool_choice_auto = False
+                tool_parser = None


Catching a broad Exception and silently disabling tool_choice_auto can hide important errors from the user. If the tool parser fails to initialize due to a misconfiguration or a bug, the request will unexpectedly fall back to not using tools, which can be difficult to debug. It would be better to log this as an error and potentially fail the request with an informative message, rather than silently changing its behavior.
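One way to follow this suggestion is to log the failure and then raise an error that names the parser, so the caller can turn it into an informative error response. The sketch below is an assumption about how that could look, not vLLM code: `ToolParserInitError`, `build_tool_parser`, and `BrokenParser` are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


class ToolParserInitError(RuntimeError):
    """Hypothetical error raised when a tool parser cannot be constructed."""


def build_tool_parser(parser_cls, tokenizer):
    """Create the tool parser or fail loudly instead of disabling tools."""
    try:
        return parser_cls(tokenizer)
    except Exception as exc:
        # Keep the traceback in the server log, then surface the failure.
        logger.exception("Error in tool parser creation.")
        raise ToolParserInitError(
            f"Failed to initialize tool parser {parser_cls.__name__}: {exc}"
        ) from exc


class BrokenParser:
    """Stand-in parser whose constructor always fails (for demonstration)."""

    def __init__(self, tokenizer):
        raise ValueError("tokenizer is missing required special tokens")


try:
    build_tool_parser(BrokenParser, tokenizer=None)
except ToolParserInitError as err:
    message = str(err)
    print(message)
```

With this shape, a misconfigured parser produces a visible request failure rather than a response that silently ignores the requested tools.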

ApostaC · 2025-12-01T20:22:37Z

cc @DarkLight1337 @robertgshaw2-redhat

chaunceyjiang · 2025-12-03T06:22:41Z

vllm/entrypoints/openai/serving_responses.py

        )

+    @staticmethod
+    def _bracket_level(s: str, opening: str = "{", closing: str = "}") -> int:


I’m not quite sure I understand. It seems like you’re re-implementing a streaming parser for tool_call?

sumitaryal · 2025-12-03

This isn’t a new parser; it’s the same streaming tool-call handling that Chat already has. For tool_choice='auto' we invoke the existing ToolParser.extract_tool_calls_streaming. The manual code is only for tool_choice='required', and it mirrors the Chat code path for tool_choice='required'.

chaunceyjiang

Thanks @sumitaryal
I think this is overly complicated and doesn’t seem to reuse the existing tool call parser.

Currently, a community contributor has already implemented a simpler and more straightforward solution.

#20874 (comment)

mergify · 2025-12-05T13:04:37Z

Hi @sumitaryal, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

[Responses] Implement streaming tool call support for non-harmony models

bdb09f5

sumitaryal requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang and robertgshaw2-redhat as code owners November 29, 2025 12:18

mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels Nov 29, 2025

github-project-automation bot added this to gpt-oss Issues & Enhancements Nov 29, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Nov 29, 2025

gemini-code-assist bot reviewed Nov 29, 2025

sumitaryal mentioned this pull request Nov 29, 2025

[Bug]: Responses API: Streaming returns ResponseTextDeltaEvent instead of ResponseFunctionCallArgumentsDeltaEvent for tool calls while using non-harmony models #29725

Open

1 task

chaunceyjiang reviewed Dec 3, 2025

sumitaryal closed this Feb 8, 2026

github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Feb 8, 2026

sumitaryal wants to merge 1 commit into vllm-project:main from sumitaryal:feature/responses-streaming-tool-calls-non-harmony-models
