[Bugfix] remove duplicate tokens streamed in required tool choice streaming #23312
Conversation
Code Review
This pull request addresses a bug in the streaming logic for required tool choice, where previous_texts was being updated twice, leading to duplicated content and a subsequent JSONDecodeError. The change correctly removes the redundant update within the required tool choice code block. This fix relies on the centralized update logic at the end of the loop, which correctly handles both cases with and without a reasoning_parser. The change is a clean and effective fix for the reported issue.
… choice streaming — Signed-off-by: Jason Cheng <[email protected]>
Force-pushed from 700eb93 to e14c18c
@aarnphm this PR is ready for review
chaunceyjiang left a comment
Can you provide the minimal steps to reproduce this?
I'd like to test it locally. This code segment has existed for a long time.
Hi @chaunceyjiang and @Jason-CKY, I'm not the original author of this PR, but I ran into the same issue and have been eagerly waiting for this to get merged. I also have a few related fixes around the

Here are the minimal steps to reproduce:

    # Prerequisite: run the server, e.g. `vllm serve meta-llama/Llama-3.1-8B-Instruct`
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location", "unit"]
            }
        }
    }]

    for response in client.chat.completions.create(
        model=client.models.list().data[0].id,
        messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
        tools=tools,
        tool_choice="required",
        stream=True,
    ):
        print(response)

Without this fix, the code breaks with the following error (right after the

If you're curious, you can print

Thank you both in advance.
Hi @n0gu-furiosa, thank you for providing the reproduction steps. I have noticed this issue. It seems to have been introduced by #20707.
Thanks.
I ran the tests locally, and everything worked correctly.
    vllm serve /home/jovyan/qwen3-8b --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser hermes

and

    vllm serve /home/jovyan/qwen3-8b --enable-auto-tool-choice --tool-call-parser hermes

It makes sense to record `previous_texts` here in a unified way.
/LGTM
Purpose

Each streamed token is duplicated into `previous_texts[i]` when running a model with `tool_parser` set, `required` tool choice, and streaming. This causes a `JSONDecodeError` when running `partial_json_parser.loads(current_text)` in the `extract_tool_call_required_streaming` method.

This PR removes the duplicate addition in the `required` tool choice code path introduced in https://github.com/vllm-project/vllm/blame/v0.10.1.1/vllm/entrypoints/openai/serving_chat.py#L873-L876 by removing the changes to `previous_texts` in the `required` tool call code path. The change was part of the feature in 8e8e0b6#diff-f3135631994e5e8f63fff36f5fb493f404a7c253c004183613a007548156e558
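To make the failure mode concrete, here is a small illustration of how the duplicated accumulation breaks `partial_json_parser.loads`, which `extract_tool_call_required_streaming` calls on `current_text`. This is a hedged sketch: the delta strings are made up, and the exception is caught broadly because the exact type raised by `partial_json_parser` may differ from the `JSONDecodeError` that surfaces in vLLM.

    # Sketch of the parsing failure; the example deltas below are hypothetical.
    import partial_json_parser

    deltas = ['{"name": "get_w', 'eather", ', '"parameters": {"location": "SF"}}']

    correct = ""
    duplicated = ""
    for delta_text in deltas:
        correct += delta_text                  # single update (behavior after this fix)
        duplicated += delta_text + delta_text  # each delta recorded twice (the bug)

    print(partial_json_parser.loads(correct))
    # -> {'name': 'get_weather', 'parameters': {'location': 'SF'}}

    try:
        partial_json_parser.loads(duplicated)
    except Exception as exc:  # surfaces as a JSONDecodeError in the streaming path
        print(f"parse failed: {type(exc).__name__}: {exc}")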
`previous_texts[i]` is updated in both https://github.com/vllm-project/vllm/blob/v0.10.1/vllm/entrypoints/openai/serving_chat.py#L764-L765 and https://github.com/vllm-project/vllm/blob/v0.10.1/vllm/entrypoints/openai/serving_chat.py#L873-L876 for the case of a non-reasoning model, tool call parser set, tool call used with `required` tool choice, and streaming enabled.

Test Plan
Test Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.