
Conversation

@Jason-CKY
Contributor

@Jason-CKY Jason-CKY commented Aug 21, 2025

Purpose

Each streamed token is duplicated into previous_texts[i] when running a model with a tool parser set, required tool choice, and streaming enabled. This causes a JSONDecodeError when partial_json_parser.loads(current_text) is called in the extract_tool_call_required_streaming method.

This PR removes the duplicate update to previous_texts in the required tool choice code path, introduced in https://github.com/vllm-project/vllm/blame/v0.10.1.1/vllm/entrypoints/openai/serving_chat.py#L873-L876.

That update was introduced as part of the feature in 8e8e0b6#diff-f3135631994e5e8f63fff36f5fb493f404a7c253c004183613a007548156e558

As a result, previous_texts[i] was updated in both https://github.com/vllm-project/vllm/blob/v0.10.1/vllm/entrypoints/openai/serving_chat.py#L764-L765 and https://github.com/vllm-project/vllm/blob/v0.10.1/vllm/entrypoints/openai/serving_chat.py#L873-L876 when a non-reasoning model runs with a tool call parser set, required tool choice, and streaming enabled. A sketch of the resulting corruption is shown below.
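
For illustration, here is a minimal, standalone sketch (plain Python, not the actual serving_chat.py code) of how appending each delta a second time corrupts the text that is later passed to partial_json_parser.loads:

# Standalone illustration of the bug, not vLLM code. In the buggy required tool
# choice path, each streamed delta was appended to previous_texts[i] twice, so the
# accumulated text stops being a valid JSON prefix.

deltas = ['{"name"', ': "get_weather"', ', "parameters": {}}']

# Correct accumulation: previous_text stays a valid JSON prefix after every delta.
previous_text = ""
for delta_text in deltas:
    current_text = previous_text + delta_text
    previous_text = current_text
print(previous_text)  # {"name": "get_weather", "parameters": {}}

# Buggy accumulation (delta appended twice): every token appears twice, so parsing
# current_text with partial_json_parser.loads raises JSONDecodeError.
previous_text = ""
for delta_text in deltas:
    current_text = previous_text + delta_text
    previous_text = current_text + delta_text  # redundant second append
print(previous_text)  # {"name"{"name": "get_weather": "get_weather", ...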

Test Plan

Test Result

(Optional) Documentation Update


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

@Jason-CKY Jason-CKY requested a review from aarnphm as a code owner August 21, 2025 06:18
@mergify mergify bot added the frontend label Aug 21, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in the streaming logic for required tool choice, where previous_texts was being updated twice, leading to duplicated content and a subsequent JSONDecodeError. The change correctly removes the redundant update within the required tool choice code block. This fix relies on the centralized update logic at the end of the loop, which correctly handles both cases with and without a reasoning_parser. The change is a clean and effective fix for the reported issue.
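
To make the intended control flow concrete, here is a hedged, simplified sketch (assumed structure for illustration, not the real serving_chat.py implementation): branch-specific logic such as required tool choice parsing only reads current_text, while the per-choice state is advanced exactly once at the end of the loop body.

# Simplified sketch of the streaming-loop shape this fix relies on (an assumption
# for illustration, not the actual vLLM code).
def stream_choice(deltas, parse_required_tool_call):
    previous_text = ""
    for delta_text in deltas:
        current_text = previous_text + delta_text

        # Branch-specific work: parse the (possibly partial) tool call JSON.
        # It must not also mutate previous_text, or every delta is counted twice.
        parse_required_tool_call(current_text, delta_text)

        # Centralized update: previous_text advances exactly once per delta.
        previous_text = current_text
    return previous_text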

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of these by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@Jason-CKY Jason-CKY force-pushed the fix-required-tool-use-streaming-error branch from 700eb93 to e14c18c August 21, 2025 06:31
@Jason-CKY Jason-CKY changed the title remove duplicate appending of text to previous_texts in required tool… [Bugfix] remove duplicate appending of text to previous_texts in required tool… Aug 21, 2025
@Jason-CKY Jason-CKY changed the title [Bugfix] remove duplicate appending of text to previous_texts in required tool… [Bugfix] remove duplicate tokens streamed in required tool choice streaming Aug 21, 2025
@Jason-CKY
Contributor Author

@aarnphm this PR is ready for review

@Jason-CKY
Contributor Author

@aarnphm this PR is ready for review

@aarnphm can you take a look at this issue? It currently causes all streaming tool calls with tool_choice=required to fail with a JSONDecodeError.

Collaborator

@chaunceyjiang chaunceyjiang left a comment


Can you provide the minimal steps to reproduce this?
I'd like to test it locally, since this code segment has existed for a long time.

@n0gu-furiosa
Contributor

n0gu-furiosa commented Sep 15, 2025

Hi @chaunceyjiang and @Jason-CKY,

I'm not the original author of this PR, but I ran into the same issue and have been eagerly waiting for this to get merged. I also have a few related fixes around the required tool choice, which depend on this PR being merged first. Thanks @Jason-CKY for opening it.

Here are the minimal steps to reproduce:

# Prerequisite: run the server, e.g. `vllm serve meta-llama/Llama-3.1-8B-Instruct`

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
}]

for response in client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
    tools=tools,
    tool_choice="required",
    stream=True,
):
    print(response)

Without this fix, the code breaks with the following error (right after the role chunk is sent):

openai.APIError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)

If you're curious, you can print current_text like below, and you'll quickly see why the bug happens:

diff --git a/vllm/entrypoints/openai/serving_chat.py b/vllm/entrypoints/openai/serving_chat.py
index 579f6f537..d7a2ea5fc 100644
--- a/vllm/entrypoints/openai/serving_chat.py
+++ b/vllm/entrypoints/openai/serving_chat.py
@@ -807,6 +807,7 @@ class OpenAIServingChat(OpenAIServing):
                         previous_text = previous_texts[i]
                         current_text = previous_text + delta_text
                         fn_name_returned = function_name_returned[i]
+                        print(current_text)

Thank you both in advance.

@chaunceyjiang
Collaborator

Hi @n0gu-furiosa, thank you for providing the reproduction steps. I have noticed this issue. It seems to have been introduced by #20707.

Collaborator

@chaunceyjiang chaunceyjiang left a comment


Thanks.

I ran the tests locally, and everything worked correctly.

vllm serve /home/jovyan/qwen3-8b  --reasoning-parser qwen3  --enable-auto-tool-choice --tool-call-parser hermes

and

vllm serve /home/jovyan/qwen3-8b   --enable-auto-tool-choice --tool-call-parser hermes

It makes sense to record previous_texts here in a unified way.

/LGTM

@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 15, 2025
@chaunceyjiang chaunceyjiang merged commit 68dbde5 into vllm-project:main Sep 16, 2025
54 of 55 checks passed
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…eaming (vllm-project#23312)

Signed-off-by: Jason Cheng <[email protected]>
Co-authored-by: Chauncey <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
