[Bugfix] remove duplicate tokens streamed in required tool choice streaming #23312
Conversation
Code Review
This pull request addresses a bug in the streaming logic for required tool choice, where previous_texts was being updated twice, leading to duplicated content and a subsequent JSONDecodeError. The change correctly removes the redundant update within the required tool choice code block. This fix relies on the centralized update logic at the end of the loop, which correctly handles both cases with and without a reasoning_parser. The change is a clean and effective fix for the reported issue.
… choice streaming — Signed-off-by: Jason Cheng <[email protected]>
Force-pushed from 700eb93 to e14c18c
@aarnphm this PR is ready for review
chaunceyjiang left a comment
Can you provide the minimal steps to reproduce this?
I'd like to test it locally. This code segment has existed for a long time.
Hi @chaunceyjiang and @Jason-CKY, I'm not the original author of this PR, but I ran into the same issue and have been eagerly waiting for this to get merged. I also have a few related fixes around the

Here are the minimal steps to reproduce:

    # Prerequisite: run the server, e.g. `vllm serve meta-llama/Llama-3.1-8B-Instruct`
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="...")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location", "unit"]
            }
        }
    }]

    for response in client.chat.completions.create(
        model=client.models.list().data[0].id,
        messages=[{"role": "user", "content": "What's the weather like in San Francisco?"}],
        tools=tools,
        tool_choice="required",
        stream=True,
    ):
        print(response)

Without this fix, the code breaks with the following error (right after the

If you're curious, you can print

Thank you both in advance.
Hi @n0gu-furiosa, thank you for providing the reproduction steps. I have noticed this issue. It seems to have been introduced by #20707.
Thanks.
I ran the tests locally, and everything worked correctly.
    vllm serve /home/jovyan/qwen3-8b --reasoning-parser qwen3 --enable-auto-tool-choice --tool-call-parser hermes

and

    vllm serve /home/jovyan/qwen3-8b --enable-auto-tool-choice --tool-call-parser hermes

It makes sense to record `previous_texts` here in a unified way.
/LGTM
Purpose

Each streamed token is duplicated into `previous_texts[i]` when running a model with `tool_parser` set, `required` tool choice, and streaming. This causes a `JSONDecodeError` when running `partial_json_parser.loads(current_text)` in the `extract_tool_call_required_streaming` method.

This PR removes the duplicate addition in the `required` tool choice code path introduced in https://github.com/vllm-project/vllm/blame/v0.10.1.1/vllm/entrypoints/openai/serving_chat.py#L873-L876 by removing the changes to `previous_texts` in the `required` tool call code path. The change was part of the feature in 8e8e0b6#diff-f3135631994e5e8f63fff36f5fb493f404a7c253c004183613a007548156e558
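To make the failure mode concrete, here is a small illustration of how the duplicated accumulation breaks `partial_json_parser.loads`, which `extract_tool_call_required_streaming` calls on `current_text`. This is a hedged sketch: the delta strings are made up, and the exception is caught broadly because the exact type raised by `partial_json_parser` may differ from the `JSONDecodeError` that surfaces in vLLM.

    # Sketch of the parsing failure; the example deltas below are hypothetical.
    import partial_json_parser

    deltas = ['{"name": "get_w', 'eather", ', '"parameters": {"location": "SF"}}']

    correct = ""
    duplicated = ""
    for delta_text in deltas:
        correct += delta_text                  # single update (behavior after this fix)
        duplicated += delta_text + delta_text  # each delta recorded twice (the bug)

    print(partial_json_parser.loads(correct))
    # -> {'name': 'get_weather', 'parameters': {'location': 'SF'}}

    try:
        partial_json_parser.loads(duplicated)
    except Exception as exc:  # surfaces as a JSONDecodeError in the streaming path
        print(f"parse failed: {type(exc).__name__}: {exc}")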
`previous_texts[i]` is updated in both https://github.com/vllm-project/vllm/blob/v0.10.1/vllm/entrypoints/openai/serving_chat.py#L764-L765 and https://github.com/vllm-project/vllm/blob/v0.10.1/vllm/entrypoints/openai/serving_chat.py#L873-L876 for the case of a non-reasoning model, tool call parser set, tool call used with `required` tool choice, and streaming enabled.

Test Plan
Test Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.