[Bugfix] Fix tool call arguments parsed as content/reasoning in harmony streaming#35449
[Bugfix] Fix tool call arguments parsed as content/reasoning in harmony streaming#35449jfrery wants to merge 1 commit intovllm-project:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
There was a problem hiding this comment.
Code Review
This pull request refactors the harmony streaming delta extraction to use a more robust parser-level state diffing mechanism, replacing the previous per-token heuristic. The introduction of HarmonyStreamingState to persist state across chunks is a solid approach that should fix the described issues with tool call indexing and argument streaming across arbitrary chunk boundaries. The changes in stream_harmony.py are well-structured, and the tests have been updated thoroughly to cover the new implementation. I have one concern regarding a potential issue in serving.py where a remnant of the old logic might lead to incorrect behavior.
|
Hi @jfrery, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
f54ebfb to
e7920ad
Compare
a56825a to
35fc70d
Compare
35fc70d to
adbcb67
Compare
|
Hi @jfrery, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
adbcb67 to
3e30474
Compare
| tool_call_info = tool_parser.extract_tool_calls( | ||
| "", | ||
| request=request, | ||
| ) |
There was a problem hiding this comment.
For harmony models, won't this throw an exception (that we then log below) because we're not passing token ids into extract_tool_calls? See
vllm/vllm/tool_parsers/openai_tool_parser.py
Lines 40 to 43 in cc0d565
There was a problem hiding this comment.
Good catch, you're right. Fixed by forwarding token_ids=token_ids. Added # type: ignore to match the non-streaming path at L1514
| return DeltaMessage(content=content), False | ||
|
|
||
| if request.include_reasoning and reasoning: | ||
| return DeltaMessage(content=reasoning), False |
There was a problem hiding this comment.
This should be DeltaMessage(reasoning=reasoning) since this is meant to emit reasoning, right?
There was a problem hiding this comment.
Indeed, fixed now uses DeltaMessage(reasoning=reasoning)
3e30474 to
a85fbbf
Compare
|
This pull request has merge conflicts that must be resolved before it can be |
a85fbbf to
a9dcefa
Compare
1ef748d to
4859131
Compare
…op boundaries Signed-off-by: jfrery <jordan.frery@zama.ai>
4859131 to
9e4263b
Compare
|
Hi @jfrery and @bbrowning do you have any update about this PR? |
Purpose
Fix streaming tool calling with
openai/gpt-oss-120b. Tool call arguments were not being parsed properly in streaming mode -- they ended up in reasoning/content instead of being routed as propertool_callsdeltas.The root cause is that
extract_harmony_streaming_deltaused a per-token-group heuristic that tracked tool call transitions viaprev_recipientwithin a single chunk. This broke when:analysistocommentarywithfunctions.*recipient arrived in multi-token batches, causing tool call JSON to be dumped intoreasoning_contentinstead of structuredtool_calls([Bug]: Streaming tool call randomly failed when using gpt-oss-120b/20b #27641)stream_interval > 1caused multiple messages to complete in one yield, losing tool call arguments or emittingarguments: "{}"([Bug]: --stream-interval > 1 causes tool call arguments to be empty/lost #31501)if not cur_channel and delta_text: cur_channel = "final"fallback misrouted reasoning tokens into contentThis PR replaces the per-token-group heuristic with parser-level message diffing via a persistent
HarmonyStreamingStatedataclass that tracks emitted message count, tool call indices, and in-progress message state across chunks.Fixes #27641
Fixes #31501
Related (closed): #28635, #30099, #25560
Related (open): #30222, #30204
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.