[Bugfix] Fix tool call arguments parsed as content/reasoning in harmony streaming by jfrery · Pull Request #35449 · vllm-project/vllm

jfrery · 2026-02-26T22:24:08Z

Purpose

Fix streaming tool calling with openai/gpt-oss-120b. In streaming mode, tool call arguments were not parsed correctly: they ended up in reasoning/content instead of being emitted as proper tool_calls deltas.

The root cause is that extract_harmony_streaming_delta used a per-token-group heuristic that tracked tool call transitions via prev_recipient within a single chunk. This broke when:

The model's channel transition from analysis to commentary with functions.* recipient arrived in multi-token batches, causing tool call JSON to be dumped into reasoning_content instead of structured tool_calls ([Bug]: Streaming tool call randomly failed when using gpt-oss-120b/20b #27641)
stream_interval > 1 caused multiple messages to complete in one yield, losing tool call arguments or emitting arguments: "{}" ([Bug]: --stream-interval > 1 causes tool call arguments to be empty/lost #31501)
The if not cur_channel and delta_text: cur_channel = "final" fallback misrouted reasoning tokens into content

This PR replaces the per-token-group heuristic with parser-level message diffing via a persistent HarmonyStreamingState dataclass that tracks emitted message count, tool call indices, and in-progress message state across chunks.
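The message-diffing idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: `Message`, `StreamingState`, their field names, and `diff_messages` are hypothetical stand-ins, and the real HarmonyStreamingState may track different fields. The sketch assumes the parser exposes its full list of messages on every chunk, with the last entry possibly still in progress.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical, simplified view of one parsed harmony message."""
    channel: str              # "analysis", "commentary", or "final"
    recipient: Optional[str]  # e.g. "functions.get_weather" for a tool call
    text: str                 # text accumulated so far for this message


@dataclass
class StreamingState:
    """Hypothetical state kept across chunks (not the PR's actual fields)."""
    emitted_messages: int = 0   # messages already emitted in full
    next_tool_index: int = 0    # index to assign to the next tool call
    chars_emitted: int = 0      # chars already sent for the current message
    current_tool_index: Optional[int] = None


def diff_messages(state: StreamingState, messages: list) -> list:
    """Emit deltas for everything in `messages` that was not sent before."""
    deltas = []
    last = len(messages) - 1
    for i in range(state.emitted_messages, len(messages)):
        msg = messages[i]
        new_text = msg.text[state.chars_emitted:]
        recipient = msg.recipient or ""
        if recipient.startswith("functions."):
            if state.current_tool_index is None:
                # First time this tool call is seen: assign a stable index.
                state.current_tool_index = state.next_tool_index
                state.next_tool_index += 1
                deltas.append({
                    "index": state.current_tool_index,
                    "name": recipient[len("functions."):],
                    "arguments": new_text,
                })
            elif new_text:
                deltas.append({
                    "index": state.current_tool_index,
                    "arguments": new_text,
                })
        elif new_text:
            key = "reasoning" if msg.channel == "analysis" else "content"
            deltas.append({key: new_text})
        if i < last:
            # A later message exists, so this one is complete.
            state.emitted_messages += 1
            state.chars_emitted = 0
            state.current_tool_index = None
        else:
            # Possibly still in progress: remember how much was sent.
            state.chars_emitted = len(msg.text)
    return deltas
```

Because the state survives between calls, it does not matter how many tokens or completed messages arrive in one chunk: each call emits only the difference from what was already sent.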

Fixes #27641
Fixes #31501
Related (closed): #28635, #30099, #25560
Related (open): #30222, #30204

Test Plan

python -m pytest tests/entrypoints/openai/test_serving_chat_stream_harmony.py -v

Test Result

18 passed

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update.

github-actions · 2026-02-26T22:24:16Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

gemini-code-assist

Code Review

This pull request refactors the harmony streaming delta extraction to use a more robust parser-level state diffing mechanism, replacing the previous per-token heuristic. The introduction of HarmonyStreamingState to persist state across chunks is a solid approach that should fix the described issues with tool call indexing and argument streaming across arbitrary chunk boundaries. The changes in stream_harmony.py are well-structured, and the tests have been updated thoroughly to cover the new implementation. I have one concern regarding a potential issue in serving.py where a remnant of the old logic might lead to incorrect behavior.

vllm/entrypoints/openai/chat_completion/serving.py

mergify · 2026-02-26T22:28:04Z

Hi @jfrery, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

ehfd · 2026-02-27T10:47:36Z

CC @bbrowning @chaunceyjiang

mergify · 2026-03-02T08:07:47Z

Hi @jfrery, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?

mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

bbrowning · 2026-03-02T17:57:16Z

vllm/entrypoints/openai/chat_completion/serving.py

+                tool_call_info = tool_parser.extract_tool_calls(
+                    "",
+                    request=request,
+                )


For harmony models, won't this throw an exception (that we then log below) because we're not passing token ids into extract_tool_calls? See

vllm/vllm/tool_parsers/openai_tool_parser.py, lines 40 to 43 in cc0d565:

    if token_ids is None:
        raise NotImplementedError(
            "OpenAIToolParser requires token IDs and does not support text-based extraction."  # noqa: E501
        )

Good catch, you're right. Fixed by forwarding token_ids=token_ids. I also added # type: ignore to match the non-streaming path at L1514.
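The failure mode discussed in this thread can be reproduced with a small stand-in. This is only a sketch: `TokenOnlyToolParser` and its return value are hypothetical, and only the `token_ids is None` check mirrors the quoted snippet from openai_tool_parser.py.

```python
from typing import Optional, Sequence


class TokenOnlyToolParser:
    """Hypothetical parser that, like the quoted snippet, needs token IDs."""

    def extract_tool_calls(
        self,
        model_output: str,
        request: object,
        token_ids: Optional[Sequence[int]] = None,
    ) -> dict:
        if token_ids is None:
            raise NotImplementedError(
                "requires token IDs and does not support text-based extraction."
            )
        # Placeholder result; a real parser would decode the tokens here.
        return {"num_tokens": len(token_ids)}


parser = TokenOnlyToolParser()

# Text-only call: raises, which is the problem raised in the review.
try:
    parser.extract_tool_calls("", request=None)
    text_only_failed = False
except NotImplementedError:
    text_only_failed = True

# Forwarding the token IDs avoids the exception.
result = parser.extract_tool_calls("", request=None, token_ids=[1, 2, 3])
```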

bbrowning · 2026-03-02T17:58:58Z

vllm/entrypoints/openai/chat_completion/serving.py

+            return DeltaMessage(content=content), False
+
+        if request.include_reasoning and reasoning:
+            return DeltaMessage(content=reasoning), False


This should be DeltaMessage(reasoning=reasoning) since this is meant to emit reasoning, right?

Indeed. Fixed: it now uses DeltaMessage(reasoning=reasoning).
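The routing that this exchange settles on might look like the sketch below. `Delta` is a hypothetical stand-in for DeltaMessage with only the two fields discussed here, and `fallback_delta` is an illustrative helper name, not a function from the PR. The real code also returns a second tuple element, which is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Delta:
    """Hypothetical stand-in for DeltaMessage (only two fields shown)."""
    content: Optional[str] = None
    reasoning: Optional[str] = None


def fallback_delta(
    content: Optional[str],
    reasoning: Optional[str],
    include_reasoning: bool,
) -> Optional[Delta]:
    """Route text to the matching field instead of always using content."""
    if content:
        return Delta(content=content)
    if include_reasoning and reasoning:
        # Reasoning goes in the reasoning field, not in content.
        return Delta(reasoning=reasoning)
    return None
```

With this routing, a client that reads only `content` never sees reasoning text mixed into the answer.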

mergify · 2026-03-05T17:41:03Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jfrery.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

ikaadil · 2026-03-07T22:27:52Z

Hi @jfrery and @bbrowning, do you have any updates on this PR?

jfrery requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang, robertgshaw2-redhat and russellb as code owners February 26, 2026 22:24

mergify bot added the frontend, gpt-oss (Related to GPT-OSS models), and bug (Something isn't working) labels Feb 26, 2026

github-project-automation bot added this to gpt-oss Issues & Enhancements Feb 26, 2026

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Feb 26, 2026

gemini-code-assist bot reviewed Feb 26, 2026

jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch 3 times, most recently from f54ebfb to e7920ad Compare February 26, 2026 22:45

jfrery mentioned this pull request Feb 27, 2026

fix(cvm): patch vLLM Harmony streaming tool-call fallback concrete-security/umbra#78

Merged

2 tasks

jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch 3 times, most recently from a56825a to 35fc70d Compare February 27, 2026 08:09

jfrery changed the title ~~[Bugfix] Refactor harmony streaming delta to use parser-level state diffing~~ [Bugfix] Fix tool call arguments parsed as content/reasoning in harmony streaming Feb 27, 2026

jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from 35fc70d to adbcb67 Compare March 2, 2026 08:02

jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from adbcb67 to 3e30474 Compare March 2, 2026 09:18

bbrowning reviewed Mar 2, 2026

jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from 3e30474 to a85fbbf Compare March 2, 2026 19:17

jfrery requested a review from bbrowning March 2, 2026 19:20

will-deines mentioned this pull request Mar 4, 2026

[Bugfix] Fix Harmony streaming cross-channel delta accumulation #36011

Open

5 tasks

mergify bot added the needs-rebase label Mar 5, 2026

jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from a85fbbf to a9dcefa Compare March 6, 2026 11:14

mergify bot removed the needs-rebase label Mar 6, 2026

jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch 2 times, most recently from 1ef748d to 4859131 Compare March 6, 2026 12:51

[Bugfix] Fix Harmony streaming tool call recovery across chunk and stop boundaries

9e4263b

Signed-off-by: jfrery <jordan.frery@zama.ai>

jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from 4859131 to 9e4263b Compare March 6, 2026 12:58

Pradyun92 mentioned this pull request Mar 14, 2026

[Bugfix] Fix harmony streaming tool call crash and argument splitting #37070

Open

5 tasks

jfrery wants to merge 1 commit into vllm-project:main from jfrery:fix/harmony-stream-parser-state-diffing
