[Bugfix] Fix tool call arguments parsed as content/reasoning in harmony streaming #35449

Open
jfrery wants to merge 1 commit into vllm-project:main from jfrery:fix/harmony-stream-parser-state-diffing

Conversation

@jfrery jfrery commented Feb 26, 2026

Purpose

Fix streaming tool calling with openai/gpt-oss-120b. In streaming mode, tool call arguments were not parsed correctly: they ended up in reasoning/content instead of being emitted as tool_calls deltas.

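The root cause is that extract_harmony_streaming_delta used a per-token-group heuristic that tracked tool call transitions via prev_recipient within a single chunk. Because that state did not persist across chunks, tool call indexing and argument streaming broke when a tool call crossed a chunk boundary.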

This PR replaces the per-token-group heuristic with parser-level message diffing via a persistent HarmonyStreamingState dataclass that tracks emitted message count, tool call indices, and in-progress message state across chunks.
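The diffing idea can be illustrated with a minimal sketch. It is not the PR's code: ParsedMessage, StreamingState, diff_messages, and every field name below are hypothetical stand-ins, and the sketch assumes the parser exposes a growing list of messages whose last entry is still in progress.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedMessage:
    """Hypothetical stand-in for one parser message (not the real harmony type)."""
    channel: str                      # e.g. "analysis", "final", "commentary"
    text: str                         # text accumulated so far for this message
    recipient: Optional[str] = None   # set when the message is a tool call


@dataclass
class StreamingState:
    """Hypothetical per-request state that persists across chunks."""
    emitted_messages: int = 0         # messages already fully emitted
    emitted_chars: int = 0            # chars already emitted from the open message
    next_tool_index: int = 0          # index to assign to the next new tool call
    tool_index: dict = field(default_factory=dict)  # message position -> tool index


def diff_messages(state: StreamingState, messages: list) -> list:
    """Return deltas for everything in `messages` that was not emitted yet."""
    deltas = []
    for pos in range(state.emitted_messages, len(messages)):
        msg = messages[pos]
        # Only the first unfinished message has a partially emitted prefix.
        start = state.emitted_chars if pos == state.emitted_messages else 0
        new_text = msg.text[start:]
        if msg.recipient is not None:
            if pos not in state.tool_index:
                # First time this tool call is seen: assign a stable index.
                state.tool_index[pos] = state.next_tool_index
                state.next_tool_index += 1
                deltas.append({"tool_call": state.tool_index[pos],
                               "name": msg.recipient,
                               "arguments": new_text})
            elif new_text:
                deltas.append({"tool_call": state.tool_index[pos],
                               "arguments": new_text})
        elif new_text:
            key = "reasoning" if msg.channel == "analysis" else "content"
            deltas.append({key: new_text})
    if messages:
        # Treat the last message as still open; remember how much was sent.
        state.emitted_messages = len(messages) - 1
        state.emitted_chars = len(messages[-1].text)
    return deltas
```

Because the state object outlives each chunk, a tool call whose arguments arrive in several chunks keeps the same index, and only the unsent suffix of the arguments is emitted each time.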

Fixes #27641
Fixes #31501
Related (closed): #28635, #30099, #25560
Related (open): #30222, #30204

Test Plan

python -m pytest tests/entrypoints/openai/test_serving_chat_stream_harmony.py -v

Test Result

18 passed


@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the frontend, gpt-oss (Related to GPT-OSS models), and bug (Something isn't working) labels Feb 26, 2026

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors the harmony streaming delta extraction to use a more robust parser-level state diffing mechanism, replacing the previous per-token heuristic. The introduction of HarmonyStreamingState to persist state across chunks is a solid approach that should fix the described issues with tool call indexing and argument streaming across arbitrary chunk boundaries. The changes in stream_harmony.py are well-structured, and the tests have been updated thoroughly to cover the new implementation. I have one concern regarding a potential issue in serving.py where a remnant of the old logic might lead to incorrect behavior.

mergify bot commented Feb 26, 2026

Hi @jfrery, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@jfrery jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch 3 times, most recently from f54ebfb to e7920ad February 26, 2026 22:45
@jfrery jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch 3 times, most recently from a56825a to 35fc70d February 27, 2026 08:09
@jfrery jfrery changed the title from "[Bugfix] Refactor harmony streaming delta to use parser-level state diffing" to "[Bugfix] Fix tool call arguments parsed as content/reasoning in harmony streaming" Feb 27, 2026

ehfd commented Feb 27, 2026

CC @bbrowning @chaunceyjiang

@jfrery jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from 35fc70d to adbcb67 March 2, 2026 08:02

mergify bot commented Mar 2, 2026

Hi @jfrery, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.


@jfrery jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from adbcb67 to 3e30474 March 2, 2026 09:18
Comment on lines +2020 to +2023
tool_call_info = tool_parser.extract_tool_calls(
    "",
    request=request,
)
Contributor

For harmony models, won't this throw an exception (that we then log below) because we're not passing token ids into extract_tool_calls? See

if token_ids is None:
    raise NotImplementedError(
        "OpenAIToolParser requires token IDs and does not support text-based extraction."  # noqa: E501
    )

Author

Good catch, you're right. Fixed by forwarding token_ids=token_ids. Also added # type: ignore to match the non-streaming path at L1514.
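The failure mode and the fix discussed in this thread can be reproduced with a small sketch. TokenOnlyToolParser is a hypothetical stand-in, not vLLM's OpenAIToolParser; it only mirrors the token_ids check quoted above, and its return value is a placeholder.

```python
from typing import Optional, Sequence


class TokenOnlyToolParser:
    """Hypothetical stand-in for a tool parser that only works from token IDs."""

    def extract_tool_calls(
        self,
        model_output: str,
        request: object,
        token_ids: Optional[Sequence[int]] = None,
    ) -> list:
        if token_ids is None:
            raise NotImplementedError(
                "requires token IDs and does not support text-based extraction"
            )
        # Placeholder result: a real parser would decode tool calls here.
        return [{"num_tokens": len(token_ids)}]


parser = TokenOnlyToolParser()
token_ids = [101, 102, 103]

# Before the fix: no token IDs are forwarded, so the call raises.
try:
    parser.extract_tool_calls("", request=None)
    call_failed = False
except NotImplementedError:
    call_failed = True

# After the fix: the token IDs are forwarded explicitly.
result = parser.extract_tool_calls("", request=None, token_ids=token_ids)
```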

    return DeltaMessage(content=content), False

if request.include_reasoning and reasoning:
    return DeltaMessage(content=reasoning), False

@bbrowning bbrowning Mar 2, 2026

This should be DeltaMessage(reasoning=reasoning) since this is meant to emit reasoning, right?

Author

Indeed. Fixed: it now uses DeltaMessage(reasoning=reasoning).
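A minimal sketch of the corrected routing, under stated assumptions: DeltaMessage here is a hypothetical two-field stand-in rather than vLLM's class, and build_delta is an illustrative helper that does not exist in the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeltaMessage:
    """Hypothetical stand-in with only the two fields discussed here."""
    content: Optional[str] = None
    reasoning: Optional[str] = None


def build_delta(
    content: str, reasoning: str, include_reasoning: bool
) -> Optional[DeltaMessage]:
    """Route text to the matching field; reasoning never goes into content."""
    if content:
        return DeltaMessage(content=content)
    if include_reasoning and reasoning:
        # The reviewed bug passed content=reasoning on this branch.
        return DeltaMessage(reasoning=reasoning)
    return None


delta = build_delta(content="", reasoning="step 1", include_reasoning=True)
```

With the corrected keyword, a client that reads the reasoning field receives the text there, and content stays empty for that delta.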

@jfrery jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from 3e30474 to a85fbbf March 2, 2026 19:17
@jfrery jfrery requested a review from bbrowning March 2, 2026 19:20

mergify bot commented Mar 5, 2026

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @jfrery.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 5, 2026
@jfrery jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from a85fbbf to a9dcefa March 6, 2026 11:14
@mergify mergify bot removed the needs-rebase label Mar 6, 2026
@jfrery jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch 2 times, most recently from 1ef748d to 4859131 March 6, 2026 12:51
…op boundaries

Signed-off-by: jfrery <jordan.frery@zama.ai>
@jfrery jfrery force-pushed the fix/harmony-stream-parser-state-diffing branch from 4859131 to 9e4263b March 6, 2026 12:58

ikaadil commented Mar 7, 2026

Hi @jfrery and @bbrowning, do you have any update on this PR?


Labels

bug (Something isn't working), frontend, gpt-oss (Related to GPT-OSS models)

Projects

Status: To Triage

Development

Successfully merging this pull request may close these issues.

[Bug]: --stream-interval > 1 causes tool call arguments to be empty/lost
[Bug]: Streaming tool call randomly failed when using gpt-oss-120b/20b

4 participants