
[Bugfix] Fix tool call streaming for gpt-oss/Harmony models#33520

Open
alexbi29 wants to merge 2 commits into vllm-project:main from alexbi29:fix-harmony-tool-call-streaming

Conversation


@alexbi29 alexbi29 commented Feb 1, 2026

Summary

This PR fixes several issues with tool call handling for gpt-oss models using the Harmony streaming parser:

  1. IndexError in streaming generator: Added auto_tools_called check before accessing prev_tool_call_arr to prevent IndexError when the array is empty.

  2. Missing tool call IDs in non-streaming responses: Added proper ID generation for named tool choice and auto tool choice cases that were missing the required id field.

  3. Split tool calls in streaming: Fixed an issue where a single tool call was being split into multiple entries because:

    • Tool call IDs are now stored by recipient name (e.g., functions.glob) instead of index number, since base_index changes between streaming calls as messages complete.
    • Continuation chunks now include the same ID as the opening chunk, allowing clients to properly merge them.
    • DeltaToolCalls with the same index are merged before sending to avoid multiple entries in one SSE chunk.

Test plan

  • Tested with opencode client against gpt-oss-120b model
  • Tool calls now properly stream with consistent IDs across chunks
  • Clients can correctly aggregate streaming tool call arguments

🤖 Generated with Claude Code

This makes opencode tool calls work properly with the v1 Chat Completions API.


github-actions bot commented Feb 1, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀


dosubot bot commented Feb 1, 2026

Related Documentation

No published documentation to review for changes on this repository.


@alexbi29 alexbi29 force-pushed the fix-harmony-tool-call-streaming branch from 4c43d67 to fa4c429 Compare February 1, 2026 20:26
@mergify mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels Feb 1, 2026

mergify bot commented Feb 1, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alexbi29.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several important bug fixes for tool call handling in streaming and non-streaming modes for gpt-oss/Harmony models. The changes address an IndexError during streaming, add missing tool call IDs in non-streaming responses, and fix issues with split tool calls in streaming by improving how tool call IDs are managed and by merging delta tool calls. The implementation looks solid and correctly addresses the described issues. The code is well-structured and the fixes are robust. I have no major concerns.

@alexbi29 alexbi29 force-pushed the fix-harmony-tool-call-streaming branch from fa4c429 to d78fb82 Compare February 1, 2026 20:31
@mergify mergify bot removed the needs-rebase label Feb 1, 2026

mergify bot commented Feb 1, 2026

Hi @alexbi29, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@chaunceyjiang chaunceyjiang self-assigned this Feb 2, 2026

@chaunceyjiang chaunceyjiang left a comment


Could you share a minimal example to reproduce the problem?


alexbi29 commented Feb 2, 2026

@chaunceyjiang here is a minimal reproduction script:

import argparse, json
from openai import OpenAI

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_files",
        "description": "Search for files matching a pattern",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string"},
                "path": {"type": "string"},
            },
            "required": ["pattern", "path"],
        },
    },
}]

MSG = [{
    "role": "user",
    "content": "Search for all JavaScript files in "
               "/home/user/my-project/src/components/dashboard",
}]

def main(host: str, port: int, model: str | None):
    c = OpenAI(base_url=f"http://{host}:{port}/v1", api_key="not-needed")
    if not model:
        ms = c.models.list()
        model = ms.data[0].id if ms.data else "gpt-4"
        print(f"Auto-detected model: {model}")

    print(f"Host: {host}:{port}\nRequest: stream=True with tools\n" + "=" * 60)

    agg: dict[int, dict] = {}

    try:
        stream = c.chat.completions.create(
            model=model,
            messages=MSG,
            tools=TOOLS,
            tool_choice="required",
            stream=True,
        )
    except Exception as e:
        print(f"Error creating request: {e}")
        return 1

    print("Streaming chunks:\n" + "-" * 40)
    try:
        for ch in stream:
            choice = ch.choices[0] if ch.choices else None
            if choice and choice.finish_reason:
                print(f"  Finish: {choice.finish_reason}")

            tcs = getattr(getattr(choice, "delta", None), "tool_calls", None) or []
            for tc in tcs:
                fn = tc.function
                print(f"  Chunk: index={tc.index}, id={tc.id!r}, "
                      f"name={(fn.name if fn else None)!r}, "
                      f"args={(fn.arguments if fn else None)!r}")

                rec = agg.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
                if tc.id:
                    rec["id"] = tc.id
                if fn and fn.name:
                    rec["name"] = fn.name
                if fn and fn.arguments:
                    rec["arguments"] += fn.arguments
    except Exception as e:
        print(f"Error during streaming: {e}")

    print("-" * 40)
    print("Aggregated tool calls (by index):\n" + "-" * 40)

    for i, rec in sorted(agg.items()):
        print(f"  Index {i}: id={rec['id']!r} name={rec['name']!r}")
        print(f"    arguments: {rec['arguments']!r}")
        try:
            print(f"    parsed: {json.loads(rec['arguments'])}")
        except json.JSONDecodeError as e:
            print(f"    ERROR: Invalid JSON - {e}")

    print("\n" + "=" * 60 + "\nBUG CHECK:\n" + "=" * 60)

    bugs = []
    if not agg:
        bugs.append("No tool calls received (stream may have crashed).")
    if len(agg) > 1:
        bugs.append(f"Multiple tool call indices (expected 1, got {len(agg)}).")
    for i, rec in agg.items():
        if not rec["id"]:
            bugs.append(f"Tool call at index {i} has null/empty id.")
        try:
            json.loads(rec["arguments"])
        except json.JSONDecodeError:
            bugs.append(f"Tool call at index {i} has invalid JSON arguments.")

    if bugs:
        print("❌ BUG DETECTED:")
        for b in bugs:
            print("  -", b)
        return 1

    print("✓ No bugs detected - tool call streaming working correctly")
    return 0

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--host", default="pook.lan")
    ap.add_argument("--port", type=int, default=8000)
    ap.add_argument("--model")
    args = ap.parse_args()
    raise SystemExit(main(args.host, args.port, args.model))
Output:

ubuntu@epyc:~$ /home/ubuntu/vllm-env/bin/python /home/ubuntu/vllm-src/test_tool_call_streaming.py --host pook.lan --port 8000
Auto-detected model: openai/gpt-oss-20b
Host: pook.lan:8000
Request: stream=True with tools
============================================================
Streaming chunks:
----------------------------------------
  Finish: stop
----------------------------------------
Aggregated tool calls (by index):
----------------------------------------

============================================================
BUG CHECK:
============================================================
❌ BUG DETECTED:
  - No tool calls received (stream may have crashed).

@alexbi29 alexbi29 force-pushed the fix-harmony-tool-call-streaming branch from d78fb82 to 2e1d598 Compare March 16, 2026 10:02

mergify bot commented Mar 16, 2026

Hi @alexbi29, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10


Labels

bug (Something isn't working), frontend, gpt-oss (Related to GPT-OSS models)

Projects

Status: To Triage
