
[Bugfix] Fix tool call streaming for gpt-oss/Harmony models#33520

Open
alexbi29 wants to merge 2 commits into vllm-project:main from alexbi29:fix-harmony-tool-call-streaming

Conversation


@alexbi29 alexbi29 commented Feb 1, 2026

Summary

This PR fixes several issues with tool call handling for gpt-oss models using the Harmony streaming parser:

  1. IndexError in streaming generator: Added auto_tools_called check before accessing prev_tool_call_arr to prevent IndexError when the array is empty.

  2. Missing tool call IDs in non-streaming responses: Added proper ID generation for named tool choice and auto tool choice cases that were missing the required id field.

  3. Split tool calls in streaming: Fixed an issue where a single tool call was being split into multiple entries because:

    • Tool call IDs are now stored by recipient name (e.g., functions.glob) instead of index number, since base_index changes between streaming calls as messages complete.
    • Continuation chunks now include the same ID as the opening chunk, allowing clients to properly merge them.
    • DeltaToolCalls with the same index are merged before sending to avoid multiple entries in one SSE chunk.

Test plan

  • Tested with opencode client against gpt-oss-120b model
  • Tool calls now properly stream with consistent IDs across chunks
  • Clients can correctly aggregate streaming tool call arguments

🤖 Generated with Claude Code

This makes opencode tool calls work properly with the v1 Chat Completions API.


github-actions bot commented Feb 1, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀


dosubot bot commented Feb 1, 2026

Related Documentation

No published documentation to review for changes on this repository.


@alexbi29 alexbi29 force-pushed the fix-harmony-tool-call-streaming branch from 4c43d67 to fa4c429 Compare February 1, 2026 20:26
@mergify mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels Feb 1, 2026

mergify bot commented Feb 1, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @alexbi29.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several important bug fixes for tool call handling in streaming and non-streaming modes for gpt-oss/Harmony models. The changes address an IndexError during streaming, add missing tool call IDs in non-streaming responses, and fix issues with split tool calls in streaming by improving how tool call IDs are managed and by merging delta tool calls. The implementation looks solid and correctly addresses the described issues. The code is well-structured and the fixes are robust. I have no major concerns.

@alexbi29 alexbi29 force-pushed the fix-harmony-tool-call-streaming branch from fa4c429 to d78fb82 Compare February 1, 2026 20:31
@mergify mergify bot removed the needs-rebase label Feb 1, 2026

mergify bot commented Feb 1, 2026

Hi @alexbi29, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@chaunceyjiang chaunceyjiang self-assigned this Feb 2, 2026

@chaunceyjiang chaunceyjiang left a comment


Could you share a minimal example to reproduce the problem?


alexbi29 commented Feb 2, 2026

@chaunceyjiang here is a minimal reproduction script:

import argparse, json
from openai import OpenAI

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_files",
        "description": "Search for files matching a pattern",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string"},
                "path": {"type": "string"},
            },
            "required": ["pattern", "path"],
        },
    },
}]

MSG = [{
    "role": "user",
    "content": "Search for all JavaScript files in "
               "/home/user/my-project/src/components/dashboard",
}]

def main(host: str, port: int, model: str | None):
    c = OpenAI(base_url=f"http://{host}:{port}/v1", api_key="not-needed")
    if not model:
        ms = c.models.list()
        model = ms.data[0].id if ms.data else "gpt-4"
        print(f"Auto-detected model: {model}")

    print(f"Host: {host}:{port}\nRequest: stream=True with tools\n" + "=" * 60)

    agg: dict[int, dict] = {}

    try:
        stream = c.chat.completions.create(
            model=model,
            messages=MSG,
            tools=TOOLS,
            tool_choice="required",
            stream=True,
        )
    except Exception as e:
        print(f"Error creating request: {e}")
        return 1

    print("Streaming chunks:\n" + "-" * 40)
    try:
        for ch in stream:
            choice = ch.choices[0] if ch.choices else None
            if choice and choice.finish_reason:
                print(f"  Finish: {choice.finish_reason}")

            tcs = getattr(getattr(choice, "delta", None), "tool_calls", None) or []
            for tc in tcs:
                fn = tc.function
                print(f"  Chunk: index={tc.index}, id={tc.id!r}, "
                      f"name={(fn.name if fn else None)!r}, "
                      f"args={(fn.arguments if fn else None)!r}")

                rec = agg.setdefault(tc.index, {"id": None, "name": None, "arguments": ""})
                if tc.id:
                    rec["id"] = tc.id
                if fn and fn.name:
                    rec["name"] = fn.name
                if fn and fn.arguments:
                    rec["arguments"] += fn.arguments
    except Exception as e:
        print(f"Error during streaming: {e}")

    print("-" * 40)
    print("Aggregated tool calls (by index):\n" + "-" * 40)

    for i, rec in sorted(agg.items()):
        print(f"  Index {i}: id={rec['id']!r} name={rec['name']!r}")
        print(f"    arguments: {rec['arguments']!r}")
        try:
            print(f"    parsed: {json.loads(rec['arguments'])}")
        except json.JSONDecodeError as e:
            print(f"    ERROR: Invalid JSON - {e}")

    print("\n" + "=" * 60 + "\nBUG CHECK:\n" + "=" * 60)

    bugs = []
    if not agg:
        bugs.append("No tool calls received (stream may have crashed).")
    if len(agg) > 1:
        bugs.append(f"Multiple tool call indices (expected 1, got {len(agg)}).")
    for i, rec in agg.items():
        if not rec["id"]:
            bugs.append(f"Tool call at index {i} has null/empty id.")
        try:
            json.loads(rec["arguments"])
        except json.JSONDecodeError:
            bugs.append(f"Tool call at index {i} has invalid JSON arguments.")

    if bugs:
        print("❌ BUG DETECTED:")
        for b in bugs:
            print("  -", b)
        return 1

    print("✓ No bugs detected - tool call streaming working correctly")
    return 0

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--host", default="pook.lan")
    ap.add_argument("--port", type=int, default=8000)
    ap.add_argument("--model")
    args = ap.parse_args()
    raise SystemExit(main(args.host, args.port, args.model))
Output:

ubuntu@epyc:~$ /home/ubuntu/vllm-env/bin/python /home/ubuntu/vllm-src/test_tool_call_streaming.py --host pook.lan --port 8000
Auto-detected model: openai/gpt-oss-20b
Host: pook.lan:8000
Request: stream=True with tools
============================================================
Streaming chunks:
----------------------------------------
  Finish: stop
----------------------------------------
Aggregated tool calls (by index):
----------------------------------------

============================================================
BUG CHECK:
============================================================
❌ BUG DETECTED:
  - No tool calls received (stream may have crashed).

@alexbi29 alexbi29 force-pushed the fix-harmony-tool-call-streaming branch from d78fb82 to 2e1d598 Compare March 16, 2026 10:02

mergify bot commented Mar 16, 2026

Hi @alexbi29, the pre-commit checks have failed. Please run:

uv pip install 'pre-commit>=4.5.1'
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10


Labels

bug (Something isn't working), frontend, gpt-oss (Related to GPT-OSS models)

Projects

Status: To Triage
