
Conversation

@aarnphm (Collaborator) commented Aug 6, 2025

This PR brings openai-harmony tool support and reasoning for /chat/completions.

vllm serve openai/gpt-oss-120b -tp 2 --tool-call-parser openai --reasoning-parser openai_gptoss --enable-auto-tool-choice

Currently, it does not yet support tool_choice='required'.
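For reference, a minimal streaming client sketch against a server started with the command above (the URL, API key, and tool schema here are placeholders, not part of this PR):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Stream the response; the parser emits tool-call name/argument
# fragments incrementally via delta.tool_calls.
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        print(tc.function.name, tc.function.arguments)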

Signed-off-by: Aaron Pham [email protected]

github-actions bot commented Aug 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for OpenAI Harmony tool calls in /chat/completions. It introduces a new OpenAIToolParser and integrates it into the chat serving logic for both streaming and non-streaming responses. The changes also include new tests for the parser. The review identifies a critical logic flaw in the non-streaming tool call extraction that could lead to missed tool calls, and a high-severity issue in the tests related to type correctness. Suggestions are provided to fix these issues.

@aarnphm aarnphm requested a review from simon-mo August 6, 2025 19:47
@aarnphm aarnphm changed the title [Frontend] Harmony tool supports for /chat/completions [Frontend] Harmony tool supports for /chat/completions [1/n] Aug 6, 2025
@aarnphm aarnphm changed the title [Frontend] Harmony tool supports for /chat/completions [1/n] [gpt-oss] tool parser supports for /chat/completions [1/n] Aug 6, 2025
@Ubospica left a comment

The openai tool calling parser part looks good to me. We may also want to consider tool calling in the analysis part; according to harmony's cookbook:

built-in tools will normally be triggered on the analysis channel

OpenAI's structured generation has a different format than tool calling, so vLLM should have a parser for that too.
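For illustration, raw Harmony output separates the two cases roughly like this (a sketch following the harmony cookbook's format; exact channel routing depends on the model):

<|channel|>analysis<|message|>User asks for weather; call the tool.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{"city": "Berlin"}<|call|>

Built-in tools (e.g. browser, python) would instead be addressed on the analysis channel, which is why a separate parsing path is needed there.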

@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Aug 11, 2025
@longern commented Aug 11, 2025

I tested gpt-oss-120b and found that sometimes custom tools (not the built-in ones) also appear in the analysis channel when built-in tools are not available. This may lead to a parsing error and should be taken into consideration. I'm currently detecting the functions. prefix as a workaround.
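A minimal sketch of that workaround (the recipient field name here is an assumption for illustration, not the actual parser API):

def is_custom_tool_call(recipient: str | None) -> bool:
    # Harmony addresses custom tools as "functions.<name>", so a
    # recipient with that prefix is a tool call even when it shows
    # up on the analysis channel instead of commentary.
    return recipient is not None and recipient.startswith("functions.")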

luis5tb added a commit to luis5tb/llama-stack that referenced this pull request Sep 8, 2025
After [0] got merged to add support for tool parsing in vLLM for
GPT-OSS with chat completions, extra corner cases are needed in the
openai_compat file to properly translate the tool call content
[0] vllm-project/vllm#22386
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
@davidallada

We will release 0.10.2 promptly
Any updates on pushing the docker image for this?

@jakehlee

@davidallada I've been tracking https://github.com/vllm-project/vllm/milestone/12 to see the progress of the 0.10.2 release.

@davidallada

@davidallada I've been tracking https://github.com/vllm-project/vllm/milestone/12 to see the progress of the 0.10.2 release.

Thank you!

@timjwhite

I've been doing the same, waiting eagerly, although an interim update to the gptoss container would also be welcome!

@furkanc commented Sep 12, 2025

Hello, when I run the vllm/vllm-openai:v0.10.1.1 image with the following command:
openai/gpt-oss-120b --gpu-memory-utilization 0.9 --max-model-len 16384 --tool-call-parser openai --enable-auto-tool-choice
it gives this error:

KeyError: 'invalid tool call parser: openai (chose from { deepseek_v3,glm45,granite-20b-fc,granite,hermes,hunyuan_a13b,internlm,jamba,kimi_k2,llama4_pythonic,llama4_json,llama3_json,minimax,mistral,phi4_mini_json,pythonic,qwen3_coder,step3,xlam })'

Am I missing something?

@Blake-Martin-code

Hello, when I run the vllm/vllm-openai:v0.10.1.1 image with the following command:

openai/gpt-oss-120b --gpu-memory-utilization 0.9 --max-model-len 16384 --tool-call-parser openai --enable-auto-tool-choice

it gives this error:

KeyError: 'invalid tool call parser: openai (chose from { deepseek_v3,glm45,granite-20b-fc,granite,hermes,hunyuan_a13b,internlm,jamba,kimi_k2,llama4_pythonic,llama4_json,llama3_json,minimax,mistral,phi4_mini_json,pythonic,qwen3_coder,step3,xlam })'

Am I missing something?

I believe you need to wait until they push the new image to Docker for this to be available. However, you can build it yourself from this branch.

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025

@Blake-Martin-code commented Sep 14, 2025

@aarnphm The image just got pushed and it is successfully calling tools with chat completions. However, there is a bug: the function call appears in both the content and tool_calls keys of the model's response.

Here are the engine args I passed: --model openai/gpt-oss-20b --served-model-name openai/gpt-oss-20b --tensor-parallel-size 2 --tool-call-parser openai --enable-auto-tool-choice

ENV VARS: VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1

from openai import OpenAI
 
client = OpenAI(
    base_url="https://runPodId-8000.proxy.runpod.net/v1",
    api_key="EMPTY"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            },
        },
    }
]
 
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools
)
 
print(response.choices[0].message)

ChatCompletionMessage(content='{"city":"Berlin"}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-474e480646b04c5daf0adfd18a32c78b', function=Function(arguments='{"city":"Berlin"}', name='get_weather'), type='function')], reasoning_content='We need to fetch weather. We can use get_weather function.')
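Until this duplication is fixed server-side, a possible client-side workaround (a sketch that simply prefers tool_calls over the duplicated content):

message = response.choices[0].message
if message.tool_calls:
    # The parsed call already carries the arguments; drop the
    # duplicated raw JSON from content.
    content = None
else:
    content = message.content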

@andresC98

I am running vLLM 0.10.2 with the configs --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss and still get a 500 internal error, 'Unexpected token 12606 while expecting start token 200006', when calling it with the same example @Blake-Martin-code showed.

@fabienric

I am running vLLM 0.10.2 with the configs --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss and still get a 500 internal error, 'Unexpected token 12606 while expecting start token 200006', when calling it with the same example @Blake-Martin-code showed.

Have you tried redownloading the model from HF? OpenAI fixed the chat template and generation config files after the first release.

@andresC98 commented Sep 16, 2025

I am running vLLM 0.10.2 with the configs --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss and still get a 500 internal error, 'Unexpected token 12606 while expecting start token 200006', when calling it with the same example @Blake-Martin-code showed.

Have you tried redownloading the model from HF? OpenAI fixed the chat template and generation config files after the first release.

Fair point; I noticed my file was from August 12th, so I am missing this change to the generation_config.json file: https://huggingface.co/openai/gpt-oss-20b/commit/d666cf3b67006cf8227666739edf25164aaffdeb. Not sure if this is the fix, but I will try!

EDIT: After redownloading and bringing in the updated .json file, it is now working as expected! Thank you so much for pointing this out, @fabienric!
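For anyone who needs to force a clean re-download, a minimal sketch with huggingface_hub (assumes the default HF cache location):

from huggingface_hub import snapshot_download

# Re-fetch the repo so the updated chat template and
# generation_config.json are picked up.
snapshot_download("openai/gpt-oss-20b", force_download=True)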

@fabienric

yes, this is the one 😄

@Blake-Martin-code

I am running vLLM 0.10.2 with the configs --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss and still get a 500 internal error, 'Unexpected token 12606 while expecting start token 200006', when calling it with the same example @Blake-Martin-code showed.

Have you tried redownloading the model from HF? OpenAI fixed the chat template and generation config files after the first release.

Fair point; I noticed my file was from August 12th, so I am missing this change to the generation_config.json file: https://huggingface.co/openai/gpt-oss-20b/commit/d666cf3b67006cf8227666739edf25164aaffdeb. Not sure if this is the fix, but I will try!

EDIT: After redownloading and bringing in the updated .json file, it is now working as expected! Thank you so much for pointing this out, @fabienric!

I am using RunPod, so the vLLM container and Hugging Face model get redownloaded every time. Has anyone else encountered the tool call arguments appearing in the content every time?

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
bbrowning added a commit to bbrowning/vllm that referenced this pull request Oct 1, 2025
This fixes an issue identified in vllm-project#22337 that was not entirely fixed
by vllm-project#22386 around tool call parsing in gpt-oss models in the
non-streaming case and the model output returned to the user.

The main change here is to remove the parsed tool call out of the
ChatCompletion message `content`, so that the generated tool call only
shows up in its parsed form as an entry in `tool_calls` instead of in
both `tool_calls` and `content`.

A small related change is to ensure we're not sending newline
characters in the JSON arguments string of the parsed tool call, since
we don't do this for other tool calls.

Together these should get non-streaming tool calls via the Chat
Completions API working for gpt-oss models for typical use cases.
There may still be some edge cases that openai_tool_parser.py and/or
serving_chat.py need to handle, but this at least closes some known
gaps.

A couple of unit tests were tweaked here to test for both of these
fixes. I ensured the tests failed before my fixes, and that they now
pass with the fixes.

Signed-off-by: Ben Browning <[email protected]>
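The gist of both fixes, as a sketch (names here are illustrative, not the actual code from this commit):

import json

def finalize_tool_call(name: str, raw_arguments: str, remaining_text: str):
    # Re-serialize so the arguments string carries no newlines,
    # matching what the other tool parsers emit.
    arguments = json.dumps(json.loads(raw_arguments))
    tool_call = {"name": name, "arguments": arguments}
    # Return content without the parsed call, so the call appears
    # only under tool_calls rather than in both places.
    content = remaining_text or None
    return tool_call, content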
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Labels: deepseek, frontend, gpt-oss, llama, qwen, ready, tool-calling
