
Conversation

@aarnphm (Collaborator) commented Aug 6, 2025

This PR brings openai-harmony tool support and reasoning for /chat/completions.

vllm serve openai/gpt-oss-120b -tp 2 --tool-call-parser openai --reasoning-parser openai_gptoss --enable-auto-tool-choice

Currently, it does not yet support tool_choice='required'.
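For reference, a minimal streaming client sketch against a server started with the command above (the URL, API key, and tool schema here are placeholders, not part of this PR):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather in a given city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Stream the response; the parser emits tool-call name/argument
# fragments incrementally via delta.tool_calls.
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        print(tc.function.name, tc.function.arguments)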

Signed-off-by: Aaron Pham [email protected]

github-actions bot commented Aug 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds support for OpenAI Harmony tool calls in /chat/completions. It introduces a new OpenAIToolParser and integrates it into the chat serving logic for both streaming and non-streaming responses. The changes also include new tests for the parser. The review identifies a critical logic flaw in the non-streaming tool call extraction that could lead to missed tool calls, and a high-severity issue in the tests related to type correctness. Suggestions are provided to fix these issues.

@aarnphm aarnphm requested a review from simon-mo August 6, 2025 19:47
@aarnphm aarnphm changed the title [Frontend] Harmony tool supports for /chat/completions [Frontend] Harmony tool supports for /chat/completions [1/n] Aug 6, 2025
@aarnphm aarnphm changed the title [Frontend] Harmony tool supports for /chat/completions [1/n] [gpt-oss] tool parser supports for /chat/completions [1/n] Aug 6, 2025
@Ubospica left a comment

The openai tool calling parser part looks good to me. We may also want to consider tool calling in the analysis part; according to harmony's cookbook:

built-in tools will normally be triggered on the analysis channel

OpenAI's structured generation has a different format than tool calling, so vLLM should have a parser for that too.
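For illustration, raw Harmony output separates the two cases roughly like this (a sketch following the harmony cookbook's format; exact channel routing depends on the model):

<|channel|>analysis<|message|>User asks for weather; call the tool.<|end|>
<|start|>assistant<|channel|>commentary to=functions.get_weather <|constrain|>json<|message|>{"city": "Berlin"}<|call|>

Built-in tools (e.g. browser, python) would instead be addressed on the analysis channel, which is why a separate parsing path is needed there.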

@mergify mergify bot added the gpt-oss Related to GPT-OSS models label Aug 11, 2025
@longern commented Aug 11, 2025

I tested gpt-oss-120b and found that sometimes custom tools (not the built-in ones) also appear in the analysis channel when built-in tools are not available. This may lead to a parsing error and should be taken into consideration. I'm currently detecting the functions. prefix as a workaround.
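A minimal sketch of that workaround (the recipient field name here is an assumption for illustration, not the actual parser API):

def is_custom_tool_call(recipient: str | None) -> bool:
    # Harmony addresses custom tools as "functions.<name>", so a
    # recipient with that prefix is a tool call even when it shows
    # up on the analysis channel instead of commentary.
    return recipient is not None and recipient.startswith("functions.")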

luis5tb added a commit to luis5tb/llama-stack that referenced this pull request Sep 8, 2025
After [0] got merged to add support for tool parsing in vLLM for
GPT-OSS with chat completions, extra corner cases are needed in the
openai_compat file to properly translate the tool call content
[0] vllm-project/vllm#22386
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
@davidallada

We will release 0.10.2 promptly
Any updates on pushing the docker image for this?

@jakehlee

@davidallada I've been tracking https://github.com/vllm-project/vllm/milestone/12 to see the progress of the 0.10.2 release.

@davidallada

@davidallada I've been tracking https://github.com/vllm-project/vllm/milestone/12 to see the progress of the 0.10.2 release.

Thank you!

@timjwhite

I've been doing the same, waiting eagerly, although an interim update to the gptoss container would also be welcome!

@furkanc commented Sep 12, 2025

Hello, when I run the vllm/vllm-openai:v0.10.1.1 image with the following command:
openai/gpt-oss-120b --gpu-memory-utilization 0.9 --max-model-len 16384 --tool-call-parser openai --enable-auto-tool-choice
it gives this error:

KeyError: 'invalid tool call parser: openai (chose from { deepseek_v3,glm45,granite-20b-fc,granite,hermes,hunyuan_a13b,internlm,jamba,kimi_k2,llama4_pythonic,llama4_json,llama3_json,minimax,mistral,phi4_mini_json,pythonic,qwen3_coder,step3,xlam })'

Am I missing something?

@Blake-Martin-code

Hello, when I run the vllm/vllm-openai:v0.10.1.1 image with the following command:

openai/gpt-oss-120b --gpu-memory-utilization 0.9 --max-model-len 16384 --tool-call-parser openai --enable-auto-tool-choice

it gives this error:

KeyError: 'invalid tool call parser: openai (chose from { deepseek_v3,glm45,granite-20b-fc,granite,hermes,hunyuan_a13b,internlm,jamba,kimi_k2,llama4_pythonic,llama4_json,llama3_json,minimax,mistral,phi4_mini_json,pythonic,qwen3_coder,step3,xlam })'

Am I missing something?

I believe you need to wait until they push the new image to Docker for this to be available. However, you can build it yourself from this branch.

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025

@Blake-Martin-code commented Sep 14, 2025

@aarnphm The image just got pushed and it is successfully calling tools with chat completions. However, there is a bug: the function call appears in both the content and tool_calls keys of the model's response.

Here are the engine args I passed: --model openai/gpt-oss-20b --served-model-name openai/gpt-oss-20b --tensor-parallel-size 2 --tool-call-parser openai --enable-auto-tool-choice

ENV VARS: VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1

from openai import OpenAI
 
client = OpenAI(
    base_url="https://runPodId-8000.proxy.runpod.net/v1",
    api_key="EMPTY"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            },
        },
    }
]
 
response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools
)
 
print(response.choices[0].message)

ChatCompletionMessage(content='{"city":"Berlin"}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-474e480646b04c5daf0adfd18a32c78b', function=Function(arguments='{"city":"Berlin"}', name='get_weather'), type='function')], reasoning_content='We need to fetch weather. We can use get_weather function.')
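Until this duplication is fixed server-side, a possible client-side workaround (a sketch that simply prefers tool_calls over the duplicated content):

message = response.choices[0].message
if message.tool_calls:
    # The parsed call already carries the arguments; drop the
    # duplicated raw JSON from content.
    content = None
else:
    content = message.content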

@andresC98

I am running vLLM 0.10.2 with the configs --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss and still get a 500 internal error, 'Unexpected token 12606 while expecting start token 200006', when calling it with the same example @Blake-Martin-code showed.

@fabienric

I am running vLLM 0.10.2 with the configs --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss and still get a 500 internal error, 'Unexpected token 12606 while expecting start token 200006', when calling it with the same example @Blake-Martin-code showed.

Have you tried redownloading the model from HF? OpenAI fixed the chat template and generation config files after the first release.

@andresC98 commented Sep 16, 2025

I am running vLLM 0.10.2 with the configs --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss and still get a 500 internal error, 'Unexpected token 12606 while expecting start token 200006', when calling it with the same example @Blake-Martin-code showed.

Have you tried redownloading the model from HF? OpenAI fixed the chat template and generation config files after the first release.

Fair point; I noticed my file was from August 12th, so I am missing this change to the generation_config.json file: https://huggingface.co/openai/gpt-oss-20b/commit/d666cf3b67006cf8227666739edf25164aaffdeb. Not sure if this is the fix, but I will try!

EDIT: After redownloading and bringing in the updated .json file, it is now working as expected! Thank you so much for pointing this out, @fabienric!
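For anyone who needs to force a clean re-download, a minimal sketch with huggingface_hub (assumes the default HF cache location):

from huggingface_hub import snapshot_download

# Re-fetch the repo so the updated chat template and
# generation_config.json are picked up.
snapshot_download("openai/gpt-oss-20b", force_download=True)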

@fabienric

yes, this is the one 😄

@Blake-Martin-code

I am running vLLM 0.10.2 with the configs --enable-auto-tool-choice --tool-call-parser openai --reasoning-parser openai_gptoss and still get a 500 internal error, 'Unexpected token 12606 while expecting start token 200006', when calling it with the same example @Blake-Martin-code showed.

Have you tried redownloading the model from HF? OpenAI fixed the chat template and generation config files after the first release.

Fair point; I noticed my file was from August 12th, so I am missing this change to the generation_config.json file: https://huggingface.co/openai/gpt-oss-20b/commit/d666cf3b67006cf8227666739edf25164aaffdeb. Not sure if this is the fix, but I will try!

EDIT: After redownloading and bringing in the updated .json file, it is now working as expected! Thank you so much for pointing this out, @fabienric!

I am using RunPod, so the vLLM container and Hugging Face model get redownloaded every time. Has anyone else encountered the tool call arguments appearing in the content every time?

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
bbrowning added a commit to bbrowning/vllm that referenced this pull request Oct 1, 2025
This fixes an issue identified in vllm-project#22337 that was not entirely fixed
by vllm-project#22386 around tool call parsing in gpt-oss models in the
non-streaming case and the model output returned to the user.

The main change here is to remove the parsed tool call out of the
ChatCompletion message `content`, so that the generated tool call only
shows up in its parsed form as an entry in `tool_calls` instead of in
both `tool_calls` and `content`.

A small related change is to ensure we're not sending newline
characters in the JSON arguments string of the parsed tool call, since
we don't do this for other tool calls.

Together these should get non-streaming tool calls via the Chat
Completions API working for gpt-oss models for typical use cases.
There may still be some edge cases that openai_tool_parser.py and/or
serving_chat.py need to handle, but this at least closes some known
gaps.

A couple of unit tests were tweaked here to test for both of these
fixes. I ensured the tests failed before my fixes, and that they now
pass with the fixes.

Signed-off-by: Ben Browning <[email protected]>
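The gist of both fixes, as a sketch (names here are illustrative, not the actual code from this commit):

import json

def finalize_tool_call(name: str, raw_arguments: str, remaining_text: str):
    # Re-serialize so the arguments string carries no newlines,
    # matching what the other tool parsers emit.
    arguments = json.dumps(json.loads(raw_arguments))
    tool_call = {"name": name, "arguments": arguments}
    # Return content without the parsed call, so the call appears
    # only under tool_calls rather than in both places.
    content = remaining_text or None
    return tool_call, content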
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Labels: deepseek, frontend, gpt-oss, llama, qwen, ready, tool-calling
