[gpt-oss] tool parser supports for /chat/completions [1/n] #22386
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of CI tests runs to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request adds support for OpenAI Harmony tool calls in /chat/completions. It introduces a new OpenAIToolParser and integrates it into the chat serving logic for both streaming and non-streaming responses. The changes also include new tests for the parser. The review identifies a critical logic flaw in the non-streaming tool call extraction that could lead to missed tool calls, and a high-severity issue in the tests related to type correctness. Suggestions are provided to fix these issues.
Force-pushed from 7d984d1 to 33f1852
The openai tool calling parser part looks good to me. We may also want to consider tool calling in the analysis part, according to Harmony's cookbook:
built-in tools will normally be triggered on the analysis channel
OpenAI's structured generation has a different format than tool calling. vLLM should have a parser for that too.
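For context, a minimal sketch of what this looks like: a hand-written Harmony-style completion string with an analysis-channel message followed by a commentary-channel tool call, plus a naive extraction. This is illustrative only, not vLLM's OpenAIToolParser, and the example string is an assumption based on Harmony's published format.

```python
# Illustrative sketch only: NOT vLLM's parser, just the shape being discussed.
import re

raw = (
    "<|channel|>analysis<|message|>We need to fetch weather.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.get_weather "
    "<|constrain|>json<|message|>{\"city\":\"Berlin\"}<|call|>"
)

# Naively pull out the tool name and its JSON arguments.
match = re.search(r"to=functions\.(\w+).*?<\|message\|>(\{.*?\})<\|call\|>", raw)
if match:
    name, arguments = match.groups()
    print(name, arguments)  # get_weather {"city":"Berlin"}
```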
I tested
After [0] got merged to add support for tool parsing in vLLM for GPT-OSS with chat completions, extra corner cases are needed in the openai_compat file to properly translate the tool call content. [0] vllm-project/vllm#22386
…ect#22386) Signed-off-by: Aaron Pham <[email protected]> Co-authored-by: Simon Mo <[email protected]>
@davidallada I've been tracking https://github.com/vllm-project/vllm/milestone/12 to see the progress of the 0.10.2 release.
Thank you!
I've been doing the same, waiting eagerly, although an interim update to the gptoss container would also be welcome!
Hello, when I run the vllm/vllm-openai:v0.10.1.1 image with the following command, tool parsing does not seem to work. Am I missing something?
I believe you need to wait until they push the new image to Docker Hub for this to be available. However, you can build from this branch to create it yourself.
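For reference, vLLM's documented image build is along the lines of `DOCKER_BUILDKIT=1 docker build . --target vllm-openai -t vllm/vllm-openai` run from a checkout of the repo; doing that from this PR's branch should get you the parser before an official image lands (exact flags may differ for your GPU and setup).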
@aarnphm The image just got pushed and it is successfully calling tools with chat completion. However, there is a bug: the function call is appearing in both the `content` and the `tool_calls` fields.
Here are the engine args I passed: --model openai/gpt-oss-20b --served-model-name openai/gpt-oss-20b --tensor-parallel-size 2 --tool-call-parser openai --enable-auto-tool-choice
Env vars: VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1
Response message: ChatCompletionMessage(content='{"city":"Berlin"}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageFunctionToolCall(id='chatcmpl-tool-474e480646b04c5daf0adfd18a32c78b', function=Function(arguments='{"city":"Berlin"}', name='get_weather'), type='function')], reasoning_content='We need to fetch weather. We can use get_weather function.')
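For anyone trying to reproduce this, here is a minimal non-streaming call with the OpenAI SDK. This is a sketch: the tool schema and prompt are guesses consistent with the output above, and it assumes a local vLLM server on port 8000.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool schema, inferred from the get_weather call above.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
msg = resp.choices[0].message
print(msg.tool_calls)  # the parsed get_weather call
print(msg.content)     # with the bug above, the raw arguments also show up here
```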
I am running vLLM 0.10.2, passing the same configs, and calling it with the same example @Blake-Martin-code showed.
Have you tried redownloading the model from HF? OpenAI fixed the chat template and generation config files after the first release. |
Fair point; I noticed my file was from August 12th, so I am missing this change in the generation_config.json file: https://huggingface.co/openai/gpt-oss-20b/commit/d666cf3b67006cf8227666739edf25164aaffdeb . Not sure if this is the fix, but I will try! EDIT: After redownloading and bringing in the updated .json file, it is now working as expected! Thank you so much for pointing this out @fabienric!
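For anyone hitting the same thing: one way to force a fresh copy (assuming a huggingface_hub CLI recent enough to have the flag) is `huggingface-cli download openai/gpt-oss-20b --force-download`, which should pick up the fixed chat template and generation_config.json rather than reusing a stale cache.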
yes, this is the one 😄
I am using RunPod, so the vLLM container and Hugging Face model get redownloaded every time. Has anyone encountered the tool call arguments appearing in the content every time?
This fixes an issue identified in vllm-project#22337 that was not entirely fixed by vllm-project#22386 around tool call parsing in gpt-oss models in the non-streaming case and the model output returned to the user.

The main change here is to remove the parsed tool call out of the ChatCompletion message `content`, so that the generated tool call only shows up in its parsed form as an entry in `tool_calls` instead of in both `tool_calls` and `content`. A small related change is to ensure we're not sending newline characters in the JSON arguments string of the parsed tool call, since we don't do this for other tool calls.

Together these should get non-streaming tool calls via the Chat Completions API working for gpt-oss models for typical use-cases. There may still be some edge cases the openai_tool_parser.py and/or serving_chat.py code need to handle, but this at least closes some known gaps.

A couple of unit tests were tweaked here to test for both of these fixes. I ensured the tests failed before my fixes, and that they now pass with the fixes.

Signed-off-by: Ben Browning <[email protected]>
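A rough sketch of that idea (illustrative only; `finalize_message` is a hypothetical helper, not the actual serving_chat.py change):

```python
import json

def finalize_message(content: str | None, tool_calls: list[dict]) -> dict:
    """Sketch: keep parsed tool calls only in `tool_calls`, and re-serialize
    their arguments compactly so the JSON string carries no newlines."""
    for call in tool_calls:
        args = json.loads(call["function"]["arguments"])
        call["function"]["arguments"] = json.dumps(args, separators=(",", ":"))
    # Once a tool call has been parsed out, it should not also echo in content.
    return {"content": None if tool_calls else content, "tool_calls": tool_calls}
```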
…ect#22386) Signed-off-by: Aaron Pham <[email protected]> Co-authored-by: Simon Mo <[email protected]> Signed-off-by: xuebwang-amd <[email protected]>
This PR brings openai-harmony tool support and reasoning for /chat/completions.
Currently, it does not yet support tool='required'.

Signed-off-by: Aaron Pham [email protected]
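As a usage illustration (a hedged sketch, not from this PR's docs or tests): with a server launched with --tool-call-parser openai --enable-auto-tool-choice, streamed tool-call deltas can be stitched back together like this. The tool schema and endpoint are the same assumptions as in the non-streaming example above.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Minimal illustrative tool schema, mirroring the thread's get_weather example.
tools = [{"type": "function", "function": {
    "name": "get_weather",
    "parameters": {"type": "object",
                   "properties": {"city": {"type": "string"}},
                   "required": ["city"]}}}]

stream = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    stream=True,
)

# The tool-call name and arguments arrive as incremental deltas.
name, arg_chunks = None, []
for chunk in stream:
    if not chunk.choices:
        continue
    for tc in chunk.choices[0].delta.tool_calls or []:
        if tc.function and tc.function.name:
            name = tc.function.name
        if tc.function and tc.function.arguments:
            arg_chunks.append(tc.function.arguments)
print(name, "".join(arg_chunks))
```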