[gpt-oss] use vLLM instead of openai types for streaming #25186

Merged
hmellor merged 6 commits into vllm-project:main from qandrew:gpt-oss-streaming-3-wrap-types
Sep 30, 2025
Conversation

@qandrew
Contributor

@qandrew qandrew commented Sep 18, 2025

Purpose

Test Plan

Test Result

client

curl http://localhost:20001/v1/responses   -H "Content-Type: application/json"   -N   -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "user",
            "content": "Hello."
        }
    ],
    "temperature": 0.7,
    "max_output_tokens": 256,
    "stream": true, 
    "enable_response_messages": true
}'  
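The stream arrives as Server-Sent Events, one `data:` line per event. A minimal sketch of how a client might collect those payloads; the helper name and the trimmed sample events are illustrative, not part of the PR (and the `[DONE]` terminator is the OpenAI-compatible convention, if the server emits it):

```python
import json

def parse_sse_events(raw: str) -> list[dict]:
    """Collect the JSON payloads from an SSE response body."""
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # OpenAI-compatible end-of-stream marker
            break
        events.append(json.loads(payload))
    return events

# Trimmed sample of the kind of stream the request above produces:
sample = (
    'data: {"type": "response.created", "sequence_number": 0}\n\n'
    'data: {"type": "response.completed", "sequence_number": 45}\n\n'
    'data: [DONE]\n'
)
events = parse_sse_events(sample)
print([e["type"] for e in events])  # → ['response.created', 'response.completed']
```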

ResponseCompletedEvent output (note that the input/output messages are present)

{
    "response": {
        "id": "resp_aed05365be36434fb820a6eab52bfc39",
        "created_at": 1758328028,
        "incomplete_details": null,
        "instructions": null,
        "metadata": null,
        "model": "/data/users/axia/checkpoints/gpt-oss-120b",
        "object": "response",
        "output": [
            {
                "id": "rs_2ebdf86c76a54ddb85c5a867c31f9b77",
                "summary": [],
                "type": "reasoning",
                "content": [
                    {
                        "text": "The user says \"Hello.\" Probably a greeting. Should respond politely, maybe ask how can help. Keep brief friendly.",
                        "type": "reasoning_text"
                    }
                ],
                "encrypted_content": null,
                "status": null
            },
            {
                "id": "msg_30edf4e235b546078b529ebfaa6d7bee",
                "content": [
                    {
                        "annotations": [],
                        "text": "Hello! How can I assist you today?",
                        "type": "output_text",
                        "logprobs": null
                    }
                ],
                "role": "assistant",
                "status": "completed",
                "type": "message"
            }
        ],
        "input_messages": [
            {
                "author": {
                    "role": "system",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": null,
                "recipient": null,
                "content_type": null
            },
            {
                "author": {
                    "role": "user",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": null,
                "recipient": null,
                "content_type": null
            }
        ],
        "output_messages": [
            {
                "author": {
                    "role": "assistant",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": "analysis",
                "recipient": null,
                "content_type": null
            },
            {
                "author": {
                    "role": "assistant",
                    "name": null
                },
                "content": [
                    {}
                ],
                "channel": "final",
                "recipient": null,
                "content_type": null
            }
        ],
        "parallel_tool_calls": true,
        "temperature": 0.7,
        "tool_choice": "auto",
        "tools": [],
        "top_p": 1.0,
        "background": false,
        "max_output_tokens": 256,
        "max_tool_calls": null,
        "previous_response_id": null,
        "prompt": null,
        "reasoning": null,
        "service_tier": "auto",
        "status": "completed",
        "text": null,
        "top_logprobs": null,
        "truncation": "disabled",
        "usage": {
            "input_tokens": 67,
            "input_tokens_details": {
                "cached_tokens": 64
            },
            "output_tokens": 43,
            "output_tokens_details": {
                "reasoning_tokens": 25,
                "tool_output_tokens": 0
            },
            "total_tokens": 110
        },
        "user": null
    },
    "sequence_number": 45,
    "type": "response.completed"
}

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@qandrew qandrew changed the title [gptoss] use vLLM streaming types [gpt-oss] use vLLM streaming types Sep 18, 2025
@mergify mergify bot added frontend gpt-oss Related to GPT-OSS models labels Sep 18, 2025
@mergify

mergify bot commented Sep 19, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @qandrew.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 19, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew force-pushed the gpt-oss-streaming-3-wrap-types branch from 0c32be0 to 7b82dbf on September 20, 2025 at 00:18
@mergify mergify bot removed the needs-rebase label Sep 20, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
@qandrew qandrew changed the title [gpt-oss] use vLLM streaming types [gpt-oss] use vLLM instead of openai types for streaming Sep 20, 2025
@qandrew qandrew marked this pull request as ready for review September 21, 2025 04:50
@qandrew
Contributor Author

qandrew commented Sep 21, 2025

cc @alecsolder, @yeqcharlotte, @chaunceyjiang, ready for review. I think the readthedocs CI failure is a flake.

@lacora
Contributor

lacora commented Sep 22, 2025

Maybe we need to think about whether to allow input_messages / output_messages for sync responses only, or for streaming as well; appending long messages kills the efficiency that streaming provides. cc @yeqcharlotte

@qandrew
Contributor Author

qandrew commented Sep 22, 2025

Maybe we need to think about whether to allow input_messages / output_messages for sync responses only, or for streaming as well; appending long messages kills the efficiency that streaming provides. cc @yeqcharlotte

that's a good point! this PR only modifies the lifecycle events in streaming right now (ResponseCreatedEvent, ResponseInProgressEvent, ResponseCompletedEvent), so none of the deltas / intermediate events will get additional payload. Or I could remove it from ResponseCreatedEvent and ResponseInProgressEvent, but maybe it's fine to keep it for now for consistency?
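The split described above can be sketched with stand-in types (hypothetical names and fields; the real event classes come from vLLM's protocol definitions):

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the event types discussed above; the real
# classes live in vLLM's protocol module.

@dataclass
class ResponseTextDeltaEvent:
    # Delta events stay small: just the incremental text.
    delta: str
    sequence_number: int

@dataclass
class ResponseCompletedEvent:
    # Lifecycle events (created / in_progress / completed) carry the full
    # response snapshot, which is where input_messages / output_messages ride.
    response: dict
    sequence_number: int
    type: str = "response.completed"

completed = ResponseCompletedEvent(
    response={"input_messages": [], "output_messages": []},
    sequence_number=45,
)
delta = ResponseTextDeltaEvent(delta="Hello", sequence_number=3)
print(hasattr(delta, "response"))  # → False: deltas add no message payload
```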

Collaborator

@yeqcharlotte yeqcharlotte left a comment


LGTM. Could you add a basic skeleton test that loads the types? cc:
@chaunceyjiang @aarnphm PTAL too, thanks!

ResponseWebSearchCallCompletedEvent,
ResponseWebSearchCallInProgressEvent,
ResponseWebSearchCallSearchingEvent)
# yapf: enable
Collaborator


Is the format change intentional?

Contributor Author


Yep, yapf and isort had a conflict; it seems to happen pretty often...

Collaborator


Can we keep the original format?

Contributor Author


@yeqcharlotte we can't. The import below introduces the conflict; I tried to keep the original format, but there will still be conflicts. As you can see in serving_responses.py, # yapf conflicts with isort for this block happens often:

from openai.types.responses import (
    ResponseInProgressEvent as OpenAIResponseInProgressEvent)
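For context, the conflict comes from the alias-then-shadow pattern: the OpenAI class is imported under an alias so vLLM can define its own class of the same name. A self-contained sketch, with a stdlib class standing in for the openai import so the snippet runs without that package:

```python
# Stdlib stand-in for the openai import; same shape as
# `from openai.types.responses import ResponseInProgressEvent as OpenAIResponseInProgressEvent`.
from json import JSONDecoder as StdJSONDecoder

class JSONDecoder(StdJSONDecoder):
    """Local subclass shadowing the imported name, the way vLLM's
    streaming event types shadow the openai ones."""

print(JSONDecoder is not StdJSONDecoder)  # → True
```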

Collaborator

@yeqcharlotte yeqcharlotte left a comment


LGTM! thanks for the change!

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Sep 27, 2025
@yeqcharlotte yeqcharlotte enabled auto-merge (squash) September 27, 2025 00:10
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 27, 2025
Signed-off-by: Andrew Xia <axia@fb.com>
auto-merge was automatically disabled September 30, 2025 20:51

Head branch was pushed to by a user without write access

@hmellor hmellor enabled auto-merge (squash) September 30, 2025 22:09
@hmellor hmellor merged commit 5db1870 into vllm-project:main Sep 30, 2025
45 checks passed
pdasigi pushed a commit to pdasigi/vllm that referenced this pull request Oct 2, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…t#25186)

Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>