
[Responses] Decouple SSE event helpers from Harmony context#35148

Merged
vllm-bot merged 4 commits intovllm-project:mainfrom
sfeng33:sse_interface
Feb 25, 2026

Conversation


@sfeng33 sfeng33 commented Feb 23, 2026

Purpose

Architecture: Two-layer design in streaming_events.py

The core refactor splits streaming_events.py into two layers: dispatchers that understand Harmony context objects, and leaf helpers that only accept plain strings. Dispatchers extract values from StreamingHarmonyContext / HarmonyMessage and delegate to leaf helpers, which build SSE events from primitive types. This means the event-building logic is reusable by any future backend without depending on Harmony.

     serving.py                streaming_events.py
    ┌──────────┐       ┌────────────────────────────────────────────────┐
    │          │       │                                                │
    │  ctx ────┼──────▶│  DISPATCHERS (Harmony-specific)               │
    │          │       │  ┌──────────────────────────────────────┐      │
    │          │       │  │ emit_content_delta_events(ctx,state) │      │
    │          │       │  │ emit_previous_item_done_events(prev) │      │
    │          │       │  │ emit_tool_action_events(ctx,state,ts)│      │
    │          │       │  └──────────────┬───────────────────────┘      │
    │          │       │                 │ extract plain values         │
    │          │       │                 ▼                              │
    │          │       │  LEAF HELPERS (backend-agnostic)               │
    │          │       │  ┌──────────────────────────────────────┐      │
    │          │       │  │  Delta:                              │      │
    │          │       │  │    emit_text_delta_events(str,.)     │      │
    │          │       │  │    emit_reasoning_delta_events(str,.)│      │
    │          │       │  │    emit_function_call_delta_ev(str,.)│      │
    │          │       │  │    emit_mcp_delta_events(str,.)      │      │
    │          │       │  │    emit_code_interp_delta_ev(str,.)  │      │
    │          │       │  │                                      │      │
    │          │       │  │  Done:                               │      │
    │          │       │  │    emit_text_output_done_events(str) │      │
    │          │       │  │    emit_reasoning_done_events(str)   │      │
    │          │       │  │    emit_function_call_done_ev(str,.) │      │
    │          │       │  │    emit_mcp_completion_events(str,.) │      │
    │          │       │  │    emit_code_interp_completion(..)   │      │
    │          │       │  │    emit_browser_tool_events(..)      │      │
    │          │       │  └──────────────────────────────────────┘      │
    │          │       │                 │                              │
    │          │       │                 ▼                              │
    │  events◄─┼───────│  list[StreamingResponsesResponse]             │
    │          │       │                                                │
    └──────────┘       └────────────────────────────────────────────────┘
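The dispatcher/leaf split in the diagram can be sketched roughly as follows. This is an illustrative sketch only — the class, function names, and event fields are simplified stand-ins, not the actual signatures in `streaming_events.py`:

```python
# Sketch of the two-layer split: a Harmony-aware dispatcher extracts
# plain values, then delegates to a backend-agnostic leaf helper.
# Names and fields are simplified illustrations, not the vLLM API.
from dataclasses import dataclass


@dataclass
class TextDeltaEvent:
    type: str
    item_id: str
    delta: str


def emit_text_delta_events(text: str, item_id: str) -> list[TextDeltaEvent]:
    """Leaf helper: accepts only primitive types, knows nothing of Harmony."""
    return [
        TextDeltaEvent(
            type="response.output_text.delta", item_id=item_id, delta=text
        )
    ]


def emit_content_delta_events(ctx, item_id: str) -> list[TextDeltaEvent]:
    """Dispatcher: understands the Harmony context object, pulls out the
    plain string, and hands it to the leaf helper."""
    delta = ctx.last_content_delta  # hypothetical Harmony-specific field
    return emit_text_delta_events(delta, item_id)
```

Because the leaf helper takes only a `str`, a future non-Harmony backend can call it directly with its own extracted values and get identical SSE events.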

Test Plan

pytest tests/entrypoints/openai/responses/test_harmony.py

@mergify mergify bot added frontend gpt-oss Related to GPT-OSS models labels Feb 23, 2026

mergify bot commented Feb 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sfeng33.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a significant and well-executed refactoring of the SSE event generation logic. The decoupling of event helpers from the Harmony-specific context into dispatchers and backend-agnostic leaf helpers is a great architectural improvement that enhances reusability and maintainability. The accompanying enhancements to the test suite, particularly the more robust validation of event stream pairing, ordering, and field consistency, are also excellent and will help ensure correctness going forward.

However, I've found one critical issue in the refactored logic for function calls. The call_id for a ResponseFunctionToolCall is not consistent between its in_progress and completed states in a streaming response. This breaks the tool-calling protocol for clients. Please see the detailed comment for more information.
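The consistency requirement flagged here can be illustrated with a small sketch (hypothetical helper names, not vLLM's actual code): the `call_id` must be generated once per tool call and threaded through every event for that call, rather than regenerated when each event is built.

```python
# Illustrative sketch: create call_id once and reuse it, so the
# in_progress and completed events for one function call agree.
import uuid


def make_call_id() -> str:
    return f"call_{uuid.uuid4().hex}"


def function_call_events(name: str, arguments: str) -> list[dict]:
    call_id = make_call_id()  # created exactly once per tool call
    return [
        {
            "type": "response.output_item.added",
            "item": {
                "type": "function_call",
                "call_id": call_id,
                "name": name,
                "status": "in_progress",
            },
        },
        {
            "type": "response.output_item.done",
            "item": {
                "type": "function_call",
                "call_id": call_id,  # same id as the in_progress event
                "name": name,
                "arguments": arguments,
                "status": "completed",
            },
        },
    ]
```

If `make_call_id()` were instead called inside each event builder, a streaming client could not match the completed item back to the in-progress one, which is the protocol break the review describes.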

Signed-off-by: sfeng33 <4florafeng@gmail.com>

sfeng33 commented Feb 23, 2026

PTAL: @qandrew @mgoin
cc @bbrowning


@chaunceyjiang chaunceyjiang left a comment


Thanks~

@github-project-automation github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Feb 24, 2026
@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 24, 2026
@chaunceyjiang

/cc @qandrew PTAL.

@chaunceyjiang

https://buildkite.com/vllm/ci/builds/52856#019c8e44-d8e8-42f7-a2a8-b6b25c0767d9 @sfeng33 PTAL.


@qandrew qandrew left a comment


Thanks for the PR! LGTM, I trust Chauncey's review. Is the eventual goal to have simpleContext/parsableContext use the logic in streaming_events too?

Also cc @daniel-salib, who is working on #35184 for streaming.


sfeng33 commented Feb 24, 2026

> thanks for the PR! lgtm, i trust Chauncey's review. is the eventual goal to have simpleContext/parsableContext use the logic in streaming_events too?
>
> also cc @daniel-salib who is working on #35184 for streaming

Thanks for taking a look. Yes, the goal is for parsableContext to reuse streaming_events for SSE events, and to deprecate simpleContext eventually.

Signed-off-by: sfeng33 <4florafeng@gmail.com>
@bbrowning

This looks like a reasonable cleanup that also fixes some bugs in our Responses streaming events. I don't think it necessarily fixes all the bugs (left a comment in one place about browser and container events), but I don't think fixing all bugs is necessarily the bar. Were you able to test this with a live model and a real client just to verify streaming behavior outside of what the unit test has?


sfeng33 commented Feb 24, 2026

> This looks like a reasonable cleanup that also fixes some bugs in our Responses streaming events. I don't think it necessarily fixes all the bugs (left a comment in one place about browser and container events), but I don't think fixing all bugs is necessarily the bar. Were you able to test this with a live model and a real client just to verify streaming behavior outside of what the unit test has?

Totally agree there are remaining bugs. In this PR I tried to keep the functionality the same while fixing the one obvious bug I list in the PR summary. In terms of manual testing, I tested with the gpt-oss-20b model in the basic text and reasoning cases, and saw that the stream events are the same as main. For the tool call/MCP/function call events, I actually think the events aren't emitted the way the OpenAI Responses API specifies; e.g., we emit all browser-related events at once, when we should emit them right after each tool call is executed.

@vllm-bot vllm-bot merged commit ec1d30c into vllm-project:main Feb 25, 2026
47 of 50 checks passed
@sfeng33 sfeng33 deleted the sse_interface branch February 25, 2026 04:10
tom-zju pushed a commit to tom-zju/vllm that referenced this pull request Feb 26, 2026
flutist pushed a commit to flutist/vllm_custom_dataset_img_support_base64 that referenced this pull request Feb 28, 2026
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026

Labels

frontend gpt-oss Related to GPT-OSS models ready ONLY add when PR is ready to merge/full CI is needed

Projects

Status: Done


5 participants