[Responses] Decouple SSE event helpers from Harmony context#35148
[Responses] Decouple SSE event helpers from Harmony context#35148vllm-bot merged 4 commits intovllm-project:mainfrom
Conversation
|
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Code Review
This pull request is a significant and well-executed refactoring of the SSE event generation logic. The decoupling of event helpers from the Harmony-specific context into dispatchers and backend-agnostic leaf helpers is a great architectural improvement that enhances reusability and maintainability. The accompanying enhancements to the test suite, particularly the more robust validation of event stream pairing, ordering, and field consistency, are also excellent and will help ensure correctness going forward.
However, I've found one critical issue in the refactored logic for function calls. The call_id for a ResponseFunctionToolCall is not consistent between its in_progress and completed states in a streaming response. This breaks the tool-calling protocol for clients. Please see the detailed comment for more information.
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
644f8ac to
55b7300
Compare
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
PTAL: @qandrew @mgoin |
|
/cc @qandrew PTAL. |
qandrew
left a comment
There was a problem hiding this comment.
thanks for the PR! lgtm, i trust Chauncey's review. is the eventual goal to have simpleContext/parsableContext use the logic in streaming_events too?
also cc @daniel-salib who is working on #35184 for streaming
Thanks for taking a look, yes the goal is for parsableContext to re-use streaming_events for sse events. And deprecate simpleContext eventually. |
|
This looks like a reasonable cleanup that also fixes some bugs in our Responses streaming events. I don't think it necessarily fixes all the bugs (left a comment in one place about browser and container events), but I don't think fixing all bugs is necessarily the bar. Were you able to test this with a live model and a real client just to verify streaming behavior outside of what the unit test has? |
Totally agree there are remaining bugs, in this PR I tried to keep it the same functionality, while fixing the one obvious bug I list in the PR summary. In terms of manual testing, I tested with gpt oss 20b model in the basic text and reasoning case, and see the stream events are the same as main. For the tool call/mcp/function call events, I actually think the way the events are emitted aren't as expected as the openai response api should emit, e.g. we emit all browser related events once, where we should emit right after tool call is executed. |
…ject#35148) Signed-off-by: sfeng33 <4florafeng@gmail.com>
…ject#35148) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: xjx <493337577@qq.com>
…ject#35148) Signed-off-by: sfeng33 <4florafeng@gmail.com>
…ject#35148) Signed-off-by: sfeng33 <4florafeng@gmail.com>
…ject#35148) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
…ject#35148) Signed-off-by: sfeng33 <4florafeng@gmail.com>
…ject#35148) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Purpose
Architecture: Two-layer design in streaming_events.py
The core refactor splits streaming_events.py into two layers: dispatchers that understand Harmony context objects, and leaf helpers that only accept plain strings. Dispatchers extract values from StreamingHarmonyContext / HarmonyMessage and delegate to leaf helpers, which build SSE events from primitive types. This means the event-building logic is reusable by any future backend without depending on Harmony.
Test Plan