[Refactor] Extract Harmony streaming SSE event builders into streaming_events.py#34909
[Refactor] Extract Harmony streaming SSE event builders into streaming_events.py#34909vllm-bot merged 4 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request successfully refactors the Harmony streaming SSE event builders by extracting them from serving.py into a dedicated streaming_events.py module. This significantly improves the maintainability and testability of the code, and sets a solid foundation for unifying event emission across different streaming contexts. The refactor correctly handles the transition of state and dependencies (like tool_server). I have identified a robustness issue in the browser tool event builder that could lead to runtime crashes on unexpected model output.
| parsed_args = json.loads(previous_item.content[0].text) | ||
| action = None | ||
|
|
||
| if function_name == "search": | ||
| action = response_function_web_search.ActionSearch( | ||
| type="search", | ||
| query=parsed_args["query"], | ||
| ) | ||
| elif function_name == "open": | ||
| action = response_function_web_search.ActionOpenPage( | ||
| type="open_page", | ||
| # TODO: translate to url | ||
| url=f"cursor:{parsed_args.get('cursor', '')}", | ||
| ) | ||
| elif function_name == "find": | ||
| action = response_function_web_search.ActionFind( | ||
| type="find", | ||
| pattern=parsed_args["pattern"], | ||
| # TODO: translate to url | ||
| url=f"cursor:{parsed_args.get('cursor', '')}", | ||
| ) | ||
| else: | ||
| raise ValueError(f"Unknown function name: {function_name}") |
There was a problem hiding this comment.
The emit_browser_tool_events function lacks robustness against unexpected model output. Specifically:
json.loadswill raise aJSONDecodeErrorif the model produces malformed JSON.- Accessing
parsed_args["query"]andparsed_args["pattern"]will raise aKeyErrorif those keys are missing. - The
ValueErrorat line 658 will propagate and cause a 500 error for the streaming request.
Since this is an async generator context, these unhandled exceptions will crash the stream for the client. Consider implementing robust parsing with error handling similar to _parse_browser_tool_call in harmony_utils.py.
try:
parsed_args = json.loads(previous_item.content[0].text)
except (json.JSONDecodeError, IndexError):
return []
if function_name == "search":
query = parsed_args.get("query")
if not query: return []
action = response_function_web_search.ActionSearch(
type="search",
query=query,
)
elif function_name == "open":
action = response_function_web_search.ActionOpenPage(
type="open_page",
# TODO: translate to url
url=f"cursor:{parsed_args.get('cursor', '')}",
)
elif function_name == "find":
pattern = parsed_args.get("pattern")
if not pattern: return []
action = response_function_web_search.ActionFind(
type="find",
pattern=pattern,
# TODO: translate to url
url=f"cursor:{parsed_args.get('cursor', '')}",
)
else:
return []There was a problem hiding this comment.
This piece of code is copied over, I prefer to keep logic unchanged in this PR since it's for refactoring, but let me know if folks think otherwise.
|
PTAL: @qandrew @daniel-salib |
qandrew
left a comment
There was a problem hiding this comment.
lgtm, thanks! cc @houseroad can we add "ready" / automerge?
mgoin
left a comment
There was a problem hiding this comment.
LGTM just two nits. They can wait if the goal is to make this purely a movement PR
|
Verified the entrypoints-integration-responses-api tests passed locally (test_mcp_tools, test_parsable_context, test_parsable_context). The other failed tests are unrelated. |
…g_events.py (vllm-project#34909) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…g_events.py (vllm-project#34909) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…g_events.py (vllm-project#34909) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…g_events.py (vllm-project#34909) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…g_events.py (vllm-project#34909) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
…g_events.py (vllm-project#34909) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
…g_events.py (vllm-project#34909) Signed-off-by: sfeng33 <4florafeng@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Purpose
serving.pyis ~2800 lines. The_emit_*methods are pure functions (state + data → events) that don't use instance state, yet they live on the class. Extracting them:serving.pyby ~800 linesStreamingParsableContextsupport — the upcoming streaming parsable implementation will reuse these same event builders instead of duplicating them (as_process_simple_streaming_eventscurrently does)Future plan
This is the first step toward unifying SSE event emission across all three streaming paths (Harmony, Simple, Parsable):
Test Plan