Fix gpt‑oss Harmony token leaks in tool names and streaming content #32587#32633
Fix gpt‑oss Harmony token leaks in tool names and streaming content #32587#32633baonudesifeizhai wants to merge 5 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces sanitization functions to prevent leaking Harmony control tokens in tool names and content. The changes are applied across several files to ensure that user-facing output is clean. The implementation of the sanitization logic is sound. I've pointed out a couple of places where the tool name parsing logic is inconsistent with the rest of the codebase, which could lead to incorrect behavior if tool names contain dots. Fixing this would improve the robustness of the code.
| def _parse_function_call(message: Message, recipient: str) -> list[ResponseOutputItem]: | ||
| """Parse function calls into function tool call items.""" | ||
| function_name = recipient.split(".")[-1] | ||
| function_name = sanitize_harmony_tool_name(recipient.split(".")[-1]) |
There was a problem hiding this comment.
The logic to extract the function name using recipient.split(".")[-1] can be incorrect if the function name itself contains dots. For consistency with other parts of the codebase (e.g., serving.py), it's better to slice the string after the functions. prefix. This will correctly handle function names that might contain dots.
| function_name = sanitize_harmony_tool_name(recipient.split(".")[-1]) | |
| function_name = sanitize_harmony_tool_name(recipient[len("functions.") :]) |
|
|
||
| if current_recipient and parser.current_channel in ("commentary", "analysis"): | ||
| if current_recipient.startswith("functions."): | ||
| function_name = sanitize_harmony_tool_name(current_recipient.split(".")[-1]) |
There was a problem hiding this comment.
Similar to another comment, using current_recipient.split(".")[-1] to extract the function name can be problematic if the function name contains dots. To ensure correctness and consistency, it's safer to slice the string after the functions. prefix.
| function_name = sanitize_harmony_tool_name(current_recipient.split(".")[-1]) | |
| function_name = sanitize_harmony_tool_name(current_recipient[len("functions.") :]) |
chaunceyjiang
left a comment
There was a problem hiding this comment.
Could you provide a reproduction example?
I’d like to reproduce it locally.
|
|
Okay, I’ll try to reproduce it locally. |
|
This pull request has merge conflicts that must be resolved before it can be |
Purpose
#32587
Defined sanitize_harmony_tool_name and strip_harmony_control_tokens in harmony_utils.py.
Used the sanitizer when reconstructing Harmony messages from chat/response history and when converting Harmony outputs back into API responses in harmony_utils.py.
Applied the sanitizer in tool extraction for chat completions (non‑stream) and stripped control tokens from final/commentary content in openai_tool_parser.py.
Applied the sanitizer in streaming deltas for chat completions and stripped control tokens from streamed content/reasoning deltas in stream_harmony.py.
Applied the sanitizer for function call events and stripped control tokens from response streaming deltas and done events in serving.py.
Test Plan
Test Result
main branch : https://paste.ubuntu.com/p/YthB2VTPQ8/
this branch

pytest tests/entrypoints/openai/parser/test_harmony_utils.py -v -- passed
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.