[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients #35881
Closed
will-deines wants to merge 4 commits into vllm-project:main from
Conversation
…and recipients

GPT-OSS models generate Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) in unexpected positions during output generation, causing tool name contamination, recipient misrouting, and parser crashes.

Three layers of defense:
1. sanitize_harmony_name() — pure string function that strips leaked control token strings from tool/recipient names.
2. ResilientStreamableParser — wrapper around StreamableParser that recovers from missing <|start|> tokens between messages and malformed <|constrain|> tokens in headers.
3. Routing-level fallback — sanitized-to-empty recipients fall through to _parse_message_no_recipient() instead of being misrouted.

Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.
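The first layer named above can be sketched roughly as follows. This is a minimal illustration assuming a simple earliest-token scan, not the PR's actual implementation; the HARMONY_CONTROL_TOKENS tuple name is invented for the sketch.

```python
# Hypothetical sketch of the string-level sanitizer described in this commit.
# The real implementation lives in the PR's harmony_utils.py; this only
# illustrates the "keep the text before the earliest control token" idea.

HARMONY_CONTROL_TOKENS = (
    "<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>",
)

def sanitize_harmony_name(name: str) -> str:
    """Return the text before the earliest leaked Harmony control token."""
    if not name:
        return name
    earliest = len(name)
    for token in HARMONY_CONTROL_TOKENS:
        idx = name.find(token)
        if idx != -1:
            earliest = min(earliest, idx)
    return name[:earliest].strip()

# A contaminated tool name is truncated at the first control token:
print(sanitize_harmony_name("manage_cart<|channel|>commentary"))  # manage_cart
# A name that is purely a control token sanitizes to the empty string:
print(sanitize_harmony_name("<|constrain|>json"))  # (prints an empty string)
```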
Contributor
Code Review
This pull request introduces important sanitization logic to handle leaked Harmony control tokens, a critical fix for tool use with GPT-OSS models, using a robust multi-layered defense. However, a critical security vulnerability remains: the current implementation fails to sanitize Message objects stored in the conversation history, potentially allowing control token injection in multi-turn interactions. Additionally, the sanitization of structured recipient names needs improvement to prevent failed tool calls, and some redundant code could be simplified.
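The history gap the review flags could be closed along these lines. This is a hypothetical sketch: the Message dataclass here is a stand-in for the real openai_harmony type, and sanitize_history is an invented helper illustrating the boundary where sanitization is missing.

```python
# Illustrative sketch of the vulnerability flagged above: messages pulled back
# out of the parser for multi-turn history can still carry raw control tokens
# in their recipients unless they are sanitized at that boundary too.

from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Message:  # stand-in for openai_harmony's Message type
    recipient: Optional[str]
    content: str

def sanitize_harmony_name(name: str) -> str:
    for token in ("<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>"):
        idx = name.find(token)
        if idx != -1:
            name = name[:idx]
    return name.strip()

def sanitize_history(messages: list[Message]) -> list[Message]:
    """Sanitize recipients before messages re-enter conversation history."""
    out = []
    for msg in messages:
        if msg.recipient is not None:
            clean = sanitize_harmony_name(msg.recipient)
            msg = replace(msg, recipient=clean or None)  # empty -> None fallback
        out.append(msg)
    return out

history = sanitize_history([Message("browser.search<|channel|>", "q"),
                            Message("<|constrain|>json", "{}")])
print([m.recipient for m in history])  # ['browser.search', None]
```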
…ents, remove redundancy

- Add sanitize_harmony_recipient() that splits on '.', sanitizes each part, and rejoins to preserve dotted structure (e.g. browser<|channel|>.search becomes browser.search instead of being truncated to browser)
- Sanitize recipients on messages returned by ResilientStreamableParser.messages to prevent control token injection in multi-turn conversation history
- Remove redundant sanitization in parser_state_to_response_output since ResilientStreamableParser.current_recipient already handles it
- Use sanitize_harmony_recipient for full recipient strings in context.py and harmony.py routing logic
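The dotted-recipient behavior described in the first bullet can be sketched as below. The function bodies are assumptions based on the commit description, not the PR's actual code.

```python
# Hypothetical sketch of sanitize_harmony_recipient(): split on '.', sanitize
# each segment, and rejoin, so a structured recipient like
# "browser<|channel|>.search" survives as "browser.search" rather than being
# truncated to "browser" by a single whole-string pass.

def sanitize_harmony_name(name: str) -> str:
    for token in ("<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>"):
        idx = name.find(token)
        if idx != -1:
            name = name[:idx]
    return name.strip()

def sanitize_harmony_recipient(recipient: str) -> str:
    """Sanitize each dot-separated part of a recipient, preserving structure."""
    parts = [sanitize_harmony_name(part) for part in recipient.split(".")]
    return ".".join(part for part in parts if part)

print(sanitize_harmony_recipient("browser<|channel|>.search"))
# browser.search
print(sanitize_harmony_recipient("functions.manage_cart<|channel|>commentary"))
# functions.manage_cart
```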
This was referenced Mar 3, 2026
Summary
GPT-OSS models leak Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) into tool names and recipient fields during generation. This causes:

- Contaminated tool names: manage_cart<|channel|>commentary instead of manage_cart, corrupting function call routing and causing infinite tool-call loops
- <|constrain|> as recipient: e.g. <|constrain|>json matches no routing pattern, falls through to the MCP handler or raises errors
- Missing <|start|> between channels: the model omits the start token between consecutive outputs, causing StreamableParser to throw HarmonyError
- <|constrain|> in headers: produces garbage in recipient or content_type fields

Three layers of defense

1. sanitize_harmony_name() — Pure string function that finds the earliest Harmony control token in a name and returns only the text before it. Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.
2. ResilientStreamableParser — Drop-in wrapper around StreamableParser that intercepts two malformed token patterns:
   - <|start|> recovery: when the parser expects <|start|> but gets <|channel|>, inject the missing tokens
   - <|constrain|> in headers: skip tokens until <|message|> or <|end|>
3. Routing-level fallback — After sanitization, if a recipient becomes an empty string, treat it as None so it falls through to _parse_message_no_recipient() (produces a user-visible message instead of a misrouted MCP call).

Related Issues & PRs

- <|constrain|> misrouting
- <|channel|> from recipients

Decisions to debate
Wrapper vs. monkey-patch for StreamableParser: We chose a wrapper class (ResilientStreamableParser) that delegates all properties to the inner parser, rather than monkey-patching or subclassing. This means get_streamable_parser_for_assistant() returns our wrapper instead of a raw StreamableParser. All existing consumers work unchanged, but isinstance(parser, StreamableParser) checks would fail — we haven't found any such checks in the codebase, but reviewers should flag if they know of one.

String-level vs. token-level sanitization: sanitize_harmony_name() operates on strings, not token IDs. This is intentional — by the time we have a message.recipient or function_name, it's already a string. Token-level recovery is handled separately by ResilientStreamableParser.process(). The two layers are complementary, not redundant.

Hardcoded token IDs (200003, 200005–200008): The ResilientStreamableParser references specific GPT-OSS encoding token IDs. These are stable across the harmony-gpt-oss encoding but would break if a different encoding were used. We could look these up dynamically from the encoding, but the IDs are well-established constants and dynamic lookup adds complexity for no current benefit.

Sanitization applied broadly (defense in depth): We sanitize at input parsing, output dispatch, tool routing, AND streaming — even though the ResilientStreamableParser should catch most issues at the token level. This is intentional defense-in-depth: if a code path bypasses the resilient parser (e.g. direct Message construction in tests or from previous_input_messages), the string-level sanitization still catches leaked tokens.

Empty-after-sanitization → None fallback: When sanitizing a recipient produces an empty string, we convert it to None rather than raising an error. This causes the message to be treated as a "no-recipient" message (preamble), which is the safest fallback — the user sees the text content rather than getting a routing error. This is a design choice that could mask other bugs; an alternative would be to log a warning.

Files changed
- vllm/entrypoints/openai/parser/harmony_utils.py — sanitize_harmony_name(), ResilientStreamableParser, wrap get_streamable_parser_for_assistant(), sanitize input parsing
- vllm/entrypoints/openai/responses/harmony.py
- vllm/entrypoints/openai/responses/context.py
- vllm/entrypoints/openai/chat_completion/stream_harmony.py
- tests/entrypoints/openai/parser/test_harmony_utils.py — sanitize_harmony_name + ResilientStreamableParser
- tests/entrypoints/openai/responses/test_harmony_utils.py

Test plan
- TestSanitizeHarmonyName — 7 cases: clean passthrough, <|channel|> stripping, <|constrain|> stripping, pure token → empty, multiple tokens → earliest wins, empty input, trailing whitespace
- TestResilientStreamableParser — 3 cases: normal sequence unchanged, missing <|start|> recovery, <|constrain|> in header skip
- TestHarmonyOutputSanitization — 2 cases: <|constrain|>json recipient → message output, contaminated function name → cleaned
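The two recovery behaviors exercised by TestResilientStreamableParser could look roughly like this. This is an illustrative sketch, not the PR's code: expects_start() and in_header() are hypothetical stand-ins for the inner parser's real state, RecordingParser is a toy substitute for openai_harmony's StreamableParser, and the token-ID constants follow the hardcoded IDs mentioned in the description (assumed mapping: 200003 = <|constrain|>, 200005 = <|channel|>, 200006 = <|start|>, 200007 = <|end|>, 200008 = <|message|>).

```python
# Illustrative sketch of the resilient wrapper's recovery logic.

CONSTRAIN, CHANNEL, START, END, MESSAGE = 200003, 200005, 200006, 200007, 200008

class ResilientStreamableParser:
    """Wrapper that delegates to an inner parser and repairs two malformed
    token patterns before they reach it."""

    def __init__(self, inner):
        self._inner = inner
        self._skipping_header = False

    def __getattr__(self, attr):
        # Delegate messages, current_recipient, etc. to the inner parser so
        # existing consumers keep working unchanged.
        return getattr(self._inner, attr)

    def process(self, token_id: int) -> None:
        # Recovery 1: model omitted <|start|> between messages; inject it.
        if token_id == CHANNEL and self._inner.expects_start():
            self._inner.process(START)
        # Recovery 2: stray <|constrain|> in a header; skip tokens until the
        # header ends at <|message|> or <|end|>.
        elif token_id == CONSTRAIN and self._inner.in_header():
            self._skipping_header = True
            return
        if self._skipping_header:
            if token_id not in (MESSAGE, END):
                return
            self._skipping_header = False
        self._inner.process(token_id)

class RecordingParser:
    """Toy stand-in for StreamableParser that just records token IDs."""
    def __init__(self):
        self.seen = []
    def expects_start(self):
        return not self.seen  # a fresh parser expects <|start|>
    def in_header(self):
        return START in self.seen and MESSAGE not in self.seen
    def process(self, token_id):
        self.seen.append(token_id)

parser = ResilientStreamableParser(RecordingParser())
for tok in (CHANNEL, 42, MESSAGE, 99):  # stream missing its leading <|start|>
    parser.process(tok)
print(parser.seen)  # [200006, 200005, 42, 200008, 99]
```

The wrapper-over-subclass choice debated above shows up in __getattr__: every attribute the wrapper does not define falls through to the inner parser, which is why existing consumers work unchanged while isinstance checks against the raw parser type would not.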