
[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients #35881

Closed
will-deines wants to merge 4 commits into vllm-project:main from will-deines:harmony-token-sanitization


@will-deines

Summary

GPT-OSS models leak Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) into tool names and recipient fields during generation. This causes:

  • Tool name contamination — e.g. manage_cart<|channel|>commentary instead of manage_cart, corrupting function call routing and causing infinite tool-call loops
  • <|constrain|> as recipient — e.g. <|constrain|>json matches no routing pattern, falls through to MCP handler or raises errors
  • Missing <|start|> between channels — model omits start token between consecutive outputs, causing StreamableParser to throw HarmonyError
  • Malformed <|constrain|> in headers — produces garbage in recipient or content_type fields

Three layers of defense

  1. sanitize_harmony_name() — Pure string function that finds the earliest Harmony control token in a name and returns only the text before it. Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.

  2. ResilientStreamableParser — Drop-in wrapper around StreamableParser that intercepts two malformed token patterns:

    • Missing <|start|> recovery: when the parser expects <|start|> but receives <|channel|>, inject the missing tokens
    • Malformed <|constrain|> in headers: skip tokens until <|message|> or <|end|>
  3. Routing-level fallback — After sanitization, if a recipient becomes empty string, treat it as None so it falls through to _parse_message_no_recipient() (produces a user-visible message instead of a misrouted MCP call).
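
A minimal sketch of layers 1 and 3, written from the description above rather than copied from the diff. The token list comes from the summary; stripping surrounding whitespace is an assumption (the test plan mentions a trailing-whitespace case without stating the expected result), and `normalize_recipient` is a hypothetical helper name used only for illustration.

```python
from typing import Optional

# Control-token strings listed in the summary.
HARMONY_CONTROL_TOKENS = (
    "<|channel|>",
    "<|constrain|>",
    "<|start|>",
    "<|end|>",
    "<|message|>",
)


def sanitize_harmony_name(name: str) -> str:
    """Return the text before the earliest control token.

    Assumption: surrounding whitespace is stripped from the result.
    """
    cut = len(name)
    for token in HARMONY_CONTROL_TOKENS:
        idx = name.find(token)
        if idx != -1:
            cut = min(cut, idx)  # earliest token wins
    return name[:cut].strip()


def normalize_recipient(recipient: Optional[str]) -> Optional[str]:
    """Hypothetical helper for layer 3: an empty result becomes None."""
    if recipient is None:
        return None
    return sanitize_harmony_name(recipient) or None


print(sanitize_harmony_name("manage_cart<|channel|>commentary"))
print(normalize_recipient("<|constrain|>json"))
```

With this sketch, the contaminated tool name from the summary reduces to `manage_cart`, and a recipient consisting only of `<|constrain|>json` becomes `None`, so routing can treat the message as having no recipient.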

Related Issues & PRs

| # | Title | Relation |
|---|---|---|
| #32587 | Special tokens leak into tool names | Primary bug report for tool name contamination |
| #30372 | Distorted tool names + infinite tool-call loop | Consequence of tool name contamination |
| #23567 | HarmonyError: unexpected tokens in message header | Parser crash from malformed sequences |
| #28262 | Incorrect input/output handling in Responses API | Channel metadata loss causing <\|constrain\|> misrouting |
| #31677 | Sanitize malformed tool call recipients (stale PR) | Strips <\|channel\|> from recipients |
| #32633 | Fix token leaks in tool names and streaming (stale PR) | Defines sanitize + strip functions |
| #28303 | Parse gpt-oss refusals w/ non-strict mode (stale PR) | Different approach via openai-harmony library |
| #29236 | Fix gpt oss tool parser v2 (stale PR) | Also addresses tag sanitization |
| #34857 | Responses API & Tool Calling H1 2026 roadmap | Lists "guided decode and structured outputs" as focus area |

Decisions to debate

  1. Wrapper vs. monkey-patch for StreamableParser: We chose a wrapper class (ResilientStreamableParser) that delegates all properties to the inner parser, rather than monkey-patching or subclassing. This means get_streamable_parser_for_assistant() returns our wrapper instead of a raw StreamableParser. All existing consumers work unchanged, but isinstance(parser, StreamableParser) checks would fail — we haven't found any such checks in the codebase, but reviewers should flag if they know of one.

  2. String-level vs. token-level sanitization: sanitize_harmony_name() operates on strings, not token IDs. This is intentional — by the time we have a message.recipient or function_name, it's already a string. Token-level recovery is handled separately by ResilientStreamableParser.process(). The two layers are complementary, not redundant.

  3. Hardcoded token IDs (200003, 200005–200008): The ResilientStreamableParser references specific GPT-OSS encoding token IDs. These are stable across the harmony-gpt-oss encoding but would break if a different encoding were used. We could look these up dynamically from the encoding, but the IDs are well-established constants and dynamic lookup adds complexity for no current benefit.

  4. Sanitization applied broadly (defense in depth): We sanitize at input parsing, output dispatch, tool routing, AND streaming — even though the ResilientStreamableParser should catch most issues at the token level. This is intentional defense-in-depth: if a code path bypasses the resilient parser (e.g. direct Message construction in tests or from previous_input_messages), the string-level sanitization still catches leaked tokens.

  5. Empty-after-sanitization → None fallback: When sanitizing a recipient produces an empty string, we convert it to None rather than raising an error. This causes the message to be treated as a "no-recipient" message (preamble), which is the safest fallback — the user sees the text content rather than getting a routing error. This is a design choice that could mask other bugs; an alternative would be to log a warning.
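
To make decision 1 concrete, here is a toy sketch of the wrapper pattern and of the missing-<|start|> recovery. It does not reproduce the real `StreamableParser` API: `ToyParser`, `ResilientParserSketch`, the `expecting_start` flag, and the `recovery_tokens` parameter are all hypothetical stand-ins. The token-ID mapping is an assumption based on the IDs listed in decision 3 and should be checked against the actual encoding. The real recovery may inject more than a single <|start|> token.

```python
from typing import Any, List, Sequence

# Assumed mapping for three of the IDs named in decision 3; verify before use.
TOK_CHANNEL = 200005
TOK_START = 200006
TOK_END = 200007


class ToyParser:
    """Hypothetical stand-in for StreamableParser.

    Between messages it accepts only <|start|> and raises otherwise.
    """

    def __init__(self) -> None:
        self.tokens: List[int] = []
        self.expecting_start = True

    def process(self, token: int) -> None:
        if self.expecting_start and token != TOK_START:
            raise ValueError(f"unexpected token {token}, expected <|start|>")
        # After <|end|> the parser is between messages again.
        self.expecting_start = token == TOK_END
        self.tokens.append(token)


class ResilientParserSketch:
    """Wrapper (not a subclass) that repairs a missing <|start|>."""

    def __init__(self, inner: ToyParser, recovery_tokens: Sequence[int]) -> None:
        self._inner = inner
        self._recovery = list(recovery_tokens)

    def process(self, token: int) -> None:
        if self._inner.expecting_start and token == TOK_CHANNEL:
            # Inject the tokens the model skipped, then continue normally.
            for missing in self._recovery:
                self._inner.process(missing)
        self._inner.process(token)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails: delegate to the inner parser.
        return getattr(self._inner, name)


parser = ResilientParserSketch(ToyParser(), recovery_tokens=[TOK_START])
for tok in [TOK_START, TOK_CHANNEL, TOK_END, TOK_CHANNEL]:
    parser.process(tok)
print(parser.tokens)
print(isinstance(parser, ToyParser))
```

The last line illustrates the trade-off raised in decision 1: attribute access is forwarded to the inner parser, but an `isinstance` check against the wrapped class returns `False`.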

Files changed

| File | Change |
|---|---|
| vllm/entrypoints/openai/parser/harmony_utils.py | Add sanitize_harmony_name(), ResilientStreamableParser, wrap get_streamable_parser_for_assistant(), sanitize input parsing |
| vllm/entrypoints/openai/responses/harmony.py | Sanitize recipients in output dispatch + input parsing functions |
| vllm/entrypoints/openai/responses/context.py | Sanitize recipients in tool routing |
| vllm/entrypoints/openai/chat_completion/stream_harmony.py | Sanitize tool names in streaming delta extraction |
| tests/entrypoints/openai/parser/test_harmony_utils.py | Unit tests for sanitize_harmony_name + ResilientStreamableParser |
| tests/entrypoints/openai/responses/test_harmony_utils.py | Unit tests for output sanitization (contaminated recipients + tool names) |

Test plan

  • TestSanitizeHarmonyName — 7 cases: clean passthrough, <|channel|> stripping, <|constrain|> stripping, pure token → empty, multiple tokens → earliest wins, empty input, trailing whitespace
  • TestResilientStreamableParser — 3 cases: normal sequence unchanged, missing <|start|> recovery, <|constrain|> in header skip
  • TestHarmonyOutputSanitization — 2 cases: <|constrain|>json recipient → message output, contaminated function name → cleaned
  • All existing parser and responses unit tests pass (90 total, 0 regressions)
  • Integration test with live GPT-OSS model (needs model access)

…and recipients

GPT-OSS models generate Harmony protocol control tokens (<|channel|>,
<|constrain|>, <|start|>, <|end|>, <|message|>) in unexpected positions
during output generation, causing tool name contamination, recipient
misrouting, and parser crashes.

Three layers of defense:

1. sanitize_harmony_name() — pure string function that strips leaked
   control token strings from tool/recipient names.

2. ResilientStreamableParser — wrapper around StreamableParser that
   recovers from missing <|start|> tokens between messages and
   malformed <|constrain|> tokens in headers.

3. Routing-level fallback — sanitized-to-empty recipients fall through
   to _parse_message_no_recipient() instead of being misrouted.

Applied at all input parsing, output dispatching, tool routing, and
streaming delta extraction sites.
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds sanitization for leaked Harmony control tokens, an important fix for tool use with GPT-OSS models, using a multi-layered defense. However, the current implementation does not sanitize Message objects stored in the conversation history, which is a critical security issue: control tokens could be injected in multi-turn interactions. In addition, the sanitization of structured recipient names needs improvement to prevent failed tool calls, and some redundant code could be simplified.

garrio-1 and others added 3 commits March 3, 2026 12:11
…ents, remove redundancy

- Add sanitize_harmony_recipient() that splits on '.', sanitizes each
  part, and rejoins to preserve dotted structure (e.g. browser<|channel|>.search
  becomes browser.search instead of being truncated to browser)
- Sanitize recipients on messages returned by ResilientStreamableParser.messages
  to prevent control token injection in multi-turn conversation history
- Remove redundant sanitization in parser_state_to_response_output since
  ResilientStreamableParser.current_recipient already handles it
- Use sanitize_harmony_recipient for full recipient strings in context.py
  and harmony.py routing logic
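
A rough sketch of the dotted-recipient handling described in this commit message, reconstructed from the text only. The compact `sanitize_harmony_name` is re-declared here so the block is self-contained. Dropping parts that sanitize to an empty string is an assumption, since the commit message does not say how empty parts are handled.

```python
HARMONY_CONTROL_TOKENS = (
    "<|channel|>",
    "<|constrain|>",
    "<|start|>",
    "<|end|>",
    "<|message|>",
)


def sanitize_harmony_name(name: str) -> str:
    """Keep only the text before the earliest control token."""
    hits = [i for i in (name.find(t) for t in HARMONY_CONTROL_TOKENS) if i != -1]
    return name[: min(hits)].strip() if hits else name.strip()


def sanitize_harmony_recipient(recipient: str) -> str:
    """Sanitize each dot-separated part, then rejoin.

    Assumption: parts that become empty are dropped.
    """
    parts = [sanitize_harmony_name(part) for part in recipient.split(".")]
    return ".".join(part for part in parts if part)


# Name-level sanitization truncates at the first token ...
print(sanitize_harmony_name("browser<|channel|>.search"))
# ... while the recipient-level version preserves the dotted structure.
print(sanitize_harmony_recipient("browser<|channel|>.search"))
```

The two calls show the difference the commit describes: the first truncates to `browser`, the second yields `browser.search`.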

Labels

frontend gpt-oss Related to GPT-OSS models

Projects

Status: Done


2 participants