[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients by will-deines · Pull Request #35881 · vllm-project/vllm

will-deines · 2026-03-03T16:33:39Z

Summary

GPT-OSS models leak Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) into tool names and recipient fields during generation. This causes:

- Tool name contamination: e.g. manage_cart<|channel|>commentary instead of manage_cart, corrupting function call routing and causing infinite tool-call loops
- <|constrain|> as recipient: e.g. <|constrain|>json matches no routing pattern, so it falls through to the MCP handler or raises errors
- Missing <|start|> between channels: the model omits the start token between consecutive outputs, causing StreamableParser to throw HarmonyError
- Malformed <|constrain|> in headers: produces garbage in recipient or content_type fields

Three layers of defense

1. sanitize_harmony_name(): Pure string function that finds the earliest Harmony control token in a name and returns only the text before it. Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.
2. ResilientStreamableParser: Drop-in wrapper around StreamableParser that intercepts two malformed token patterns:
   - Missing <|start|> recovery: when the parser expects <|start|> but gets <|channel|>, inject the missing tokens
   - Malformed <|constrain|> in headers: skip tokens until <|message|> or <|end|>
3. Routing-level fallback: After sanitization, if a recipient becomes an empty string, treat it as None so it falls through to _parse_message_no_recipient() (produces a user-visible message instead of a misrouted MCP call).
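The first and third layers can be sketched in a few lines. This is a minimal illustration based only on the description above, not the code in the PR: the token tuple comes from the Summary, `normalize_recipient` is a hypothetical helper name, and whitespace handling (mentioned in the test plan) is not modeled.

```python
from typing import Optional

# Control-token strings listed in the PR summary.
HARMONY_CONTROL_TOKENS = (
    "<|channel|>",
    "<|constrain|>",
    "<|start|>",
    "<|end|>",
    "<|message|>",
)


def sanitize_harmony_name(name: str) -> str:
    """Return the text before the earliest control token, or name unchanged."""
    positions = [name.find(tok) for tok in HARMONY_CONTROL_TOKENS]
    hits = [pos for pos in positions if pos != -1]
    if not hits:
        return name
    return name[: min(hits)]


def normalize_recipient(recipient: Optional[str]) -> Optional[str]:
    """Hypothetical helper: sanitize, then map an empty result to None."""
    if recipient is None:
        return None
    cleaned = sanitize_harmony_name(recipient)
    # An empty string means the recipient was nothing but leaked tokens.
    return cleaned or None
```

With this sketch, `sanitize_harmony_name("manage_cart<|channel|>commentary")` yields `"manage_cart"`, and `normalize_recipient("<|constrain|>json")` yields `None`, which is the value the routing layer would treat as "no recipient".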

Related Issues & PRs

#	Title	Relation
#32587	Special tokens leak into tool names	Primary bug report for tool name contamination
#30372	Distorted tool names + infinite tool-call loop	Consequence of tool name contamination
#23567	HarmonyError: unexpected tokens in message header	Parser crash from malformed sequences
#28262	Incorrect input/output handling in Responses API	Channel metadata loss causing `<|constrain|>` misrouting
#31677	Sanitize malformed tool call recipients (stale PR)	Strips `<|channel|>` from recipients
#32633	Fix token leaks in tool names and streaming (stale PR)	Defines sanitize + strip functions
#28303	Parse gpt-oss refusals w/ non-strict mode (stale PR)	Different approach via openai-harmony library
#29236	Fix gpt oss tool parser v2 (stale PR)	Also addresses tag sanitization
#34857	Responses API & Tool Calling H1 2026 roadmap	Lists "guided decode and structured outputs" as focus area

Decisions to debate

1. Wrapper vs. monkey-patch for StreamableParser: We chose a wrapper class (ResilientStreamableParser) that delegates all properties to the inner parser, rather than monkey-patching or subclassing. This means get_streamable_parser_for_assistant() returns our wrapper instead of a raw StreamableParser. All existing consumers work unchanged, but isinstance(parser, StreamableParser) checks would fail. We haven't found any such checks in the codebase, but reviewers should flag any they know of.
2. String-level vs. token-level sanitization: sanitize_harmony_name() operates on strings, not token IDs. This is intentional: by the time we have a message.recipient or function_name, it's already a string. Token-level recovery is handled separately by ResilientStreamableParser.process(). The two layers are complementary, not redundant.
3. Hardcoded token IDs (200003, 200005–200008): The ResilientStreamableParser references specific GPT-OSS encoding token IDs. These are stable across the harmony-gpt-oss encoding but would break if a different encoding were used. We could look these up dynamically from the encoding, but the IDs are well-established constants and dynamic lookup adds complexity for no current benefit.
4. Sanitization applied broadly (defense in depth): We sanitize at input parsing, output dispatch, tool routing, AND streaming, even though the ResilientStreamableParser should catch most issues at the token level. This is intentional defense-in-depth: if a code path bypasses the resilient parser (e.g. direct Message construction in tests or from previous_input_messages), the string-level sanitization still catches leaked tokens.
5. Empty-after-sanitization → None fallback: When sanitizing a recipient produces an empty string, we convert it to None rather than raising an error. This causes the message to be treated as a "no-recipient" message (preamble), which is the safest fallback: the user sees the text content rather than getting a routing error. This is a design choice that could mask other bugs; an alternative would be to log a warning.
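To make the wrapper trade-off concrete, here is a toy sketch of the delegation pattern together with the missing-<|start|> recovery. It is an illustration under stated assumptions, not the PR's implementation: `ToyStrictParser` and `ToyResilientParser` are hypothetical stand-ins, tokens are plain strings rather than token IDs, the injected role `"assistant"` is an assumption about what "the missing tokens" are, and the <|constrain|> skip is not modeled.

```python
START, CHANNEL, MESSAGE, END = "<|start|>", "<|channel|>", "<|message|>", "<|end|>"


class ToyStrictParser:
    """Stand-in for a strict parser: every message must open with <|start|>."""

    def __init__(self) -> None:
        self.expecting_start = True
        self.tokens: list[str] = []

    def process(self, token: str) -> None:
        if self.expecting_start and token != START:
            raise ValueError(f"expected {START}, got {token}")
        # After <|end|>, the next message must begin with <|start|> again.
        self.expecting_start = token == END
        self.tokens.append(token)


class ToyResilientParser:
    """Wrapper that delegates to an inner parser and repairs one pattern."""

    def __init__(self, inner: ToyStrictParser) -> None:
        self._inner = inner

    def process(self, token: str) -> None:
        if self._inner.expecting_start and token == CHANNEL:
            # Assumed repair: re-open the header before the channel token.
            self._inner.process(START)
            self._inner.process("assistant")
        self._inner.process(token)

    def __getattr__(self, name: str):
        # Invoked only when normal lookup fails, so every other attribute
        # (for example .tokens) is read from the inner parser.
        return getattr(self._inner, name)
```

Feeding a stream in which the second message starts directly with <|channel|> makes `ToyStrictParser` raise, while `ToyResilientParser` accepts it and exposes the repaired sequence through `.tokens`. Note that `isinstance(ToyResilientParser(ToyStrictParser()), ToyStrictParser)` is `False`, which is the compatibility caveat raised in the first decision.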

Files changed

File	Change
`vllm/entrypoints/openai/parser/harmony_utils.py`	Add `sanitize_harmony_name()`, `ResilientStreamableParser`, wrap `get_streamable_parser_for_assistant()`, sanitize input parsing
`vllm/entrypoints/openai/responses/harmony.py`	Sanitize recipients in output dispatch + input parsing functions
`vllm/entrypoints/openai/responses/context.py`	Sanitize recipients in tool routing
`vllm/entrypoints/openai/chat_completion/stream_harmony.py`	Sanitize tool names in streaming delta extraction
`tests/entrypoints/openai/parser/test_harmony_utils.py`	Unit tests for `sanitize_harmony_name` + `ResilientStreamableParser`
`tests/entrypoints/openai/responses/test_harmony_utils.py`	Unit tests for output sanitization (contaminated recipients + tool names)

Test plan

- TestSanitizeHarmonyName: 7 cases (clean passthrough, <|channel|> stripping, <|constrain|> stripping, pure token → empty, multiple tokens → earliest wins, empty input, trailing whitespace)
- TestResilientStreamableParser: 3 cases (normal sequence unchanged, missing <|start|> recovery, <|constrain|> in header skip)
- TestHarmonyOutputSanitization: 2 cases (<|constrain|>json recipient → message output, contaminated function name → cleaned)
- All existing parser and responses unit tests pass (90 total, 0 regressions)
- Integration test with live GPT-OSS model (needs model access)

…and recipients

GPT-OSS models generate Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) in unexpected positions during output generation, causing tool name contamination, recipient misrouting, and parser crashes.

Three layers of defense:
1. sanitize_harmony_name(): pure string function that strips leaked control token strings from tool/recipient names.
2. ResilientStreamableParser: wrapper around StreamableParser that recovers from missing <|start|> tokens between messages and malformed <|constrain|> tokens in headers.
3. Routing-level fallback: sanitized-to-empty recipients fall through to _parse_message_no_recipient() instead of being misrouted.

Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.

gemini-code-assist

Code Review

This pull request introduces important sanitization logic to handle leaked Harmony control tokens, a critical fix for tool use with GPT-OSS models, utilizing a robust multi-layered defense approach. However, a critical security vulnerability exists as the current implementation fails to sanitize Message objects stored in the conversation history, potentially leading to control token injection in multi-turn interactions. Additionally, improvements are needed for the sanitization of structured recipient names to prevent failed tool calls, and some redundant code could be simplified.

vllm/entrypoints/openai/parser/harmony_utils.py

vllm/entrypoints/openai/responses/context.py

vllm/entrypoints/openai/parser/harmony_utils.py

vllm/entrypoints/openai/responses/context.py

vllm/entrypoints/openai/responses/harmony.py

…ents, remove redundancy

- Add sanitize_harmony_recipient() that splits on '.', sanitizes each part, and rejoins to preserve dotted structure (e.g. browser<|channel|>.search becomes browser.search instead of being truncated to browser)
- Sanitize recipients on messages returned by ResilientStreamableParser.messages to prevent control token injection in multi-turn conversation history
- Remove redundant sanitization in parser_state_to_response_output since ResilientStreamableParser.current_recipient already handles it
- Use sanitize_harmony_recipient for full recipient strings in context.py and harmony.py routing logic

mergify bot added the frontend and gpt-oss (Related to GPT-OSS models) labels Mar 3, 2026

github-project-automation bot added this to gpt-oss Issues & Enhancements Mar 3, 2026

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Mar 3, 2026

gemini-code-assist bot reviewed Mar 3, 2026

garrio-1 and others added 3 commits March 3, 2026 12:11

Fix pre-commit formatting: import order, line length, trailing blank …

d828ea1

…line

Merge branch 'main' into harmony-token-sanitization

9bbde8b

will-deines closed this Mar 3, 2026

github-project-automation bot moved this from To Triage to Done in gpt-oss Issues & Enhancements Mar 3, 2026

This was referenced Mar 3, 2026

[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients #35901

Closed

[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients #35906

Open

will-deines wants to merge 4 commits into vllm-project:main from will-deines:harmony-token-sanitization
