[BugFix] Fix malformed output with streaming responses API #29044

njhill · 2025-11-19T22:58:24Z

Attempting to fix issues described in #28635.

This was written by claude code, I haven't yet scrutinized the logic.

● Perfect! I've identified and fixed the race condition. Here's a summary:

  Root Cause

  The race condition was caused by incomplete implementation in StreamingHarmonyContext.append_tool_output() at vllm/entrypoints/context.py:550. The method had a TODO comment indicating that tool
  output messages should be added to self._messages, but this was never implemented.

  The Bug Flow

  1. Tool execution completes: append_tool_output() processes tool output tokens through the parser and updates self.last_tok
  2. Critical bug: Tool message is NOT added to self._messages (the TODO was never completed)
  3. Next turn begins: render_for_completion() is called to prepare the next prompt
  4. Token mismatch: It renders self._messages (missing the tool output) and tries to find self.last_tok in the rendered tokens
  5. Wrong position: If self.last_tok happens to match a token at the wrong position, it processes tokens from an incorrect location
  6. Parser corruption: The harmony parser receives garbled token sequences and fails with "Unknown role: assistantary"

  The error "assistantary" is likely a result of the parser trying to extract a role from tokens that span across message boundaries or are otherwise malformed due to processing from the wrong
  offset.

  Fixes Applied

  1. Primary fix (vllm/entrypoints/context.py:551): Completed the TODO by adding self._messages.extend(output) to properly track tool output messages
  2. Secondary fix (vllm/entrypoints/context.py:534-535): Added guard to only update self.last_tok if tokens were actually processed, preventing potential NameError if token_ids is empty
  3. Additional fix (vllm/entrypoints/context.py:217): Removed erroneous self.parser = get_streamable_parser_for_assistant() that was resetting the parser state on every append_output() call in
  non-streaming HarmonyContext

  These fixes ensure that the message list stays synchronized with the parser state, preventing token processing from incorrect positions that lead to the race condition errors.

Signed-off-by: Nick Hill <[email protected]>

njhill · 2025-11-19T23:02:08Z

vllm/entrypoints/context.py


    def append_output(self, output: RequestOutput) -> None:
        output_token_ids = output.outputs[0].token_ids
-        self.parser = get_streamable_parser_for_assistant()


@heheda12345 it looks like you added this line in #22512... wonder if you could check this and claude's logic when you have a chance?

Actually in response to the new test failure below, claude suggested to reinstate this line.

What's not clear to me is why the non-streaming case resets the parser between turns but the streaming case uses the same parser and tries to feed the missing tokens into it to catch it back up with the expected state. On the surface, it feels like both of these should take the same approach - either resetting the parser between turns or doing the extra logic the streaming side does down in streaming's render_for_completion.

But, for the sake of trying to get this going without digging deeper into things here, I think putting this back like it was with the added comment at least makes it clear why it's doing this.

bbrowning · 2025-11-19T23:04:23Z

vllm/entrypoints/context.py

        self.last_tok = toks[-1]
-        # TODO: add tool_output messages to self._messages
+        # Add tool output messages to self._messages
+        self._messages.extend(output)


This looks like a key fix to any scenario involving server-side builtin tools executed via the Responses API. Without this, the logic in render_for_completion(self) of the StreamingHarmonyContext would backtrack much further than just the <|start|>assistant tokens it is trying to do and would go all the way back to before the tool output, leading to who knows what as far as next tokens and state.

njhill · 2025-11-20T00:53:24Z

I ran the CI on this change in #27614, unfortunately a different test failed before it reached test_streaming:

[2025-11-20T00:15:34Z] FAILED entrypoints/openai/test_response_api_mcp_tools.py::test_mcp_tool_env_flag_enabled[openai/gpt-oss-20b] - openai.BadRequestError: Error code: 400 - {'error': {'message': 'Unexpected token 815 while expecting start token 200006', 'type': 'BadRequestError', 'param': None, 'code': 400}}

Signed-off-by: Nick Hill <[email protected]>

[BugFix] Fix malformed output with streaming responses API

5cdbdb1

Signed-off-by: Nick Hill <[email protected]>

mergify bot added frontend gpt-oss Related to GPT-OSS models labels Nov 19, 2025

github-project-automation bot added this to gpt-oss Issues & Enhancements Nov 19, 2025

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Nov 19, 2025

njhill mentioned this pull request Nov 19, 2025

[Bug]: Streaming=True causes missing or scrambled tokens with GPT-OSS 120B on vLLM v0.11.0 #28635

Open

njhill commented Nov 19, 2025

View reviewed changes

njhill requested a review from heheda12345 November 19, 2025 23:02

bbrowning reviewed Nov 19, 2025

View reviewed changes

njhill added 2 commits November 19, 2025 17:03

fix multi-turn MCP

f04f01b

Signed-off-by: Nick Hill <[email protected]>

next fix from claude

4d59d5e

Signed-off-by: Nick Hill <[email protected]>

njhill force-pushed the fix-harmony-stream branch from fee81d7 to 4d59d5e Compare November 20, 2025 04:14

next fix from claude

9ae585f

Signed-off-by: Nick Hill <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[BugFix] Fix malformed output with streaming responses API #29044

[BugFix] Fix malformed output with streaming responses API #29044

njhill commented Nov 19, 2025 •

edited by github-actions bot

Loading

Uh oh!

njhill Nov 19, 2025

Uh oh!

njhill Nov 20, 2025 •

edited

Loading

Uh oh!

bbrowning Nov 20, 2025

Uh oh!

bbrowning Nov 19, 2025

Uh oh!

njhill commented Nov 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

[BugFix] Fix malformed output with streaming responses API #29044

Are you sure you want to change the base?

[BugFix] Fix malformed output with streaming responses API #29044

Conversation

njhill commented Nov 19, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

njhill Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

njhill Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bbrowning Nov 20, 2025

Choose a reason for hiding this comment

Uh oh!

bbrowning Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

njhill commented Nov 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

njhill commented Nov 19, 2025 •

edited by github-actions bot

Loading

njhill Nov 20, 2025 •

edited

Loading