[Bugfix] Fix harmony streaming tool call crash and argument splitting #37070

Open

Pradyun92 wants to merge 1 commit into vllm-project:main from Pradyun92:fix/harmony-streaming-tool-call-crash

Conversation


@Pradyun92 Pradyun92 commented Mar 14, 2026

Purpose

Two fixes for Chat Completions streaming with harmony models (e.g., gpt_oss):

Bug 1: Tool call arguments split across indices

With speculative decoding (Eagle), multi-token batches can span channel boundaries. `extract_harmony_streaming_delta` reads `harmony_parser.messages` to compute tool call indices, so it must be called immediately after each token is processed, not after the entire batch. Otherwise the parser state is fully advanced and `base_index` is wrong for early tokens, causing argument fragments to land in separate tool calls.

For example, a single tool call produces two entries on the client:

Tool[0] get_weather: {"city": "Tokyo"    ← missing closing }
Tool[1] None: }                           ← stray } as separate "tool call"

Fix: Interleave token processing with delta extraction: process one token through `harmony_parser.process()`, immediately call `extract_harmony_streaming_delta` for that token, then process the next token. Merge the per-token deltas into a single `DeltaMessage`.
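The interleaving can be sketched roughly as follows. This is a minimal illustration, not vLLM's actual code: `FakeParser`, `ToolDelta`, and the merge helper are stand-ins; only the process-then-extract call ordering mirrors the fix described above.

```python
from dataclasses import dataclass, field

# Stand-ins for vLLM's types, used only to illustrate the call ordering.
@dataclass
class ToolDelta:
    index: int        # tool call index this fragment belongs to
    arguments: str    # argument text fragment

@dataclass
class DeltaMessage:
    tool_calls: list = field(default_factory=list)

def stream_batch(parser, token_ids, extract_delta):
    """Advance the parser ONE token at a time and extract the streaming
    delta immediately, so tool call indices are computed against the
    parser state for that token; merge the per-token deltas at the end."""
    merged = DeltaMessage()
    for tok in token_ids:
        parser.process(tok)              # advance by a single token
        delta = extract_delta(parser)    # indices are correct at this point
        if delta is not None:
            merged.tool_calls.extend(delta.tool_calls)
    return merged

# Toy parser: just counts processed tokens.
class FakeParser:
    def __init__(self):
        self.pos = 0
    def process(self, tok):
        self.pos += 1

def fake_extract(parser):
    # Pretend every token contributes an argument fragment for tool 0.
    return DeltaMessage(tool_calls=[ToolDelta(0, f"frag{parser.pos}")])

merged = stream_batch(FakeParser(), [101, 102, 103], fake_extract)
```

With batch-level extraction, fragments from early tokens would be attributed to whatever index the fully-advanced parser reports; interleaving keeps every fragment on the tool call it actually belongs to.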

Bug 2: Server crash (IndexError) in autocomplete logic

The "unstreamed tool arg tokens" autocomplete logic at lines 1232-1281 accesses `tool_parser.prev_tool_call_arr[index]` and `tool_parser.streamed_args_for_tool[index]`. Harmony models use `extract_harmony_streaming_delta()` for tool call parsing, which never populates these arrays, so indexing into them raises an IndexError.

Fix: Add an `and not self.use_harmony` guard to skip the autocomplete block for harmony models.
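The effect of the guard can be sketched as below. The helper name and surrounding structure are hypothetical; only `use_harmony`, `prev_tool_call_arr`, and `streamed_args_for_tool` come from the PR description.

```python
class ToolParserState:
    # Minimal stand-in: for harmony models these arrays stay empty.
    def __init__(self):
        self.prev_tool_call_arr = []
        self.streamed_args_for_tool = []

def should_run_autocomplete(tool_parser, index, use_harmony):
    """Guard the 'unstreamed tool arg tokens' autocomplete block.
    Harmony models parse tool calls via extract_harmony_streaming_delta,
    which never fills these arrays, so indexing them would raise
    IndexError; skip the block entirely in that case."""
    if use_harmony:
        return False
    return index < len(tool_parser.prev_tool_call_arr)
```

Before the fix, indexing position 0 of the empty arrays raised the `list index out of range` error shown in the test results; with the guard, harmony requests never reach that code path.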

Not duplicating existing PRs: Checked open PRs #33520, #35449, #35907, #36011, #36445 — none address these specific bugs (argument splitting across indices or IndexError in autocomplete logic).

AI assistance: This PR was developed with AI assistance (Claude). The submitter has reviewed all changes and tested end-to-end.

Test Plan

```bash
# Start vLLM with a harmony model + Eagle speculative decoding
vllm serve <harmony-model-path> \
  --enable-auto-tool-choice --tool-call-parser openai \
  --reasoning-parser openai_gptoss \
  --speculative-config '{"method":"eagle3","model":"<eagle-head-path>","num_speculative_tokens":5}'

# Streaming tool call (was crashing / producing malformed args):
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model>",
    "stream": true,
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}]
  }'

# BFCL multi-turn function calling benchmark:
OPENAI_BASE_URL="http://localhost:8000/v1" \
OPENAI_API_KEY="fake" \
bfcl generate \
  --model <model> \
  --test-category multi_turn \
  --num-threads 8
```

Test Result

Before fix (Bug 1): Streaming tool call arguments split across indices:

Tool[0] get_weather: {"city": "Tokyo"     ← incomplete
Tool[1] None: }                            ← stray fragment

Before fix (Bug 2): Server returns 500:

{"error": {"message": "list index out of range", "type": "InternalServerError", "code": 500}}

After fix: Streaming tool calls produce correct, complete JSON:

Tool[0] get_weather: {"city": "Tokyo", "units": "celsius"}

BFCL multi-turn results (gpt-oss-120b with Eagle3 speculative decoding):

| Category | Accuracy |
| --- | --- |
| Multi Turn Overall | 54.75% |
| Base | 65.00% |
| Miss Func | 51.50% |
| Miss Param | 48.50% |
| Long Context | 54.00% |

All 800 test cases completed with 0 malformed JSON errors in tool call arguments.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR
  • The test plan
  • The test results
  • (Optional) Documentation update
  • (Optional) Release notes update


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes an IndexError crash that occurs during streaming tool calls with harmony models. The change adds a condition, and not self.use_harmony, to prevent access to unpopulated tool_parser arrays for these models. This approach correctly addresses the described bug. The provided test plan appears sufficient to validate the fix. I have reviewed the changes and have no further comments.

@Pradyun92 Pradyun92 force-pushed the fix/harmony-streaming-tool-call-crash branch from 70cc7ae to f3e7988 Compare March 14, 2026 21:58
@Pradyun92 Pradyun92 changed the title [Bugfix] Fix harmony model streaming tool call crash (IndexError) [Bugfix] Fix harmony streaming tool call crash and argument splitting Mar 14, 2026
Two fixes for Chat Completions streaming with harmony models (gpt_oss):

1. Tool call arguments split across indices: With speculative decoding
   (Eagle), multi-token batches can span channel boundaries.
   extract_harmony_streaming_delta reads harmony_parser.messages to
   compute tool call indices, so it must be called immediately after
   each token is processed — not after the entire batch. Otherwise the
   parser state is fully advanced and base_index is wrong for early
   tokens, causing argument fragments to land in separate tool calls.

   Fix: Interleave token processing with delta extraction — process one
   token through harmony_parser.process(), immediately call
   extract_harmony_streaming_delta for that token, then process the
   next token. Merge the per-token deltas into a single DeltaMessage.

2. IndexError crash in autocomplete logic: The unstreamed tool arg
   tokens autocomplete logic accesses tool_parser.prev_tool_call_arr
   and tool_parser.streamed_args_for_tool, but harmony models never
   populate these arrays. Skip this block for harmony models.

Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com>
Co-authored-by: Claude

Labels

bug (Something isn't working) · frontend · gpt-oss (Related to GPT-OSS models)
