fix: handle Harmony <|call|> EOG token for GPT-OSS tool calling #1812

Merged: gianni-cor merged 11 commits into tetherto:main from dev-nid:fix/gpt-oss-harmony-tool-call-eog-break on Apr 30, 2026
@dev-nid dev-nid commented Apr 29, 2026

Summary

GPT-OSS 20B tool calling was broken: the model emits a Harmony tool call frame, but generation stops silently before <|call|> reaches the SDK, so the SDK parses 0 tool calls and sees no frame delimiter.

Root cause

<|call|> (token 200012) is in the model's EOG set. When sampled, it renders as 0 bytes (it is a control token rendered with special=false) and immediately triggers the generation loop's EOG break. The model produces the tool call JSON, but the output is truncated and no frame boundary is visible to the SDK.
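The truncation can be reproduced with a toy loop. This is a minimal sketch, not the project's code: `MockVocab`, `makeDemoVocab`, and `generateNaive` are hypothetical stand-ins for the tokenizer and the generation loop, and every token ID except 200012 is invented for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the model vocabulary (not the llama.cpp API).
struct MockVocab {
    std::map<int, std::string> text;  // token id -> surface text
    std::set<int> control;            // control tokens
    std::set<int> eog;                // end-of-generation set

    // Control tokens render as an empty string unless special is true.
    std::string piece(int id, bool special) const {
        if (control.count(id) != 0 && !special) return "";
        auto it = text.find(id);
        return it == text.end() ? std::string() : it->second;
    }
};

constexpr int kCallToken = 200012;  // <|call|>, the only ID taken from the PR

// Demo vocabulary: IDs 1 and 2 are invented text tokens.
inline MockVocab makeDemoVocab() {
    MockVocab v;
    v.text = {{1, "{\"city\":"}, {2, "\"Tokyo\"}"}, {kCallToken, "<|call|>"}};
    v.control = {kCallToken};
    v.eog = {kCallToken};
    return v;
}

// Generic loop: any EOG token ends generation before it is emitted.
inline std::string generateNaive(const MockVocab& v,
                                 const std::vector<int>& sampled) {
    std::string out;
    for (int id : sampled) {
        if (v.eog.count(id) != 0) break;  // <|call|> stops the loop here
        out += v.piece(id, /*special=*/false);
    }
    return out;
}
```

Feeding `{1, 2, kCallToken}` through `generateNaive` yields `{"city":"Tokyo"}` with no trailing `<|call|>`: the JSON arrives, but the frame boundary does not.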

Fix

  • Detect Harmony/GPT-OSS architecture at initialization and resolve the <|call|> token ID
  • In the generation loop, intercept <|call|> before the generic EOG break when isHarmonyModel_ and params_.use_jinja are true
  • Render <|call|> as visible text using common_token_to_piece(lctx_, tokenId, true) so the SDK can parse frame boundaries
  • Stop generation cleanly after <|call|>: Harmony is turn-based with one tool call per pass, so the SDK handles tool execution and re-prompts
  • Applied to both TextLlmContext.cpp and MtmdLlmContext.cpp
  • Added multi-turn example (harmonyMultiTurnTools.js) demonstrating sequential tool execution across turns
  • Confirmed parallel tool calling is not supported: the model emits exactly one <|call|> per generation regardless of prompt
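The ordering of the checks described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code from TextLlmContext.cpp or MtmdLlmContext.cpp: `HarmonyConfig`, `TokenizerHooks`, `generateWithCallHandling`, and `makeDemoHooks` are hypothetical names, and the hooks stand in for the real EOG check and for `common_token_to_piece(lctx_, tokenId, special)`.

```cpp
#include <functional>
#include <string>
#include <vector>

// Set once at initialization (field names are illustrative).
struct HarmonyConfig {
    bool isHarmonyModel = false;  // architecture detected as Harmony/GPT-OSS
    bool useJinja = false;        // mirrors params_.use_jinja
    int callTokenId = -1;         // resolved <|call|> id, -1 if not found
};

// Hypothetical tokenizer hooks standing in for the real EOG check and for
// common_token_to_piece(lctx_, tokenId, special).
struct TokenizerHooks {
    std::function<bool(int)> isEog;
    std::function<std::string(int, bool)> toPiece;
};

inline std::string generateWithCallHandling(const HarmonyConfig& cfg,
                                            const TokenizerHooks& tok,
                                            const std::vector<int>& sampled) {
    std::string out;
    for (int id : sampled) {
        // Intercept <|call|> before the generic EOG break.
        if (cfg.isHarmonyModel && cfg.useJinja && id == cfg.callTokenId) {
            out += tok.toPiece(id, /*special=*/true);  // delimiter becomes visible
            break;  // one tool call per pass; the SDK re-prompts
        }
        if (tok.isEog(id)) break;  // other EOG tokens keep the old behavior
        out += tok.toPiece(id, /*special=*/false);
    }
    return out;
}

// Demo hooks: token 200012 is <|call|>; every other id renders as "x".
inline TokenizerHooks makeDemoHooks() {
    TokenizerHooks h;
    h.isEog = [](int id) { return id == 200012; };
    h.toPiece = [](int id, bool special) -> std::string {
        if (id == 200012) return special ? "<|call|>" : "";
        return "x";
    };
    return h;
}
```

With both flags set and `callTokenId = 200012`, the sequence `{1, 2, 200012, 3}` produces `xx<|call|>` and stops. With either flag off, the same sequence falls through to the generic EOG break and produces `xx`, which matches the behavior before the fix.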

dev-nid added 2 commits April 29, 2026 18:18
GPT-OSS models use <|call|> as a frame delimiter in the Harmony tool-call
protocol. This token is in the EOG set, causing generation to stop
silently before tool calls reach the SDK.

Add Harmony model detection and <|call|>-specific handling in the
generation loop: render the token as visible text (special=true) so the
SDK can parse frame boundaries, then stop generation for the turn-based
tool execution protocol.
@dev-nid dev-nid requested review from a team as code owners April 29, 2026 16:30
jesusmb1995 previously approved these changes Apr 30, 2026
@gianni-cor commented three times:

/review

aegioscy previously approved these changes Apr 30, 2026
@gianni-cor commented three times:

/review

4 participants