fix(server): integrate tool call parser into reasoning parser streaming path#253

Merged
waybarrios merged 2 commits into waybarrios:main from mxl:fix/reasoning-parser-tool-call-streaming
Apr 11, 2026

Conversation

@mxl (Contributor) commented Apr 4, 2026

Running a Qwen 3.5 model prints tool calls as raw XML instead of parsing them into structured tool calls, so the requested tools are never invoked.

Summary

- run the streaming tool-call parser in the reasoning-parser path after reasoning content has been stripped from streamed output
- suppress tool-call markup from normal content chunks and emit structured tool_calls chunks when parsed tool calls are detected
- preserve the tool_calls finish reason on the final streamed chunk when generation ends with parsed tool calls
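The three steps above can be sketched as a single post-processing function. This is a simplification for clarity: the actual fix runs the parsers incrementally on each streamed delta, and the `<think>`/`<tool_call>` markup, the regexes, and the function name here are illustrative assumptions, not the project's real API.

```python
import json
import re

# Assumed markup formats; the real parsers are model-specific.
REASONING_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def postprocess(generated: str):
    """Order matters: strip reasoning first, then extract tool calls,
    then suppress the tool-call markup from user-visible content."""
    text = REASONING_RE.sub("", generated)                            # 1. drop reasoning spans
    tool_calls = [json.loads(m) for m in TOOL_CALL_RE.findall(text)]  # 2. parse structured calls
    content = TOOL_CALL_RE.sub("", text).strip()                      # 3. suppress raw markup
    finish_reason = "tool_calls" if tool_calls else "stop"            # preserve finish reason
    return content, tool_calls, finish_reason
```

Before this fix, step 1 ran but steps 2 and 3 were skipped in the reasoning path, which is why the raw XML leaked into content chunks.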

@mxl mxl force-pushed the fix/reasoning-parser-tool-call-streaming branch 7 times, most recently from 4bff328 to 89303c8 on April 4, 2026 at 07:13
@mxl mxl force-pushed the fix/reasoning-parser-tool-call-streaming branch from 89303c8 to 1ce4107 on April 4, 2026 at 07:40
@waybarrios (Owner) left a comment

Clean fix for a real bug — tool calls were leaking as raw XML when the reasoning parser was active. The approach is correct and tests are solid.

One minor bug:

request.model vs _model_name inconsistency

The new tool_chunk in the reasoning path uses model=request.model, but every other chunk in the function (including the equivalent tool_calls chunk in the standard path) uses model=_model_name:

```python
# New code (reasoning path):
tool_chunk = ChatCompletionChunk(
    id=response_id,
    model=request.model,      # ← client-provided value
    ...
)

# Standard path (line ~2252) and all other chunks:
chunk = ChatCompletionChunk(
    id=response_id,
    model=_model_name,         # ← actual served model name
    ...
)
```

These can differ when --served-model-name is set; the new chunk should use _model_name for consistency.

@waybarrios (Owner) commented

Pushed a small fix (1d16507) — was using request.model instead of _model_name in the reasoning tool chunk. Consistent with the rest of the function now.

@waybarrios waybarrios merged commit 660552e into waybarrios:main Apr 11, 2026
7 checks passed


2 participants