Add streaming tool call parsing support #43
Closed
janhilgard wants to merge 4 commits into waybarrios:main
Adds support for GLM-4.7 and GLM-4.7-Flash tool calling format:

    <tool_call>function_name
    <arg_key>param</arg_key><arg_value>value</arg_value>
    </tool_call>

The parser:
- Extracts function name and arguments from GLM47 XML format
- Removes <think>...</think> tags from output
- Supports streaming tool call detection
- Registered as "glm47" and "glm4" parser names

Usage:

    vllm-mlx serve model --enable-auto-tool-choice --tool-call-parser glm47

Based on vLLM's glm47_moe_tool_parser.py implementation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
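To make the GLM47 format concrete, here is a minimal sketch of how a complete (non-streaming) model output in that format could be parsed. It is not the parser from this PR or from vLLM: the function name parse_glm47_tool_calls, the regular expressions, and the returned dict layout are all assumptions made for illustration, and argument values are JSON-decoded only when they happen to be valid JSON.

```python
import json
import re

# Hypothetical patterns for illustration; the PR's actual parser may differ.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*(?P<name>[^\s<]+)(?P<body>.*?)</tool_call>", re.DOTALL
)
ARG_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def parse_glm47_tool_calls(text):
    """Return (content, tool_calls) for one complete model output."""
    # Drop reasoning blocks before looking for tool calls.
    text = THINK_RE.sub("", text)

    tool_calls = []
    for match in TOOL_CALL_RE.finditer(text):
        arguments = {}
        for key, raw in ARG_RE.findall(match.group("body")):
            raw = raw.strip()
            try:
                # Numbers, booleans, and objects decode as JSON.
                value = json.loads(raw)
            except json.JSONDecodeError:
                # Anything else is kept as a plain string.
                value = raw
            arguments[key.strip()] = value
        tool_calls.append(
            {
                "name": match.group("name"),
                # OpenAI-style tool calls carry arguments as a JSON string.
                "arguments": json.dumps(arguments),
            }
        )

    # Whatever is left after removing the tool call blocks is user-facing text.
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, tool_calls
```

Calling parse_glm47_tool_calls on an output that contains one <tool_call> block returns the surrounding text as content and a single entry in tool_calls. Serializing arguments with json.dumps mirrors the OpenAI convention that function arguments are a JSON-encoded string rather than a nested object.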
- Implement streaming tool call detection in stream_chat_completion()
- Use tool parser's extract_tool_calls_streaming() method when enabled
- Buffer content during tool_call generation, emit tool_calls chunk on completion
- Add fallback to extract_tool_calls() at stream end for edge cases
- Add --enable-auto-tool-choice and --tool-call-parser CLI args to server.py
- Add reasoning field to ChatCompletionChunkDelta for streaming reasoning content

This enables Cline and other streaming clients to receive proper tool_calls in SSE format instead of raw <tool_call> tags in content.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
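The buffering behaviour described in this commit can be sketched roughly as follows. This is an illustrative stand-in, not the code in stream_chat_completion(): the names stream_deltas, parse_block, and held_back are hypothetical, the yielded dicts are a simplified version of an OpenAI chunk's choice (delta plus finish_reason), the call ids are placeholders, and an unterminated tool call at stream end is simply flushed as content rather than handed to a fallback parser.

```python
import json
import re

START, END = "<tool_call>", "</tool_call>"
ARG_RE = re.compile(
    r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL
)


def parse_block(block, index):
    """Turn one complete <tool_call>...</tool_call> block into a tool call dict."""
    inner = block[len(START):-len(END)].strip()
    name = re.match(r"[^\s<]+", inner).group(0)
    args = {k.strip(): v.strip() for k, v in ARG_RE.findall(inner)}
    return {
        "index": index,
        "id": f"call_{index}",  # placeholder id, for illustration only
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(args)},
    }


def held_back(buffer, tag):
    """Length of the buffer tail that could still grow into `tag`."""
    for k in range(min(len(tag) - 1, len(buffer)), 0, -1):
        if buffer.endswith(tag[:k]):
            return k
    return 0


def stream_deltas(pieces):
    """Yield simplified chunk dicts for a stream of generated text pieces."""
    buffer, index = "", 0
    for piece in pieces:
        buffer += piece
        while buffer:
            start = buffer.find(START)
            if start == -1:
                # No tool call yet: emit text, but keep a possible partial tag.
                keep = held_back(buffer, START)
                text, buffer = buffer[:len(buffer) - keep], buffer[len(buffer) - keep:]
                if text:
                    yield {"delta": {"content": text}, "finish_reason": None}
                break
            if start > 0:
                # Emit plain text that precedes the tool call.
                yield {"delta": {"content": buffer[:start]}, "finish_reason": None}
                buffer = buffer[start:]
            end = buffer.find(END)
            if end == -1:
                break  # tool call still incomplete: keep buffering, emit nothing
            block, buffer = buffer[:end + len(END)], buffer[end + len(END):]
            yield {
                "delta": {"tool_calls": [parse_block(block, index)]},
                "finish_reason": None,
            }
            index += 1
    if buffer:
        # Simplification: leftover text is flushed as ordinary content.
        yield {"delta": {"content": buffer}, "finish_reason": None}
    yield {"delta": {}, "finish_reason": "tool_calls" if index else "stop"}
```

Holding back a tail such as "<tool" matters because a tag can be split across generated pieces; without it, the first half of the tag would leak to the client as content before the parser could recognise it.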
Summary
- Implement streaming tool call detection in stream_chat_completion()
- Use tool parser's extract_tool_calls_streaming() method when enabled
- Add --enable-auto-tool-choice and --tool-call-parser CLI args to server.py
- Add reasoning field to ChatCompletionChunkDelta for streaming reasoning content

Problem
When using stream: true, the server was sending raw <tool_call> tags as content instead of proper OpenAI-compatible tool_calls chunks. This broke Cline and other streaming clients expecting structured tool call responses.

Solution
The streaming function now:
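- Uses the tool parser's extract_tool_calls_streaming() method
- Buffers content while a tool call is being generated (None during buffering)
- Emits a tool_calls chunk when the tool call is complete
- Sets finish_reason: "tool_calls" appropriately

Test plan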
🤖 Generated with Claude Code