WIP: Fix glm-4.6 tool call streaming parse #11951
Draft
tonylt wants to merge 1 commit into sgl-project:main
tonylt, keep it up! Waiting on the fix.
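The current implementation has a problem. For example, the following sequence of new_text calls causes the tool name to be parsed as "read" when it should be "read-file": 1:<tool_call>read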
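One way to avoid the truncated name described above is to hold the name back until a terminator arrives, instead of emitting whatever follows `<tool_call>` in the first chunk. The sketch below is only an illustration and not the code in this PR: `ToolNameBuffer` is a hypothetical helper, and it assumes the tool name ends at a newline or at the start of the next tag.

```python
from typing import Optional

TOOL_CALL_OPEN = "<tool_call>"


class ToolNameBuffer:
    """Hypothetical helper: emits the tool name only once it is complete."""

    def __init__(self) -> None:
        self._buffer = ""
        self._emitted = False

    def feed(self, new_text: str) -> Optional[str]:
        """Add a streamed chunk; return the tool name once, else None."""
        if self._emitted:
            return None
        self._buffer += new_text
        start = self._buffer.find(TOOL_CALL_OPEN)
        if start == -1:
            return None  # opening tag not fully received yet
        rest = self._buffer[start + len(TOOL_CALL_OPEN):].lstrip()
        # Assumed terminators: a newline or the start of the next tag.
        ends = [i for i in (rest.find("\n"), rest.find("<")) if i != -1]
        if not ends:
            return None  # name may still grow, e.g. "read" -> "read-file"
        name = rest[:min(ends)].strip()
        if not name:
            return None
        self._emitted = True
        return name


buf = ToolNameBuffer()
first = buf.feed("<tool_call>read")    # no terminator yet, nothing emitted
second = buf.feed("-file")             # still no terminator
third = buf.feed("\n<arg_key>path</arg_key>")  # terminator seen, name emitted
```

With these chunks, the name is withheld on the first two calls and returned in full on the third.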
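Parsing key/value nodes with json.loads(prev_args_str) does not meet the requirement. For example, when creating a large file, the value is very large, so the parser ends up waiting for one huge stream node before anything is emitted.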
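A possible alternative to waiting for a complete key/value node is to forward the value text as it arrives and hold back only a tail that could be the start of the closing tag. The sketch below is an assumption-laden illustration, not the PR's implementation: `ValueStreamer` is a hypothetical name, it handles a single value, and it assumes the value is wrapped in `<arg_value>` and `</arg_value>`. JSON-escaping the emitted fragments is left out.

```python
ARG_VALUE_OPEN = "<arg_value>"    # assumed opening tag
ARG_VALUE_CLOSE = "</arg_value>"  # assumed closing tag


def _held_back(text: str, closing: str) -> int:
    """Length of the longest suffix of text that is a proper prefix of closing."""
    for size in range(min(len(text), len(closing) - 1), 0, -1):
        if closing.startswith(text[-size:]):
            return size
    return 0


class ValueStreamer:
    """Hypothetical helper: streams one argument value chunk by chunk."""

    def __init__(self) -> None:
        self._buffer = ""
        self._sent = 0      # number of value characters already emitted
        self.done = False   # True once the closing tag has been seen

    def feed(self, new_text: str) -> str:
        """Add a streamed chunk; return the newly available value text."""
        if self.done:
            return ""
        self._buffer += new_text
        start = self._buffer.find(ARG_VALUE_OPEN)
        if start == -1:
            return ""
        value = self._buffer[start + len(ARG_VALUE_OPEN):]
        end = value.find(ARG_VALUE_CLOSE)
        if end != -1:
            self.done = True
            safe_end = end
        else:
            # Keep back any tail that might be a partial closing tag.
            safe_end = len(value) - _held_back(value, ARG_VALUE_CLOSE)
        delta = value[self._sent:safe_end]
        self._sent = max(self._sent, safe_end)
        return delta


streamer = ValueStreamer()
chunks = ["<arg_value>hello ", "wor", "ld</arg", "_value>"]
parts = [streamer.feed(chunk) for chunk in chunks]
```

Each call returns only the new part of the value, so a large file body would reach the client in pieces rather than as one node at the end.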
Maybe consider reusing the streaming XML parser from this PR: #10035. Or perhaps a more general streaming XML parser for all LLMs that use XML as their tool-use template?
@tonylt @gaoganlsz I'm trying to fix this issue. Could you take a look?
Motivation
Summary
I have implemented a fix for GitHub issue #11888 regarding GLM-4.6 tool calls not supporting streaming output for arguments in SGLang.
Problem Analysis
The issue was that GLM-4.6 tool calls were being returned all at once rather than being streamed progressively. The original implementation waited for complete tool calls (until it found </tool_call>) before parsing and streaming them, which caused arguments to appear in a single chunk after a long wait.
Modifications
Solution Implemented
I implemented incremental streaming support for GLM-4.6 tool call arguments by modifying both the Rust and Python implementations:
Key Changes Made:
Files Modified:
Expected Behavior After Fix
With this implementation, GLM-4.6 tool calls now support proper streaming:
Tool name streaming: The function name is streamed first as soon as it's detected
Incremental argument streaming: Arguments are streamed progressively as they are parsed from the XML format
Better user experience: Users will see tool calls building up incrementally rather than waiting for complete tool calls
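To make the expected behavior concrete, here is a rough sketch of how a client might assemble such a stream. The delta shape (an `index` plus a `function` object with optional `name` and `arguments`) is an assumption modeled on OpenAI-compatible streaming responses, and `assemble_tool_calls` is a hypothetical helper, not part of SGLang.

```python
import json
from typing import Dict, Iterable, List


def assemble_tool_calls(deltas: Iterable[dict]) -> List[dict]:
    """Merge streamed tool-call deltas into complete calls, keyed by index."""
    calls: Dict[int, dict] = {}
    for delta in deltas:
        call = calls.setdefault(delta["index"], {"name": "", "arguments": ""})
        function = delta.get("function", {})
        if function.get("name"):
            call["name"] = function["name"]  # the name arrives first
        # Argument fragments are concatenated in arrival order.
        call["arguments"] += function.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Illustrative stream: the name first, then the arguments in two fragments.
stream = [
    {"index": 0, "function": {"name": "read-file", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"path": '}},
    {"index": 0, "function": {"arguments": '"/tmp/a.txt"}'}},
]
result = assemble_tool_calls(stream)
parsed_arguments = json.loads(result[0]["arguments"])
```

The concatenated fragments only form valid JSON once the last delta has arrived, which is why the client parses them at the end rather than per chunk.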
Testing
I created and ran comprehensive tests that verify:
The fix ensures that GLM-4.6 tool calls now provide the same streaming experience as other model formats in SGLang, addressing the user's concern about better responsiveness and user experience.
Accuracy Tests
Benchmarking and Profiling
Checklist