[Bugfix] Fix GLM tool-call finish chunk suffix alignment in streaming #37845
QwertyJack wants to merge 1 commit into vllm-project:main
Conversation
Keep the GLM parser contract unchanged and fix the finish-chunk suffix calculation in OpenAIServingChat so mixed-whitespace JSON prefixes do not produce malformed final tool argument deltas.

Co-authored-by: OpenAI Codex <noreply@openai.com>
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Code Review
This pull request addresses a bug in the streaming finish-chunk logic for GLM tool calls where JSON serialization format inconsistencies could lead to malformed JSON. The fix is implemented in OpenAIServingChat by introducing a more robust method, _compute_remaining_tool_args, to calculate the remaining tool arguments suffix. This new method correctly handles various JSON formatting styles, including compact, default, and mixed-whitespace, by trying multiple parsing strategies. The logic is well-encapsulated and supported by a comprehensive new set of unit tests. The changes appear correct and effectively resolve the issue without altering existing parser contracts.
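The failure mode the review describes can be reproduced in isolation. The sketch below is illustrative only — the variable names are hypothetical and not taken from the vLLM code — but it shows why a `replace()`-based suffix computation breaks when the streamed prefix and the final serialization use different whitespace styles:

```python
import json

# What the GLM parser has streamed to the client so far (default
# json.dumps spacing: a space after each colon).
streamed_prefix = '{"a": 1'

# The parser's final argument state at finish time.
final_state = {"a": 1, "b": 2}

# Brittle approach: serialize compactly, then strip the prefix via replace().
serialized = json.dumps(final_state, separators=(",", ":"))  # '{"a":1,"b":2}'
suffix = serialized.replace(streamed_prefix, "", 1)

# The spacing differs, so replace() finds no match and removes nothing:
# the "suffix" is the entire object, and appending it to the already
# streamed prefix yields malformed JSON with duplicated values.
print(suffix)  # '{"a":1,"b":2}'
```

Appending that result to `streamed_prefix` would give the client `{"a": 1{"a":1,"b":2}`, which is exactly the duplicated-value malformed JSON described in the bug report.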
This PR is ready for review. CI reported that it is missing a required label. Could a maintainer please add it?
Summary
Fix the streaming finish-chunk logic for GLM tool calls when the parser's streamed JSON prefix does not exactly match `json.dumps(...)` formatting.

Currently, `OpenAIServingChat` computes the final tool-argument delta by serializing the parser state and calling `replace()` against the streamed prefix. That is brittle for GLM parsers because the streamed prefix is not always serialized in a single stable style: it may appear as `{"a":1}` or `{"a": 1}`.

When `replace()` cannot align the prefix, the finish chunk either re-emits the full object (producing malformed JSON with duplicated values) or fails to backfill the missing suffix.

This PR fixes the suffix computation centrally in `OpenAIServingChat` by trying both default `json.dumps` and compact `json.dumps(..., separators=(",", ":"))` candidates when aligning the streamed prefix. This keeps the existing GLM parser contract unchanged and fixes the finish-chunk backfill in the serving layer rather than mutating parser state.
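A minimal sketch of the multi-candidate suffix computation is shown below. The helper name `_compute_remaining_tool_args` comes from the PR description, but this body is an illustrative reconstruction under the assumptions stated above, not the merged code:

```python
import json

def compute_remaining_tool_args(streamed_prefix: str, final_args: dict) -> str:
    """Return the argument suffix still to be emitted in the finish chunk.

    Tries several serialization styles so that a prefix produced with
    default, compact, or mixed-whitespace json.dumps spacing aligns.
    """
    candidates = (
        json.dumps(final_args),                           # '{"a": 1, "b": 2}'
        json.dumps(final_args, separators=(",", ":")),    # '{"a":1,"b":2}'
        json.dumps(final_args, separators=(",", ": ")),   # mixed whitespace
    )
    for serialized in candidates:
        if serialized.startswith(streamed_prefix):
            return serialized[len(streamed_prefix):]
    # No candidate aligns: fall back to the full serialization so the
    # client at least receives complete arguments.
    return json.dumps(final_args)

# A prefix streamed with default spacing backfills cleanly:
print(compute_remaining_tool_args('{"a": 1', {"a": 1, "b": 2}))  # ', "b": 2}'
# A compact prefix matches the compact candidate instead:
print(compute_remaining_tool_args('{"a":1', {"a": 1, "b": 2}))   # ',"b":2}'
```

Using `startswith` plus slicing instead of `replace()` guarantees the suffix is computed only when the candidate genuinely extends the streamed prefix, so no duplicated values can leak into the finish chunk.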
Fixes #36857
Tests
Commands run:
Results:
- `py_compile`: passed
- 9 passed, 31 deselected
- 2 passed, 20 deselected