[Bugfix] Fix GLM tool-call finish chunk suffix alignment in streaming#37845

Open
QwertyJack wants to merge 1 commit into vllm-project:main from QwertyJack:fix/glm-tool-finish-suffix-alignment

Conversation

Contributor

@QwertyJack QwertyJack commented Mar 23, 2026

Summary

Fix the streaming finish-chunk logic for GLM tool calls when the parser's streamed JSON prefix does not exactly match json.dumps(...) formatting.

Current OpenAIServingChat computes the final tool-argument delta by serializing the parser state and calling replace() against the streamed prefix. That is brittle for GLM parsers because the streamed prefix is not always serialized in a single stable style:

  • some runs are compact, e.g. {"a":1}
  • some are default-spaced, e.g. {"a": 1}
  • some GLM outputs are mixed, e.g. the outer object is compact but nested objects keep default whitespace
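For reference, the two single-style serializations mentioned above come straight from `json.dumps` defaults versus explicit separators:

```python
import json

obj = {"a": 1}
print(json.dumps(obj))                          # default-spaced: {"a": 1}
print(json.dumps(obj, separators=(",", ":")))   # compact: {"a":1}
```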

When replace() cannot align the prefix, the finish chunk either re-emits the full object (duplicated-value malformed JSON) or fails to backfill the missing suffix.
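The failure mode can be reproduced with plain `json.dumps` and `str.replace` (a simplified reconstruction, not the actual serving code):

```python
import json

parser_state = {"a": {"b": 1}}
streamed = '{"a":{"b": 1'        # mixed-whitespace GLM fragment
full = json.dumps(parser_state)  # '{"a": {"b": 1}}'

# The mixed-whitespace prefix never occurs in the default serialization,
# so replace() removes nothing and the "suffix" re-emits the full object:
delta = full.replace(streamed, "")
print(delta)  # '{"a": {"b": 1}}'
# streamed + delta == '{"a":{"b": 1{"a": {"b": 1}}'  -> malformed JSON
```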

This PR fixes the suffix computation centrally in OpenAIServingChat by:

  1. trying exact prefix matches against string-backed, default json.dumps, and compact json.dumps(..., separators=(",", ":")) candidates;
  2. falling back to whitespace-minified prefix comparison for mixed-whitespace GLM fragments;
  3. returning an empty suffix if nothing aligns, instead of re-emitting the whole object.

This keeps the existing GLM parser contract unchanged and fixes the finish-chunk backfill in the serving layer rather than mutating parser state.
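The three steps above can be sketched roughly as follows (function and variable names are hypothetical, not the actual identifiers in `OpenAIServingChat`; the fallback assumes the compact remainder concatenates validly onto the streamed prefix):

```python
import json
import re


def compute_remaining_args_suffix(parser_state, streamed_prefix: str) -> str:
    """Sketch of the suffix alignment: try exact-prefix candidates,
    fall back to whitespace-minified comparison, else emit nothing."""
    compact = json.dumps(parser_state, separators=(",", ":"))

    # 1. Exact prefix match against default and compact serializations.
    for full in (json.dumps(parser_state), compact):
        if full.startswith(streamed_prefix):
            return full[len(streamed_prefix):]

    # 2. Mixed-whitespace fallback: compare whitespace-minified forms.
    minify = lambda s: re.sub(r"\s+", "", s)
    consumed = len(minify(streamed_prefix))
    if compact.startswith(minify(streamed_prefix)):
        # The compact form is already minified, so the remainder starts
        # right after the minified-prefix length.
        return compact[consumed:]

    # 3. Nothing aligns: return an empty suffix rather than re-emitting
    # the whole object as a duplicated-value delta.
    return ""


# Default-spaced, compact, and mixed-whitespace prefixes all align:
print(compute_remaining_args_suffix({"a": 1}, '{"a": 1'))          # }
print(compute_remaining_args_suffix({"a": 1}, '{"a":1'))           # }
print(compute_remaining_args_suffix({"a": {"b": 1}}, '{"a":{"b": 1'))  # }}
print(repr(compute_remaining_args_suffix({"a": 1}, '{"x":')))      # ''
```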

Fixes #36857

Tests

Commands run:

python -m py_compile \
  vllm/entrypoints/openai/chat_completion/serving.py \
  tests/entrypoints/openai/chat_completion/test_serving_chat.py

PYTHONPATH=/tmp/vllm_pydeps:/tmp:$PYTHONPATH \
pytest -p skip_cleanup_plugin \
  tests/entrypoints/openai/chat_completion/test_serving_chat.py \
  -k "TestCreateRemainingArgsDelta or TestComputeRemainingToolArgs" -q

PYTHONPATH=/tmp:$PYTHONPATH \
pytest -p skip_cleanup_plugin \
  tests/tool_parsers/test_glm4_moe_tool_parser.py \
  -k "streaming_prev_tool_call_arr_updates or streaming_multiple_tool_calls_sequential" -q

Results:

  • py_compile: passed
  • serving tests: 9 passed, 31 deselected
  • GLM parser contract tests: 2 passed, 20 deselected

Keep the GLM parser contract unchanged and fix the finish-chunk suffix calculation in OpenAIServingChat so mixed-whitespace JSON prefixes do not produce malformed final tool argument deltas.

Co-authored-by: OpenAI Codex <noreply@openai.com>
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in the streaming finish-chunk logic for GLM tool calls where JSON serialization format inconsistencies could lead to malformed JSON. The fix is implemented in OpenAIServingChat by introducing a more robust method, _compute_remaining_tool_args, to calculate the remaining tool arguments suffix. This new method correctly handles various JSON formatting styles, including compact, default, and mixed-whitespace, by trying multiple parsing strategies. The logic is well-encapsulated and supported by a comprehensive new set of unit tests. The changes appear correct and effectively resolve the issue without altering existing parser contracts.

@QwertyJack
Contributor Author

This PR is ready for review.

CI reported that it needs the ready label (or an author with at least 4 merged PRs) before the full workflow can proceed. I don't have permission to add labels on vllm-project/vllm from this fork.

Could a maintainer please add the ready label to trigger the gated CI path?


Labels

bug (Something isn't working), frontend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: The arguments invoked by the tool in the GLM-5 streaming output cannot be parsed into the JSON format.

1 participant