[Bugfix] Fix GLM tool-call finish chunk suffix alignment in streaming #37845
QwertyJack wants to merge 1 commit into vllm-project:main
Conversation
Keep the GLM parser contract unchanged and fix the finish-chunk suffix calculation in OpenAIServingChat so mixed-whitespace JSON prefixes do not produce malformed final tool argument deltas.

Co-authored-by: OpenAI Codex <noreply@openai.com>
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Code Review
This pull request addresses a bug in the streaming finish-chunk logic for GLM tool calls where JSON serialization format inconsistencies could lead to malformed JSON. The fix is implemented in OpenAIServingChat by introducing a more robust method, _compute_remaining_tool_args, to calculate the remaining tool arguments suffix. This new method correctly handles various JSON formatting styles, including compact, default, and mixed-whitespace, by trying multiple parsing strategies. The logic is well-encapsulated and supported by a comprehensive new set of unit tests. The changes appear correct and effectively resolve the issue without altering existing parser contracts.
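The failure mode the review describes can be reproduced in isolation. The sketch below is illustrative only — the variable names are hypothetical and not taken from the vLLM code — but it shows why a `replace()`-based suffix computation breaks when the streamed prefix and the final serialization use different whitespace styles:

```python
import json

# What the GLM parser has streamed to the client so far (default
# json.dumps spacing: a space after each colon).
streamed_prefix = '{"a": 1'

# The parser's final argument state at finish time.
final_state = {"a": 1, "b": 2}

# Brittle approach: serialize compactly, then strip the prefix via replace().
serialized = json.dumps(final_state, separators=(",", ":"))  # '{"a":1,"b":2}'
suffix = serialized.replace(streamed_prefix, "", 1)

# The spacing differs, so replace() finds no match and removes nothing:
# the "suffix" is the entire object, and appending it to the already
# streamed prefix yields malformed JSON with duplicated values.
print(suffix)  # '{"a":1,"b":2}'
```

Appending that result to `streamed_prefix` would give the client `{"a": 1{"a":1,"b":2}`, which is exactly the duplicated-value malformed JSON described in the bug report.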
This PR is ready for review. CI reported that it is missing a required label. Could a maintainer please add it?
Summary
Fix the streaming finish-chunk logic for GLM tool calls when the parser's streamed JSON prefix does not exactly match `json.dumps(...)` formatting.

Currently, `OpenAIServingChat` computes the final tool-argument delta by serializing the parser state and calling `replace()` against the streamed prefix. That is brittle for GLM parsers because the streamed prefix is not always serialized in a single stable style: it may appear as `{"a":1}` or `{"a": 1}`.

When `replace()` cannot align the prefix, the finish chunk either re-emits the full object (producing malformed JSON with duplicated values) or fails to backfill the missing suffix.

This PR fixes the suffix computation centrally in `OpenAIServingChat` by trying both default `json.dumps` and compact `json.dumps(..., separators=(",", ":"))` candidates when aligning the streamed prefix. This keeps the existing GLM parser contract unchanged and fixes the finish-chunk backfill in the serving layer rather than mutating parser state.
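A minimal sketch of the multi-candidate suffix computation is shown below. The helper name `_compute_remaining_tool_args` comes from the PR description, but this body is an illustrative reconstruction under the assumptions stated above, not the merged code:

```python
import json

def compute_remaining_tool_args(streamed_prefix: str, final_args: dict) -> str:
    """Return the argument suffix still to be emitted in the finish chunk.

    Tries several serialization styles so that a prefix produced with
    default, compact, or mixed-whitespace json.dumps spacing aligns.
    """
    candidates = (
        json.dumps(final_args),                           # '{"a": 1, "b": 2}'
        json.dumps(final_args, separators=(",", ":")),    # '{"a":1,"b":2}'
        json.dumps(final_args, separators=(",", ": ")),   # mixed whitespace
    )
    for serialized in candidates:
        if serialized.startswith(streamed_prefix):
            return serialized[len(streamed_prefix):]
    # No candidate aligns: fall back to the full serialization so the
    # client at least receives complete arguments.
    return json.dumps(final_args)

# A prefix streamed with default spacing backfills cleanly:
print(compute_remaining_tool_args('{"a": 1', {"a": 1, "b": 2}))  # ', "b": 2}'
# A compact prefix matches the compact candidate instead:
print(compute_remaining_tool_args('{"a":1', {"a": 1, "b": 2}))   # ',"b":2}'
```

Using `startswith` plus slicing instead of `replace()` guarantees the suffix is computed only when the candidate genuinely extends the streamed prefix, so no duplicated values can leak into the finish chunk.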
Fixes #36857
Tests
Commands run:
Results:
- `py_compile`: passed
- 9 passed, 31 deselected
- 2 passed, 20 deselected