[Bugfix] Preserve leading/trailing whitespace in GLM non-streaming tool parser#42026

Merged
vllm-bot merged 1 commit into vllm-project:main from rishaps:fix-glm-whitespace-strip-string on May 9, 2026

@rishaps rishaps commented May 8, 2026

Contributor

Purpose

GLM's non-streaming tool parser currently calls value.strip() on completed <arg_value> text before parsing it. For string-typed arguments this removes meaningful leading or trailing whitespace, which produces incorrect arguments for whitespace-sensitive tools such as code editors and diff appliers.

For example, when the model generates:

<tool_call>apply_diff
<arg_key>string</arg_key>
<arg_value>    indented code    </arg_value>
</tool_call>

The parser produces {"string": "indented code"} instead of {"string": "    indented code    "}.

Fixed this by stripping only non-string values, leaving string whitespace intact.
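
As a rough illustration of the idea (not the actual vLLM change), the type-aware handling could look like the sketch below. The helper name `parse_arg_value`, its `is_string` flag, and the fallback to the stripped text when JSON decoding fails are all assumptions made for this example.

```python
import json
from typing import Any


def parse_arg_value(raw: str, is_string: bool) -> Any:
    """Hypothetical helper: turn raw <arg_value> text into a Python value."""
    if is_string:
        # String arguments are returned verbatim, whitespace included.
        return raw
    # Non-string arguments are stripped first, then deserialized as JSON.
    stripped = raw.strip()
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        # Assumed fallback for this sketch: keep the stripped text.
        return stripped


print(repr(parse_arg_value("    indented code    ", is_string=True)))
print(repr(parse_arg_value("  42  ", is_string=False)))
```

Only the string branch skips `strip()`, so numbers, booleans, arrays, and objects still tolerate surrounding whitespace.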

Test Plan

Repro script:
.venv/bin/python - <<'PY'
import json
from types import SimpleNamespace

from vllm.entrypoints.openai.chat_completion.protocol import ChatCompletionToolsParam
from vllm.entrypoints.openai.engine.protocol import FunctionDefinition
from vllm.tool_parsers.glm4_moe_tool_parser import Glm4MoeModelToolParser
from vllm.tool_parsers.glm47_moe_tool_parser import Glm47MoeModelToolParser


class Tok:
    def get_vocab(self):
        return {
            "<tool_call>": 1,
            "</tool_call>": 2,
        }


glm4_tools = [
    ChatCompletionToolsParam(
        function=FunctionDefinition(
            name="apply_diff",
            parameters={
                "type": "object",
                "properties": {"s": {"type": "string"}},
                "required": ["s"],
            },
        )
    )
]

glm47_tools = [
    ChatCompletionToolsParam(
        function=FunctionDefinition(
            name="get_weather",
            parameters={
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        )
    )
]

cases = [
    (
        "glm4",
        Glm4MoeModelToolParser,
        glm4_tools,
        "s",
        "    indented code    ",
        "<tool_call>apply_diff\n"
        "<arg_key>s</arg_key>\n"
        "<arg_value>    indented code    </arg_value>\n"
        "</tool_call>",
    ),
    (
        "glm47",
        Glm47MoeModelToolParser,
        glm47_tools,
        "city",
        "  Beijing  ",
        "<tool_call>get_weather"
        "<arg_key>city</arg_key>"
        "<arg_value>  Beijing  </arg_value>"
        "</tool_call>",
    ),
]

for name, parser_cls, tools, arg_name, expected, text in cases:
    parser = parser_cls(Tok(), tools=tools)
    request = SimpleNamespace(tools=tools, tool_choice="auto")
    result = parser.extract_tool_calls(text, request=request)
    actual = json.loads(result.tool_calls[0].function.arguments)[arg_name]

    print(f"[{name}]")
    print("expected string:", repr(expected))
    print("actual string:", repr(actual))
    print()
PY

Unit tests:

.venv/bin/python -m pytest tests/tool_parsers/test_glm4_moe_tool_parser.py tests/tool_parsers/test_glm47_moe_tool_parser.py -v

Test Result

Before:

[glm4]
expected string: '    indented code    '
actual string: 'indented code'

[glm47]
expected string: '  Beijing  '
actual string: 'Beijing'

After:

[glm4]
expected string: '    indented code    '
actual string: '    indented code    '

[glm47]
expected string: '  Beijing  '
actual string: '  Beijing  '

Signed-off-by: Rishapveer Singh <singhrishapveer@gmail.com>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@mergify mergify Bot added tool-calling bug Something isn't working labels May 8, 2026

mergify Bot commented May 8, 2026

Contributor

Hi @rishaps, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request modifies the Glm4MoeModelToolParser to preserve leading and trailing whitespace for tool call arguments identified as string types, while continuing to strip and deserialize non-string arguments. This ensures that whitespace-sensitive data, such as indented code, is correctly processed. New test cases have been added to test_glm47_moe_tool_parser.py and test_glm4_moe_tool_parser.py to verify this behavior. There are no review comments to address, and I have no further feedback to provide.

@bbrowning bbrowning left a comment

Collaborator

You made this one very easy to understand why we need this, why I should review it, and why I should approve it. Thank you!

The tests look good and I pulled them locally and they pass with this fix. I confirmed this is already what happens in the streaming path and this is just getting the non-streaming path to match that behavior.

@bbrowning bbrowning added the ready ONLY add when PR is ready to merge/full CI is needed label May 8, 2026
@bbrowning

Collaborator

One thing I noticed when reviewing this that I think is out of scope for this PR, but probably worth investigating and/or opening a new PR for, is that I believe the _is_string_type logic in glm4_moe_tool_parser.py doesn't handle the case of nullable strings, like a JSON schema with a type of ["string", "null"].

There may be other PRs to fix that already - I haven't searched yet. But, that would potentially be another related and self-contained fix in this area if you're up for more.

Thanks for the contribution!
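
The nullable-string case raised in this comment can be sketched as follows. This is a hypothetical check written for illustration, not the `_is_string_type` implementation in glm4_moe_tool_parser.py. It assumes the property schema is a plain dict and that a `type` list counts as a string only when "string" is its sole non-null member.

```python
from typing import Any


def is_string_type(prop_schema: dict[str, Any]) -> bool:
    """Hypothetical check: does this JSON schema property describe a string?"""
    declared = prop_schema.get("type")
    if isinstance(declared, str):
        return declared == "string"
    if isinstance(declared, list):
        # Ignore "null" so that ["string", "null"] still counts as a string.
        non_null = [t for t in declared if t != "null"]
        return non_null == ["string"]
    # Missing or unrecognized "type": treat as non-string in this sketch.
    return False


print(is_string_type({"type": "string"}))
print(is_string_type({"type": ["string", "null"]}))
print(is_string_type({"type": ["string", "integer"]}))
```

Under this sketch a union such as ["string", "integer"] is deliberately treated as non-string. Whether that is the right call for the real parser is an open design question.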

@bbrowning

Collaborator

I see #40197 has a fix for this string type detection mixed in with a few other things, so I'm going to ask that author to split that PR up a bit so we can get that bit definitively fixed without the other changes in #40197 that need more review.

rishaps commented May 9, 2026

Contributor Author

Need a force merge, @DarkLight1337. The failures seem unrelated.

@vllm-bot vllm-bot merged commit f6490a2 into vllm-project:main May 9, 2026
55 of 63 checks passed
@rishaps rishaps deleted the fix-glm-whitespace-strip-string branch May 9, 2026 04:52