[Bugfix] Preserve leading/trailing whitespace in GLM non-streaming tool parser#42026
Conversation
|
Hi @rishaps, the pre-commit checks have failed. Please run: uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
There was a problem hiding this comment.
Code Review
This pull request modifies the Glm4MoeModelToolParser to preserve leading and trailing whitespace for tool call arguments identified as string types, while continuing to strip and deserialize non-string arguments. This ensures that whitespace-sensitive data, such as indented code, is correctly processed. New test cases have been added to test_glm47_moe_tool_parser.py and test_glm4_moe_tool_parser.py to verify this behavior. There are no review comments to address, and I have no further feedback to provide.
bbrowning
left a comment
There was a problem hiding this comment.
You made this one very easy to understand why we need this, why I should review it, and why I should approve it. Thank you!
The tests look good and I pulled them locally and they pass with this fix. I confirmed this is already what happens in the streaming path and this is just getting the non-streaming path to match that behavior.
|
One thing I noticed when reviewing this that I think is out of scope for this PR, but probably worth investigating and/or opening a new PR for, is that I believe the There may be other PRs to fix that already - I haven't searched yet. But, that would potentially be another related and self-contained fix in this area if you're up for more. Thanks for the contribution! |
|
I see #40197 has a fix for this string type detection mixed in with a few other things, so I'm going to ask that author to split that PR up a bit so we can get that bit definitively fixed without the other changes in 40197 that need more view. |
|
Need a force merge @DarkLight1337. Failures seems unrelated. |
…ol parser (vllm-project#42026) Signed-off-by: Rishapveer Singh <singhrishapveer@gmail.com>
…ol parser (vllm-project#42026) Signed-off-by: Rishapveer Singh <singhrishapveer@gmail.com>
…ol parser (vllm-project#42026) Signed-off-by: Rishapveer Singh <singhrishapveer@gmail.com>
…ol parser (vllm-project#42026) Signed-off-by: Rishapveer Singh <singhrishapveer@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
…ol parser (vllm-project#42026) Signed-off-by: Rishapveer Singh <singhrishapveer@gmail.com>
Purpose
GLM's non-streaming tool parser currently calls
value.strip()on completed<arg_value>text before parsing it, which for string types can remove important leading or trailing whitespace, causing incorrect parsing results in code-editing tools, diffs etc...For example, when the model generates:
The parser produces:
{"string": "indented code"}and not:{"string": " indented code "}Fixed this by stripping only non-string values, leaving string whitespace intact.
Test Plan
repro script
Unit tests:
Test Result
Before:
After: