Fix GLM-4.6v flash tool calling in transformers 5.x by baonudesifeizhai · Pull Request #31622 · vllm-project/vllm

baonudesifeizhai · 2026-01-02T09:21:07Z

Purpose

Test Plan

export CUDA_VISIBLE_DEVICES=0,1,2,3

vllm serve "zai-org/GLM-4.6V-Flash" \
  --tensor-parallel-size 4 \
  --enable-auto-tool-choice \
  --tool-call-parser glm45 \
  --limit-mm-per-prompt '{"image": 1, "video": 0}' \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000

cat > repro_glm46v_toolcall.py <<'EOF'
import base64, json, requests

def b64_image(path: str) -> str:
    # Read the image and return it base64-encoded, closing the file afterwards.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

img_b64 = b64_image("test.jpg")

payload = {
  "model": "zai-org/GLM-4.6V-Flash",
  "temperature": 0.2,
  "max_tokens": 256,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"location": {"type": "string"}},
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Look at the image, then call get_weather for Tokyo. Only call the tool; do not answer in text."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}}
      ]
    }
  ]
}

r = requests.post("http://127.0.0.1:8000/v1/chat/completions", json=payload, timeout=300)
print("status:", r.status_code)
print(json.dumps(r.json(), indent=2, ensure_ascii=False))
EOF

python repro_glm46v_toolcall.py

main branch

python repro_glm46v_toolcall.py
status: 200
{
  "id": "chatcmpl-86e1c31493b07575",
  "object": "chat.completion",
  "created": 1767344497,
  "model": "zai-org/GLM-4.6V-Flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Got it, let's see. The user wants me to call the get_weather function for Tokyo. The image is of a cat, but the instruction is to look at the image then call the tool for Tokyo. So I need to proceed with that. The function call should be get_weather with location \"Tokyo\".\nI will now call the get_weather function for Tokyo.\nget_weather\nlocation\nTokyo\n",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": null,
        "reasoning_content": null
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": 151338,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 3436,
    "total_tokens": 3529,
    "completion_tokens": 93,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}

this branch

python repro_glm46v_toolcall.py
status: 200
{
 "id": "chatcmpl-b7ee6c5807040746",
 "object": "chat.completion",
 "created": 1767345552,
 "model": "zai-org/GLM-4.6V-Flash",
 "choices": [
   {
     "index": 0,
     "message": {
       "role": "assistant",
       "content": "<think>Got it, let's see. The user wants me to call the get_weather function for Tokyo. The image is of a cat, but the instruction is to look at the image then call the tool for Tokyo. So I need to proceed with that. The function call should be get_weather with location \"Tokyo\".</think>\nI will now call the get_weather function for Tokyo.\n",
       "refusal": null,
       "annotations": null,
       "audio": null,
       "function_call": null,
       "tool_calls": [
         {
           "id": "chatcmpl-tool-a7572587686f09fd",
           "type": "function",
           "function": {
             "name": "get_weather",
             "arguments": "{\"location\": \"Tokyo\"}"
           }
         }
       ],
       "reasoning": null,
       "reasoning_content": null
     },
     "logprobs": null,
     "finish_reason": "tool_calls",
     "stop_reason": 151338,
     "token_ids": null
   }
 ],
 "service_tier": null,
 "system_fingerprint": null,
 "usage": {
   "prompt_tokens": 3436,
   "total_tokens": 3529,
   "completion_tokens": 93,
   "prompt_tokens_details": null
 },
 "prompt_logprobs": null,
 "prompt_token_ids": null,
 "kv_transfer_params": null
}
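
The before/after comparison above can also be checked programmatically rather than by eye. The following is a minimal sketch, not part of the PR: the helper names `extract_tool_calls` and `is_structured_tool_call` are hypothetical, and the two dictionaries are trimmed copies of the responses pasted above.

```python
import json


def extract_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs from the first choice of a chat completion."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    # "arguments" is a JSON-encoded string in the response, so decode it.
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]


def is_structured_tool_call(response: dict) -> bool:
    """True when the server reported a parsed tool call instead of plain text."""
    choice = response["choices"][0]
    return choice["finish_reason"] == "tool_calls" and bool(extract_tool_calls(response))


# Trimmed copy of the "main branch" response: the call leaked into the text content.
main_branch = {
    "choices": [{
        "finish_reason": "stop",
        "message": {"content": "get_weather\nlocation\nTokyo\n", "tool_calls": []},
    }]
}

# Trimmed copy of the "this branch" response: the call was parsed.
this_branch = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {
            "content": "I will now call the get_weather function for Tokyo.\n",
            "tool_calls": [{
                "id": "chatcmpl-tool-a7572587686f09fd",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": "{\"location\": \"Tokyo\"}",
                },
            }],
        },
    }]
}

print(is_structured_tool_call(main_branch))
print(extract_tool_calls(this_branch))
```

In the repro script, the same helpers could be applied to `r.json()` to turn the manual comparison into an assertion.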

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>

mergify · 2026-01-02T09:21:46Z

Documentation preview: https://vllm--31622.org.readthedocs.build/en/31622/

gemini-code-assist

Code Review

This pull request fixes an issue with tool calling for GLM-4.6v models by ensuring tool call tokens are not skipped during decoding. The core logic of setting skip_special_tokens=False is correct. However, I've identified a potential high-severity issue where the call to super().adjust_request() could enable structured output generation, which conflicts with this model's text-based tool call format and could break tool calling for certain tool_choice options. I have provided a suggestion to rectify this.

vllm/tool_parsers/glm4_moe_tool_parser.py
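
To illustrate why skipping special tokens breaks parsing, here is a self-contained sketch that does not import vLLM. Everything in it is a stand-in: `FakeRequest`, `fake_detokenize`, `parse_tool_calls`, and this `adjust_request` are hypothetical names, and the delimiter strings (`<tool_call>`, `<arg_key>`, `<arg_value>`) are an assumption about the GLM text format rather than something taken from this PR's diff. Stripping them from the sample output does reproduce the `get_weather\nlocation\nTokyo\n` tail seen in the main-branch response.

```python
import re
from dataclasses import dataclass

# Assumed delimiter strings for the text-based tool call format (not from the diff).
SPECIAL_TOKENS = [
    "<tool_call>", "</tool_call>",
    "<arg_key>", "</arg_key>",
    "<arg_value>", "</arg_value>",
]

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\S+)\s*(.*?)</tool_call>", re.DOTALL)
ARG_RE = re.compile(r"<arg_key>(.*?)</arg_key>\s*<arg_value>(.*?)</arg_value>", re.DOTALL)


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a chat request; only the flag that matters here."""
    skip_special_tokens: bool = True


def fake_detokenize(raw: str, request: FakeRequest) -> str:
    """Mimic decoding: drop special tokens from the text when the flag is set."""
    if not request.skip_special_tokens:
        return raw
    for token in SPECIAL_TOKENS:
        raw = raw.replace(token, "")
    return raw


def parse_tool_calls(text: str) -> list:
    """Extract (name, arguments) pairs; finds nothing if delimiters are missing."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        args = {key.strip(): value.strip() for key, value in ARG_RE.findall(body)}
        calls.append((name, args))
    return calls


def adjust_request(request: FakeRequest) -> FakeRequest:
    """Sketch of the fix described in the review: keep special tokens in the output."""
    request.skip_special_tokens = False
    return request


RAW = (
    "I will now call the get_weather function for Tokyo.\n"
    "<tool_call>get_weather\n"
    "<arg_key>location</arg_key>\n"
    "<arg_value>Tokyo</arg_value>\n"
    "</tool_call>"
)

before = fake_detokenize(RAW, FakeRequest())
after = fake_detokenize(RAW, adjust_request(FakeRequest()))

print(parse_tool_calls(before))  # delimiters stripped, nothing to parse
print(parse_tool_calls(after))   # delimiters kept, call is recovered
```

The sketch only models the decoding flag. It does not model the structured-output interaction with `super().adjust_request()` that the review raises.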

chaunceyjiang

thanks~

Audited recent tool parser bug-fix PRs and found that several landed without corresponding test coverage. Added unit tests for each fix to prevent regressions.

- Mistral: fast detokenization text detection (PR vllm-project#37209)
- Qwen3Coder: malformed XML crash, anyOf double-encoding, speculative decode streaming (PRs vllm-project#36774, vllm-project#36032, vllm-project#35615)
- DeepSeekV32: delimiter preservation with fast detokenization, skip_special_tokens adjustment (PR vllm-project#33964)
- GLM-4 MoE: zero-argument tool calls, transformers 5.x delimiter handling, Unicode character preservation (PRs vllm-project#32321, vllm-project#31622, vllm-project#30920)
- MiniMax M2: anyOf nullable parameter handling for non-null and null values (PR vllm-project#32342)
- Step3p5: MTP-style variable-chunk and multi-token streaming (PR vllm-project#33690)
- Kimi K2: native tool call ID extraction and multi-turn ID continuity (PR vllm-project#32768)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>

fix

bae6863

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>

baonudesifeizhai requested review from aarnphm and chaunceyjiang as code owners January 2, 2026 09:21

mergify bot added the documentation and tool-calling labels Jan 2, 2026

github-project-automation bot added this to Tool Calling Jan 2, 2026

gemini-code-assist bot reviewed Jan 2, 2026

vllm/tool_parsers/glm4_moe_tool_parser.py (review thread resolved)

chaunceyjiang approved these changes Jan 4, 2026

chaunceyjiang added the ready label Jan 4, 2026

vllm-bot merged commit 02dbb93 into vllm-project:main Jan 5, 2026
47 of 51 checks passed

github-project-automation bot moved this to Done in Tool Calling Jan 5, 2026

LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026

Fix GLM-4.6v flash tool calling in transformers 5.x (vllm-project#31622)

c5f580d

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>

yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026

Fix GLM-4.6v flash tool calling in transformers 5.x (vllm-project#31622)

14b0611

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>

akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026

Fix GLM-4.6v flash tool calling in transformers 5.x (vllm-project#31622)

2672e46

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>

dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026

Fix GLM-4.6v flash tool calling in transformers 5.x (vllm-project#31622)

ae270f7

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Fix GLM-4.6v flash tool calling in transformers 5.x (vllm-project#31622)

3ec9e1b

Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>

bbrowning mentioned this pull request Mar 26, 2026

[Misc] Add 20 regression tests for 11 tool parser bug fixes #38172

Open

vllm-bot merged 1 commit into vllm-project:main from baonudesifeizhai:fixglm4.6toolcall


3 participants
