Skip to content

[Bugfix] Fix Gemma4 tool call parser using vocab key instead of decoded token string#44532

Open
pens-u wants to merge 1 commit into
vllm-project:mainfrom
pens-u:fix/gemma4-tool-call-token-decode-mismatch
Open

[Bugfix] Fix Gemma4 tool call parser using vocab key instead of decoded token string#44532
pens-u wants to merge 1 commit into
vllm-project:mainfrom
pens-u:fix/gemma4-tool-call-token-decode-mismatch

Conversation

@pens-u
Copy link
Copy Markdown

@pens-u pens-u commented Jun 4, 2026

Why this is not a duplicate

Checked existing open PRs against issue #44522 and the search terms
gemma4 tool_call_start token decode:

None address the tokenizer key/decode-string mismatch described in #44522.

Problem

Some Gemma4 tokenizer builds store the canonical vocabulary key
<|tool_call> (12 chars) in get_vocab() but the HuggingFace fast
tokenizer's added_tokens_decoder[id].content path produces a
different string — e.g. <|tool_call|> (13 chars) — when the same
token ID is decoded. vllm.v1.engine.detokenizer.FastIncrementalDetokenizer
uses tok.content for output text, so output.text and therefore
current_text in the streaming parser contains <|tool_call|>, not
<|tool_call>.

The streaming guard:

if self.tool_call_start_token not in current_text:   # always True!
    return DeltaMessage(content=delta_text)           # entire call leaks

always fired, leaking the full raw tool-call block
(<|tool_call|>call:func{…}<tool_call|>) as plain assistant content.
The same mismatch affects the string-quoting delimiter <|"|>, causing
garbled argument values when tool calls were detected.

Observed output (from issue #44522):

<|tool_call|>call:preview_url{explanation:<|">…<|">,url:<|">…<|">}<tool_call|>

Fix

At construction time, call
tokenizer.decode([token_id], skip_special_tokens=False) for the
start token, end token, and string delimiter to detect the actual
decoded form
. All text-matching paths (guard check, _buffer_delta_text,
regex, count-based phase tracking, argument parsing) now use the
detected form instead of the module-level constant.

Key changes in vllm/tool_parsers/gemma4_tool_parser.py:

  • _decoded_token_str(tokenizer, token_id, fallback) — decodes a
    single token; guards for isinstance(str) so mock tokenizers in
    tests fall back to the constant without breaking
  • Gemma4ToolParser.__init__ stores self.tool_call_start_token,
    self.tool_call_end_token, self.string_delim from the tokenizer
  • _buffer_delta_text uses the instance attributes
  • self.tool_call_regex rebuilt from the detected strings via
    re.escape
  • _parse_gemma4_args / _parse_gemma4_array accept a
    string_delim kwarg (default = STRING_DELIM constant) so the
    correct delimiter propagates through all recursive calls

Test plan

# Install deps (first time only)
uv pip install pytest pytest-timeout

# Run the full Gemma4 tool parser suite (54 tests: 51 original + 3 new)
uv run python -m pytest tests/tool_parsers/test_gemma4_tool_parser.py -v

All 54 pass. The three new tests in TestAltTokenStrings directly
reproduce the tokenizer mismatch by wiring mock.decode.side_effect to
return <|tool_call|> and <|"> and asserting:

  1. parser.tool_call_start_token uses the decoded form, not the vocab key
  2. Streaming emits zero content deltas (no leakage)
  3. Non-streaming extract_tool_calls correctly parses the tool call

AI assistance disclosure: This fix was developed with Claude Code
(Anthropic). Every changed line has been reviewed by the submitter.

🤖 Generated with Claude Code

…ed token string (vllm-project#44522)

Some Gemma4 tokenizer builds store the canonical vocabulary key
`<|tool_call>` in `get_vocab()` but produce a different string
(e.g. `<|tool_call|>`) when the same token ID is decoded via the
fast-tokenizer's `added_tokens_decoder[id].content` path.  The
streaming guard check

    if self.tool_call_start_token not in current_text:
        return DeltaMessage(content=delta_text)

used the vocabulary key, so it never matched the decoded form in
`current_text`, causing the entire tool-call block to leak as plain
content.  The same mismatch affects the string-quoting delimiter
(`STRING_DELIM`), which could produce garbled argument values.

Fix: at construction time, call `tokenizer.decode([token_id],
skip_special_tokens=False)` for the start token, end token, and
string delimiter to detect the actual decoded form.  All subsequent
text-matching (guard check, buffer logic, regex, count-based phase
tracking, argument parsing) uses the detected form instead of the
module-level constant.

Changes:
- Add `_decoded_token_str()` helper with str-type guard and fallback
- `Gemma4ToolParser.__init__` detects `tool_call_start_token`,
  `tool_call_end_token`, and `string_delim` from the tokenizer
- `_buffer_delta_text` uses instance attributes instead of constants
- Regex rebuilt from detected strings via `re.escape`
- `_parse_gemma4_args` / `_parse_gemma4_array` accept a
  `string_delim` kwarg (default = `STRING_DELIM` constant) so all
  parsing paths honour the tokenizer-specific delimiter
- 3 new regression tests in `TestAltTokenStrings` cover the
  alternate-token-string scenario end-to-end

Fixes vllm-project#44522

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Priyanshu Kalal <priyanshu@zettabolt.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

@elseyu
Copy link
Copy Markdown

elseyu commented Jun 5, 2026

Thanks for the quick fix! Apologies for the confusion—I realized I made a mistake in my previous description regarding the exact tokens. The actual raw output from the client is a bit different from what I initially stated.

Instead of <|tool_call|>, the engine actually outputs <|tool_call> at the start, and <tool_call|> at the end.

Here is the exact behavior we observed:

Observed Behavior (Raw Client Output received):

<|tool_call>call:Bash{command:<|"|>/Users/john_doe/.workbuddy/binaries/python/versions/3.13.12/bin/python3 generate_dashboards_v2.py<|"|>,description:<|"|>Execute the updated dashboard generation script.<|"|>}<tool_call|>

Or for bash/array commands:

<|tool_call>call:deliver_attachments{attachments:<|"|>["/Users/john_doe/WorkBuddy/2026-06-02-14-11-40/Monitoring_Dashboard_v2.html"]<|"|>,explanation:<|"|>Delivering the fixed v2 monitoring dashboard.<|"|>}<tool_call|>
Image

Could you double-check if your PR #44532 covers this specific mismatch?

@pens-u
Copy link
Copy Markdown
Author

pens-u commented Jun 5, 2026

Hi @elseyu — thanks for the follow-up. To confirm whether this PR covers your case, could you run the snippet below on your server's model?

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("/data/model/gemma-4-31b-it")
vocab = tok.get_vocab()
for key in ["<|tool_call>", "<tool_call|>", '<|"|>']:
    tid = vocab[key]
    decoded = tok.decode([tid], skip_special_tokens=False)
    print(f"vocab[{key!r}] → id={tid}  decoded={decoded!r}  match={decoded == key}")

If any line prints match=False, your tokenizer has the vocab-key vs decoded-form mismatch this PR fixes — the streaming guard was checking for the vocab-key string but output.text contained the decoded form, so the guard always missed it and leaked the raw tool-call block as content.

If all lines print match=True, your tokens decode canonically and the bug is something else (possibly the model skipping the <|channel>…<channel|> thinking block in some turns). In that case please let us know and we can investigate further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working tool-calling

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants