support skip tokenizer chat_template#29895

Closed
piood wants to merge 2 commits into vllm-project:main from
piood:support_skip_tokenizer_chat_template

Conversation

@piood
Contributor

@piood piood commented Dec 2, 2025

Purpose

This PR adds support for skipping tokenizer chat templates for SigLIP and SigLIP2 models. These models' tokenizer-provided chat templates may have issues, so we skip them and use vLLM's predefined fallback templates instead.

The changes include:

  • Added a new set _SKIP_TOKENIZER_CHAT_TEMPLATE containing "siglip" and "siglip2" model types
  • Modified resolve_hf_chat_template() in chat_utils.py to check if the model type should skip tokenizer chat templates
  • When skipping, the function directly falls back to predefined templates instead of attempting to load from tokenizer
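The changes above can be sketched roughly as follows. This is a hypothetical illustration: the set name and function name come from the PR description, but the real `resolve_hf_chat_template()` in vLLM's `chat_utils.py` has a different signature, so only the control flow is shown.

```python
# Hypothetical sketch of the skip logic described in this PR; the real
# resolve_hf_chat_template() in vLLM's chat_utils.py takes different
# arguments. Only the control flow is illustrated.

# Model types whose tokenizer-provided chat templates are known to be
# problematic, so vLLM's predefined fallback should be used instead.
_SKIP_TOKENIZER_CHAT_TEMPLATE = {"siglip", "siglip2"}


def resolve_hf_chat_template(model_type, tokenizer_template, fallback_template):
    """Pick the chat template to use for `model_type`."""
    if model_type in _SKIP_TOKENIZER_CHAT_TEMPLATE:
        # Skip the tokenizer's template entirely and use the fallback.
        return fallback_template
    # Otherwise prefer the tokenizer's template when it provides one.
    return tokenizer_template if tokenizer_template is not None else fallback_template
```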

Fix #27566 (comment)

Test Plan

vllm serve google/siglip2-base-patch16-224 --runner pooling --enforce-eager
python examples/online_serving/pooling/openai_chat_embedding_client_for_multimodal.py --model siglip

Test Result

Correct output; no errors occur.


Essential Elements of an Effective PR Description Checklist
  • [x] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: piood <2477084691@qq.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a mechanism to skip using the tokenizer-provided chat template for specific models, namely 'siglip' and 'siglip2', and instead use vLLM's fallback templates. The implementation is straightforward, using a new set _SKIP_TOKENIZER_CHAT_TEMPLATE to control this behavior within resolve_hf_chat_template. The logic correctly implements the intended fix. My main feedback is to improve code maintainability by moving a local import to the top level of the file, following standard Python conventions. Additionally, I recommend adding a unit test to cover this new skipping logic to prevent future regressions.

Signed-off-by: piood <2477084691@qq.com>
@piood
Contributor Author

piood commented Dec 2, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a mechanism to skip using a tokenizer's built-in chat template for specific models, namely siglip and siglip2, and instead use vLLM's predefined fallbacks. The changes are implemented by creating a skip-list of model types and checking against it before attempting to load the template from the tokenizer. The implementation is clean and effectively addresses the issue. I've added one suggestion to improve maintainability by adding an assertion to ensure that any model added to the skip list has a corresponding fallback template defined, preventing potential runtime errors in the future.
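The reviewer's suggested assertion could look roughly like this. It is a hypothetical sketch: `_FALLBACK_CHAT_TEMPLATES` and its contents are placeholders, not vLLM's actual fallback registry.

```python
# Hypothetical module-level check: every model type on the skip list
# must have a predefined fallback template, otherwise skipping the
# tokenizer template would leave the model with no template at all.
_SKIP_TOKENIZER_CHAT_TEMPLATE = {"siglip", "siglip2"}

# Placeholder mapping; vLLM's real fallback registry is structured differently.
_FALLBACK_CHAT_TEMPLATES = {
    "siglip": "template_basic.jinja",
    "siglip2": "template_basic.jinja",
}

missing = _SKIP_TOKENIZER_CHAT_TEMPLATE - _FALLBACK_CHAT_TEMPLATES.keys()
assert not missing, f"Models in skip list without fallback templates: {missing}"
```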

@piood
Contributor Author

piood commented Dec 2, 2025

@DarkLight1337 Can you review it? Thanks.

@DarkLight1337
Member

I think this isn't really necessary? You can just pass the --chat-template argument to achieve this.

@piood
Contributor Author

piood commented Dec 2, 2025

def run_siglip(client: OpenAI, model: str):
    """
    Start the server using:

    vllm serve google/siglip-base-patch16-224 \
        --runner pooling
    """

Yes, you are right. I think we should append --chat-template vllm/transformers_utils/chat_templates/template_basic.jinja to this command so that users know to add this parameter.
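The suggested command would then look roughly like this (the template path is taken from the comment above; the flags are assumed from the PR's test plan):

```shell
# Serve SigLIP while passing vLLM's basic fallback chat template
# explicitly, instead of relying on the tokenizer's template.
vllm serve google/siglip-base-patch16-224 \
    --runner pooling \
    --chat-template vllm/transformers_utils/chat_templates/template_basic.jinja
```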

@DarkLight1337
Member

Sure, do update the documentation!

@piood
Contributor Author

piood commented Dec 2, 2025

Opened #29902 to update the SigLIP usage example.

@piood
Contributor Author

piood commented Dec 2, 2025

Closing this.

@piood piood closed this Dec 2, 2025
