support skip tokenizer chat_template #29895
Conversation
Signed-off-by: piood <2477084691@qq.com>
Code Review
This pull request introduces a mechanism to skip using the tokenizer-provided chat template for specific models, namely 'siglip' and 'siglip2', and instead use vLLM's fallback templates. The implementation is straightforward, using a new set _SKIP_TOKENIZER_CHAT_TEMPLATE to control this behavior within resolve_hf_chat_template. The logic correctly implements the intended fix. My main feedback is to improve code maintainability by moving a local import to the top level of the file, following standard Python conventions. Additionally, I recommend adding a unit test to cover this new skipping logic to prevent future regressions.
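The reviewer's suggested regression test could look roughly like the sketch below. This is a hypothetical, self-contained model of the skip behavior (the set names mirror the PR, but the function, the fallback table, and the template strings are stand-ins); a real test would import vLLM's actual resolve_hf_chat_template from chat_utils.py instead.

```python
# Stand-in for vLLM's skip list and fallback templates (values are
# placeholders; the real fallback templates live inside vLLM).
_SKIP_TOKENIZER_CHAT_TEMPLATE = {"siglip", "siglip2"}
_FALLBACK_TEMPLATES = {"siglip": "fallback-siglip", "siglip2": "fallback-siglip2"}


def resolve_chat_template(model_type, tokenizer_template):
    # Models on the skip list ignore the tokenizer-provided template
    # and use vLLM's predefined fallback instead.
    if model_type in _SKIP_TOKENIZER_CHAT_TEMPLATE:
        return _FALLBACK_TEMPLATES.get(model_type)
    return tokenizer_template


def test_siglip_skips_tokenizer_template():
    # Even a broken tokenizer template must be ignored for siglip.
    assert resolve_chat_template("siglip", "broken-template") == "fallback-siglip"


def test_other_models_keep_tokenizer_template():
    # Models not on the skip list are unaffected.
    assert resolve_chat_template("llama", "tokenizer-template") == "tokenizer-template"
```

Running both tests under pytest would guard against a future change silently re-enabling the tokenizer template for these models.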
/gemini review
Code Review
This pull request introduces a mechanism to skip using a tokenizer's built-in chat template for specific models, namely siglip and siglip2, and instead use vLLM's predefined fallbacks. The changes are implemented by creating a skip-list of model types and checking against it before attempting to load the template from the tokenizer. The implementation is clean and effectively addresses the issue. I've added one suggestion to improve maintainability by adding an assertion to ensure that any model added to the skip list has a corresponding fallback template defined, preventing potential runtime errors in the future.
@DarkLight1337 Can you review it? Thanks.
I think this isn't really necessary? You can just pass
def run_siglip(client: OpenAI, model: str):
    """
    Start the server using:
    vllm serve google/siglip-base-patch16-224 \
        --runner pooling
    """

Yes, you are right, I think we should add
Sure, do update the documentation!
Opened #29902 to update the SigLIP usage example.
Close this. |
Purpose
This PR adds support for skipping tokenizer chat templates for SigLIP and SigLIP2 models. These models' tokenizer-provided chat templates may have issues, so we skip them and use vLLM's predefined fallback templates instead.
The changes include:
- Added a new set _SKIP_TOKENIZER_CHAT_TEMPLATE containing the "siglip" and "siglip2" model types
- Updated resolve_hf_chat_template() in chat_utils.py to check if the model type should skip tokenizer chat templates

Fix #27566 (comment)
Test Plan
vllm serve google/siglip2-base-patch16-224 --runner pooling --enforce-eager
python examples/online_serving/pooling/openai_chat_embedding_client_for_multimodal.py --model siglip

Test Result
Correct output; no errors occurred.
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.