fix: allow HuggingFace standard chat template params via **kwargs #27622

Merged
Isotr0py merged 4 commits into vllm-project:main from wangln19:1028-white-list
Oct 28, 2025

Conversation

@wangln19
Contributor

@wangln19 wangln19 commented Oct 28, 2025

Purpose

Fix compatibility issue with tokenizers that use **kwargs to receive standard chat template parameters.

Problem:

  • Some tokenizer implementations (e.g., Kimi K2) don't explicitly declare standard HuggingFace parameters like add_generation_prompt, tools, etc. in their apply_chat_template method signature
  • Instead, they receive these parameters via **kwargs
  • The current parameter filtering logic in resolve_chat_template_kwargs uses allow_var_kwargs=False, which rejects these parameters
  • This causes tool calling and other features to fail silently (e.g., Kimi K2 always returns finish_reason: stop instead of finish_reason: tool_calls)
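The dropped-parameter behavior can be illustrated with a minimal sketch. The helper name `supports_kw` and its signature below are hypothetical, mirroring the filtering check rather than quoting vLLM's code:

```python
import inspect

def supports_kw(fn, kw_name, allow_var_kwargs=False):
    """Hypothetical check: does fn explicitly declare kw_name, or
    (optionally) catch it via a **kwargs parameter?"""
    params = inspect.signature(fn).parameters
    if kw_name in params:
        return True
    has_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return allow_var_kwargs and has_var_kw

# A Kimi-K2-style tokenizer method: standard params arrive only via **kwargs
def apply_chat_template(self, conversation, **kwargs):
    ...

print(supports_kw(apply_chat_template, "add_generation_prompt"))        # False: silently dropped
print(supports_kw(apply_chat_template, "add_generation_prompt", True))  # True
```

With `allow_var_kwargs=False`, a standard parameter that is only caught by `**kwargs` looks unsupported and never reaches the tokenizer.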

Root Cause:
The security fix in PR #25794 prevents passing parameters not explicitly declared in the function signature to avoid injection attacks. While this is correct for unknown parameters, it inadvertently blocks legitimate HuggingFace standard parameters when tokenizers use **kwargs.

Solution:
Dynamically extract the standard parameter list from PreTrainedTokenizer.apply_chat_template base class signature and whitelist these parameters even when the tokenizer implementation uses **kwargs to receive them.

Benefits:

  • ✅ Fixes compatibility with Kimi K2 and similar tokenizers
  • ✅ Maintains security: only official HuggingFace parameters are allowed
  • ✅ Zero maintenance: automatically stays in sync with transformers library updates
  • ✅ No manual whitelist to maintain

Test Plan

  1. Unit test for parameter filtering logic:
pytest tests/entrypoints/openai/test_chat_template.py -v
  2. Integration test with Kimi K2 model:
# Start vLLM server with Kimi K2
vllm serve Kimi/kimi-k2 --tool-call-parser kimi_k2

# Test tool calling with add_generation_prompt
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Kimi/kimi-k2",
    "messages": [{"role": "user", "content": "What is the weather in Beijing?"}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {...}}}],
    "add_generation_prompt": true
  }'
  3. Verify other models still work correctly:
# Test with standard tokenizers (e.g., Llama, Qwen)
pytest tests/tool_use/ -k "not kimi" -v

Test Result

Before the fix:

  • Kimi K2 tool calls: finish_reason: "stop" (wrong - model generates text instead of tool call)
  • Parameter add_generation_prompt was silently dropped
  • Logs show filtered parameters: {'tools': [...]} (missing add_generation_prompt)

After the fix:

  • Kimi K2 tool calls: finish_reason: "tool_calls"
  • Parameter add_generation_prompt: true correctly passed to tokenizer
  • Logs show all parameters: {'tools': [...], 'add_generation_prompt': True}
  • Standard tokenizers (Llama, Qwen, etc.) continue to work as before ✅

Security verification:

  • Unknown parameters still rejected: ✅
    # Request with evil_param
    {"evil_param": "malicious"} → Filtered out, not passed to tokenizer
  • Only HuggingFace official parameters allowed: ✅
    _get_hf_base_chat_template_params() → {'conversation', 'add_generation_prompt', 'tools', ...}
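The two security properties can be sketched together. The names and whitelist contents below are illustrative stand-ins, not vLLM's actual code:

```python
import inspect

# Hypothetical whitelist, standing in for _get_hf_base_chat_template_params():
HF_BASE_PARAMS = {"conversation", "add_generation_prompt",
                  "continue_final_message", "tools"}

def resolve_kwargs(fn, requested, hf_base_params):
    """Sketch of the accept-set logic: explicitly declared parameters
    plus the HF whitelist; anything else (e.g. evil_param) is dropped."""
    skip = (inspect.Parameter.VAR_KEYWORD, inspect.Parameter.VAR_POSITIONAL)
    declared = {
        name for name, p in inspect.signature(fn).parameters.items()
        if name != "self" and p.kind not in skip
    }
    accept = declared | hf_base_params
    return {k: v for k, v in requested.items() if k in accept}

def apply_chat_template(self, conversation, **kwargs):  # Kimi-K2-style
    ...

requested = {"add_generation_prompt": True, "tools": [], "evil_param": "malicious"}
print(resolve_kwargs(apply_chat_template, requested, HF_BASE_PARAMS))
# {'add_generation_prompt': True, 'tools': []}
```

The whitelisted standard parameters pass through even though the tokenizer only declares `**kwargs`, while the unknown `evil_param` is still filtered out.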

Code Changes

Modified file: vllm/entrypoints/chat_utils.py

  1. Added _get_hf_base_chat_template_params() function to dynamically extract standard parameters from HuggingFace base class
  2. Updated resolve_chat_template_kwargs() to include hf_base_params in the accept list
  3. Moved import inspect to module level for clarity

Lines changed: ~15 lines added


Checklist

  • The purpose of the PR - Fix tokenizer compatibility issue with **kwargs parameters
  • The test plan - Provided test commands for Kimi K2 and other models
  • The test results - Before/after comparison showing the fix works
  • Documentation update - Not applicable (internal change, no user-facing API changes)
  • Release notes update - Will update if maintainers request

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request provides a solid fix for the compatibility issue with tokenizers that use **kwargs for standard chat template parameters. The approach of dynamically inspecting the base PreTrainedTokenizer.apply_chat_template method is clean and maintainable. I've found one potential issue with how the parameters are extracted, which could lead to unexpected behavior. My review includes a suggestion to address this.

Some tokenizer implementations (e.g., Kimi K2) use **kwargs to receive
standard parameters like add_generation_prompt instead of declaring
them explicitly. This fix extracts the standard parameter list from
PreTrainedTokenizer.apply_chat_template base class signature to allow
these parameters while maintaining security.

The implementation also correctly excludes VAR_KEYWORD and VAR_POSITIONAL
parameter types to prevent 'kwargs' or 'args' from being treated as
valid parameter names.

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Member

@DarkLight1337 DarkLight1337 left a comment


Thanks, LGTM.

cc @Isotr0py

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 28, 2025 04:58
@github-actions github-actions bot added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Oct 28, 2025
Member

@Isotr0py Isotr0py left a comment


Thanks for fixing!

Comment on lines +1560 to +1564

# Allow standard HF parameters even if tokenizer uses **kwargs to receive them
hf_base_params = _get_hf_base_chat_template_params()

accept_vars = (fn_kw | template_vars | hf_base_params) - unexpected_vars

Can you also update the test in tests/entrypoints/test_chat_utils.py?

@pytest.mark.parametrize(
"model, expected_kwargs",
[
(
QWEN2VL_MODEL_ID,
{
"add_vision_id",
"add_generation_prompt",
"continue_final_message",
"tools",
},
),
(
QWEN3_MODEL_ID,
{
"enable_thinking",
"add_generation_prompt",
"continue_final_message",
"tools",
},
),
],
)
def test_resolve_hf_chat_template_kwargs(sample_json_schema, model, expected_kwargs):

Contributor Author


Updated the test as suggested.

I considered adding Kimi K2 to the test model registry, but decided against it for the following reasons:

  1. Infrastructure overhead: Adding a new model to tests/models/registry.py requires extensive configuration (tokenizer path, trust_remote_code settings, HF overrides, etc.), which would be significant effort for testing a single parameter filtering behavior.

  2. Generic fix: This change is not Kimi K2-specific. It enables support for any tokenizer that uses **kwargs to receive HuggingFace standard parameters. The mock tokenizer approach tests the core logic more directly.

  3. Sufficient coverage: The updated test now includes:

    • Existing integration tests (Qwen2-VL, Qwen3) verify backward compatibility
    • New mock tokenizer test validates the **kwargs scenario (like Kimi K2)
    • Manual integration testing with actual Kimi K2 model confirms end-to-end functionality (as documented in PR description)

The mock approach isolates the parameter filtering logic we're actually fixing, without coupling the test suite to a specific model that may have availability/licensing constraints.
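A mock-tokenizer check along these lines might look like the following sketch (all names are hypothetical; the actual test lives in tests/entrypoints/test_chat_utils.py):

```python
import inspect

class MockKwargsTokenizer:
    """Hypothetical stand-in for a Kimi-K2-style tokenizer whose
    apply_chat_template only declares **kwargs."""
    def apply_chat_template(self, conversation, **kwargs):
        return kwargs  # echo back what actually reached the tokenizer

HF_BASE_PARAMS = {"add_generation_prompt", "tools"}  # assumed whitelist

def filter_kwargs(fn, requested):
    """Keep explicitly declared params plus the HF whitelist."""
    skip = (inspect.Parameter.VAR_KEYWORD, inspect.Parameter.VAR_POSITIONAL)
    declared = {n for n, p in inspect.signature(fn).parameters.items()
                if n != "self" and p.kind not in skip}
    accept = declared | HF_BASE_PARAMS
    return {k: v for k, v in requested.items() if k in accept}

def test_kwargs_tokenizer_receives_standard_params():
    tok = MockKwargsTokenizer()
    requested = {"add_generation_prompt": True, "evil_param": "x"}
    received = tok.apply_chat_template(
        [], **filter_kwargs(tok.apply_chat_template, requested))
    # Standard param passes through; unknown param is filtered out.
    assert received == {"add_generation_prompt": True}

test_kwargs_tokenizer_receives_standard_params()
```

This isolates the filtering logic without downloading any model weights.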

@Isotr0py Isotr0py merged commit 446912d into vllm-project:main Oct 28, 2025
47 checks passed
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
…lm-project#27622)

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
…lm-project#27622)

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…lm-project#27622)

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…lm-project#27622)

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed)


3 participants