fix: allow HuggingFace standard chat template params via **kwargs #27622

Merged
Isotr0py merged 4 commits into vllm-project:main from wangln19:1028-white-list
Oct 28, 2025

Conversation

@wangln19
Contributor

@wangln19 wangln19 commented Oct 28, 2025

Purpose

Fix compatibility issue with tokenizers that use **kwargs to receive standard chat template parameters.

Problem:

  • Some tokenizer implementations (e.g., Kimi K2) don't explicitly declare standard HuggingFace parameters like add_generation_prompt, tools, etc. in their apply_chat_template method signature
  • Instead, they receive these parameters via **kwargs
  • The current parameter filtering logic in resolve_chat_template_kwargs uses allow_var_kwargs=False, which rejects these parameters
  • This causes tool calling and other features to fail silently (e.g., Kimi K2 always returns finish_reason: stop instead of finish_reason: tool_calls)
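The dropped-parameter behavior can be illustrated with a minimal sketch. The helper name `supports_kw` and its signature below are hypothetical, mirroring the filtering check rather than quoting vLLM's code:

```python
import inspect

def supports_kw(fn, kw_name, allow_var_kwargs=False):
    """Hypothetical check: does fn explicitly declare kw_name, or
    (optionally) catch it via a **kwargs parameter?"""
    params = inspect.signature(fn).parameters
    if kw_name in params:
        return True
    has_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return allow_var_kwargs and has_var_kw

# A Kimi-K2-style tokenizer method: standard params arrive only via **kwargs
def apply_chat_template(self, conversation, **kwargs):
    ...

print(supports_kw(apply_chat_template, "add_generation_prompt"))        # False: silently dropped
print(supports_kw(apply_chat_template, "add_generation_prompt", True))  # True
```

With `allow_var_kwargs=False`, a standard parameter that is only caught by `**kwargs` looks unsupported and never reaches the tokenizer.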

Root Cause:
The security fix in PR #25794 prevents passing parameters not explicitly declared in the function signature to avoid injection attacks. While this is correct for unknown parameters, it inadvertently blocks legitimate HuggingFace standard parameters when tokenizers use **kwargs.

Solution:
Dynamically extract the standard parameter list from PreTrainedTokenizer.apply_chat_template base class signature and whitelist these parameters even when the tokenizer implementation uses **kwargs to receive them.

Benefits:

  • ✅ Fixes compatibility with Kimi K2 and similar tokenizers
  • ✅ Maintains security: only official HuggingFace parameters are allowed
  • ✅ Zero maintenance: automatically stays in sync with transformers library updates
  • ✅ No manual whitelist to maintain

Test Plan

  1. Unit test for parameter filtering logic:
pytest tests/entrypoints/openai/test_chat_template.py -v
  2. Integration test with Kimi K2 model:
# Start vLLM server with Kimi K2
vllm serve Kimi/kimi-k2 --tool-call-parser kimi_k2

# Test tool calling with add_generation_prompt
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Kimi/kimi-k2",
    "messages": [{"role": "user", "content": "What is the weather in Beijing?"}],
    "tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {...}}}],
    "add_generation_prompt": true
  }'
  3. Verify other models still work correctly:
# Test with standard tokenizers (e.g., Llama, Qwen)
pytest tests/tool_use/ -k "not kimi" -v

Test Result

Before the fix:

  • Kimi K2 tool calls: finish_reason: "stop" (wrong - model generates text instead of tool call)
  • Parameter add_generation_prompt was silently dropped
  • Logs show filtered parameters: {'tools': [...]} (missing add_generation_prompt)

After the fix:

  • Kimi K2 tool calls: finish_reason: "tool_calls"
  • Parameter add_generation_prompt: true correctly passed to tokenizer
  • Logs show all parameters: {'tools': [...], 'add_generation_prompt': True}
  • Standard tokenizers (Llama, Qwen, etc.) continue to work as before ✅

Security verification:

  • Unknown parameters still rejected: ✅
    # Request with evil_param
    {"evil_param": "malicious"} → Filtered out, not passed to tokenizer
  • Only HuggingFace official parameters allowed: ✅
    _get_hf_base_chat_template_params() → {'conversation', 'add_generation_prompt', 'tools', ...}
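The two security properties can be sketched together. The names and whitelist contents below are illustrative stand-ins, not vLLM's actual code:

```python
import inspect

# Hypothetical whitelist, standing in for _get_hf_base_chat_template_params():
HF_BASE_PARAMS = {"conversation", "add_generation_prompt",
                  "continue_final_message", "tools"}

def resolve_kwargs(fn, requested, hf_base_params):
    """Sketch of the accept-set logic: explicitly declared parameters
    plus the HF whitelist; anything else (e.g. evil_param) is dropped."""
    skip = (inspect.Parameter.VAR_KEYWORD, inspect.Parameter.VAR_POSITIONAL)
    declared = {
        name for name, p in inspect.signature(fn).parameters.items()
        if name != "self" and p.kind not in skip
    }
    accept = declared | hf_base_params
    return {k: v for k, v in requested.items() if k in accept}

def apply_chat_template(self, conversation, **kwargs):  # Kimi-K2-style
    ...

requested = {"add_generation_prompt": True, "tools": [], "evil_param": "malicious"}
print(resolve_kwargs(apply_chat_template, requested, HF_BASE_PARAMS))
# {'add_generation_prompt': True, 'tools': []}
```

The whitelisted standard parameters pass through even though the tokenizer only declares `**kwargs`, while the unknown `evil_param` is still filtered out.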

Code Changes

Modified file: vllm/entrypoints/chat_utils.py

  1. Added _get_hf_base_chat_template_params() function to dynamically extract standard parameters from HuggingFace base class
  2. Updated resolve_chat_template_kwargs() to include hf_base_params in the accept list
  3. Moved import inspect to module level for clarity

Lines changed: ~15 lines added


Checklist

  • The purpose of the PR - Fix tokenizer compatibility issue with **kwargs parameters
  • The test plan - Provided test commands for Kimi K2 and other models
  • The test results - Before/after comparison showing the fix works
  • Documentation update - Not applicable (internal change, no user-facing API changes)
  • Release notes update - Will update if maintainers request

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request provides a solid fix for the compatibility issue with tokenizers that use **kwargs for standard chat template parameters. The approach of dynamically inspecting the base PreTrainedTokenizer.apply_chat_template method is clean and maintainable. I've found one potential issue with how the parameters are extracted, which could lead to unexpected behavior. My review includes a suggestion to address this.

Some tokenizer implementations (e.g., Kimi K2) use **kwargs to receive
standard parameters like add_generation_prompt instead of declaring
them explicitly. This fix extracts the standard parameter list from
PreTrainedTokenizer.apply_chat_template base class signature to allow
these parameters while maintaining security.

The implementation also correctly excludes VAR_KEYWORD and VAR_POSITIONAL
parameter types to prevent 'kwargs' or 'args' from being treated as
valid parameter names.

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Member

@DarkLight1337 DarkLight1337 left a comment


Thanks, LGTM.

cc @Isotr0py

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 28, 2025 04:58
@github-actions github-actions bot added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Oct 28, 2025
Member

@Isotr0py Isotr0py left a comment


Thanks for fixing!

Comment on lines +1560 to +1564

# Allow standard HF parameters even if tokenizer uses **kwargs to receive them
hf_base_params = _get_hf_base_chat_template_params()

accept_vars = (fn_kw | template_vars | hf_base_params) - unexpected_vars

Can you also update the test in tests/entrypoints/test_chat_utils.py?

@pytest.mark.parametrize(
"model, expected_kwargs",
[
(
QWEN2VL_MODEL_ID,
{
"add_vision_id",
"add_generation_prompt",
"continue_final_message",
"tools",
},
),
(
QWEN3_MODEL_ID,
{
"enable_thinking",
"add_generation_prompt",
"continue_final_message",
"tools",
},
),
],
)
def test_resolve_hf_chat_template_kwargs(sample_json_schema, model, expected_kwargs):

Contributor Author


Updated the test as suggested.

I considered adding Kimi K2 to the test model registry, but decided against it for the following reasons:

  1. Infrastructure overhead: Adding a new model to tests/models/registry.py requires extensive configuration (tokenizer path, trust_remote_code settings, HF overrides, etc.), which would be significant effort for testing a single parameter filtering behavior.

  2. Generic fix: This change is not Kimi K2-specific. It enables support for any tokenizer that uses **kwargs to receive HuggingFace standard parameters. The mock tokenizer approach tests the core logic more directly.

  3. Sufficient coverage: The updated test now includes:

    • Existing integration tests (Qwen2-VL, Qwen3) verify backward compatibility
    • New mock tokenizer test validates the **kwargs scenario (like Kimi K2)
    • Manual integration testing with actual Kimi K2 model confirms end-to-end functionality (as documented in PR description)

The mock approach isolates the parameter filtering logic we're actually fixing, without coupling the test suite to a specific model that may have availability/licensing constraints.
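A mock-tokenizer check along these lines might look like the following sketch (all names are hypothetical; the actual test lives in tests/entrypoints/test_chat_utils.py):

```python
import inspect

class MockKwargsTokenizer:
    """Hypothetical stand-in for a Kimi-K2-style tokenizer whose
    apply_chat_template only declares **kwargs."""
    def apply_chat_template(self, conversation, **kwargs):
        return kwargs  # echo back what actually reached the tokenizer

HF_BASE_PARAMS = {"add_generation_prompt", "tools"}  # assumed whitelist

def filter_kwargs(fn, requested):
    """Keep explicitly declared params plus the HF whitelist."""
    skip = (inspect.Parameter.VAR_KEYWORD, inspect.Parameter.VAR_POSITIONAL)
    declared = {n for n, p in inspect.signature(fn).parameters.items()
                if n != "self" and p.kind not in skip}
    accept = declared | HF_BASE_PARAMS
    return {k: v for k, v in requested.items() if k in accept}

def test_kwargs_tokenizer_receives_standard_params():
    tok = MockKwargsTokenizer()
    requested = {"add_generation_prompt": True, "evil_param": "x"}
    received = tok.apply_chat_template(
        [], **filter_kwargs(tok.apply_chat_template, requested))
    # Standard param passes through; unknown param is filtered out.
    assert received == {"add_generation_prompt": True}

test_kwargs_tokenizer_receives_standard_params()
```

This isolates the filtering logic without downloading any model weights.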

@Isotr0py Isotr0py merged commit 446912d into vllm-project:main Oct 28, 2025
47 checks passed
ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025
…lm-project#27622)

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
…lm-project#27622)

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…lm-project#27622)

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…lm-project#27622)

Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed)


3 participants