
feat(frontend): add --default-chat-template-kwargs CLI argument#31343

Merged
chaunceyjiang merged 4 commits into vllm-project:main from effortprogrammer:feat/default-chat-template-kwargs
Dec 30, 2025

Conversation

@effortprogrammer (Contributor) commented Dec 25, 2025

Fixes #28070

Purpose

Add server-level default chat_template_kwargs to control reasoning model behavior at deployment time. Request-level kwargs override these defaults.

Test Plan

This PR allows explicit control of reasoning/non-reasoning mode at the vllm serve command level using --default-chat-template-kwargs.

For reasoning models like Qwen3, you can now disable thinking mode server-wide by setting {"enable_thinking": false} as a default, eliminating the need to specify it in every request. Request-level chat_template_kwargs will override these server defaults when provided.

Manual test command:

vllm serve Qwen/Qwen3-8B --tensor-parallel-size 2 --served-model-name xionic-test --host 0.0.0.0 --port 8000
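To exercise the server-side default path, the same server can be relaunched with the new flag. This is a sketch of a hypothetical invocation (the JSON value is the example from this PR's description); the flag takes a JSON object, so it should be single-quoted to keep the shell from mangling it:

```shell
# Same server as above, but with thinking disabled server-wide
# via the new flag (value is a JSON object).
vllm serve Qwen/Qwen3-8B \
    --tensor-parallel-size 2 \
    --served-model-name xionic-test \
    --host 0.0.0.0 --port 8000 \
    --default-chat-template-kwargs '{"enable_thinking": false}'
```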

Minimal Python test script:

from openai import OpenAI
BASE_URL = "http://localhost:8000/v1"  # Change to your server
MODEL = "xionic-test"
client = OpenAI(api_key="EMPTY", base_url=BASE_URL)
# Same request for both configurations
messages = [{"role": "user", "content": "What is 2+2?"}]
print("=" * 60)
print("WITHOUT --default-chat-template-kwargs (thinking enabled)")
print("=" * 60)
resp1 = client.chat.completions.create(model=MODEL, messages=messages, max_tokens=200)
print(resp1.choices[0].message.content)
print("\n" + "=" * 60)
print("WITH --default-chat-template-kwargs (thinking disabled)")
print("Or using client-side override (current workaround):")
print("=" * 60)
resp2 = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    max_tokens=200,
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(resp2.choices[0].message.content)
print("\n" + "=" * 60)
print("SUMMARY")
print("=" * 60)
print(f"Response 1 length: {len(resp1.choices[0].message.content)} chars")
print(f"Response 2 length: {len(resp2.choices[0].message.content)} chars")
print(f"Has <think> tag in resp1: {'<think>' in resp1.choices[0].message.content}")
print(f"Has <think> tag in resp2: {'<think>' in resp2.choices[0].message.content}")

Test Result

WITHOUT --default-chat-template-kwargs (thinking enabled):

Result:
Okay, the user is asking "What is 2+2?" That seems straightforward, but maybe they want a detailed explanation. Let me think. First, I should confirm the basic arithmetic. 2 plus 2 is 4. But maybe they're testing if I know the answer or if there's a trick. Sometimes people ask simple questions to see if the AI is reliable.

Wait, could there be a different interpretation? Like in some contexts, 2+2 might not be 4? For example, in modular arithmetic, if we're working modulo 3, 2+2 would be 1. But the question doesn't specify any context, so the default is standard arithmetic.

Also, maybe they want to know the steps involved. Let me break it down. Starting with two units and adding another two units. So 2 + 2 equals 4. But perhaps they want a more detailed explanation, like using number lines or visual aids.

WITH --default-chat-template-kwargs (thinking disabled):

Result: 2 + 2 equals 4.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


@mergify mergify bot added the frontend label Dec 25, 2025
@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new CLI argument --default-chat-template-kwargs to set server-level default keyword arguments for the chat template renderer. The implementation correctly adds the argument and passes it through to the serving layer. However, there is a logic issue in how these default arguments are merged with request-level arguments, which could lead to server defaults incorrectly overriding request parameters. I've provided a suggestion to fix the merge order.
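The merge-order issue flagged here comes down to which side of the dict merge wins. A minimal illustration, with hypothetical names that do not mirror the actual patch:

```python
# Hypothetical illustration of the merge-order bug flagged in review;
# the variable names do not reflect the actual vLLM code.
server_defaults = {"enable_thinking": False}
request_kwargs = {"enable_thinking": True}  # the client explicitly opts in

# Buggy order: defaults are unpacked last and silently clobber the request.
buggy = {**request_kwargs, **server_defaults}
print(buggy)  # {'enable_thinking': False} -- request value lost

# Fixed order: request kwargs are unpacked last, so they win.
fixed = {**server_defaults, **request_kwargs}
print(fixed)  # {'enable_thinking': True}
```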

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@effortprogrammer effortprogrammer marked this pull request as ready for review December 25, 2025 11:27

@mergify

mergify bot commented Dec 25, 2025

Hi @effortprogrammer, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@effortprogrammer effortprogrammer force-pushed the feat/default-chat-template-kwargs branch from 8731e9a to cce2494 Compare December 25, 2025 11:39
@effortprogrammer
Contributor Author

@DarkLight1337 @chaunceyjiang

I made the changes requested in review. Please check if there are any more issues!

@chaunceyjiang
Collaborator

@effortprogrammer You need to sign off your commits to pass the DCO check.

@effortprogrammer effortprogrammer force-pushed the feat/default-chat-template-kwargs branch from 7386263 to 413cd1b Compare December 29, 2025 13:36
@mergify

mergify bot commented Dec 29, 2025

Documentation preview: https://vllm--31343.org.readthedocs.build/en/31343/

@mergify mergify bot added the documentation Improvements or additions to documentation label Dec 29, 2025
@effortprogrammer effortprogrammer force-pushed the feat/default-chat-template-kwargs branch from 61f6f35 to dda72c8 Compare December 29, 2025 13:52
@chaunceyjiang (Collaborator) left a comment

@chaunceyjiang chaunceyjiang enabled auto-merge (squash) December 29, 2025 14:32
@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 29, 2025
@effortprogrammer
Contributor Author

@chaunceyjiang @DarkLight1337 It seems the current CI failures are unrelated to my changes. Is there anything I should change?

@chaunceyjiang chaunceyjiang merged commit dc837bc into vllm-project:main Dec 30, 2025
49 checks passed
yiliu30 pushed a commit to yiliu30/vllm-fork that referenced this pull request Dec 30, 2025
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026

Labels

  • documentation — Improvements or additions to documentation
  • frontend
  • ready — ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Usage]: Is there a way to control default thinking behaviour of a model?

3 participants