feat(frontend): add --default-chat-template-kwargs CLI argument #31343
Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Code Review
This pull request introduces a new CLI argument --default-chat-template-kwargs to set server-level default keyword arguments for the chat template renderer. The implementation correctly adds the argument and passes it through to the serving layer. However, there is a logic issue in how these default arguments are merged with request-level arguments, which could lead to server defaults incorrectly overriding request parameters. I've provided a suggestion to fix the merge order.
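The merge-order fix the review asks for can be sketched as a plain dict merge in which request-level kwargs are applied last, so they win over the server defaults. (The helper name below is hypothetical, not the PR's actual code.)

```python
def merge_chat_template_kwargs(server_defaults, request_kwargs):
    """Merge server-level default chat template kwargs with per-request kwargs.

    In a dict merge, later entries win, so the request-level kwargs
    must come last to take precedence over the server defaults.
    """
    return {**(server_defaults or {}), **(request_kwargs or {})}


# Server-wide default disables thinking; a single request re-enables it.
merged = merge_chat_template_kwargs(
    {"enable_thinking": False},  # from --default-chat-template-kwargs
    {"enable_thinking": True},   # from the request body
)
print(merged)  # {'enable_thinking': True}
```

Reversing the two splats is exactly the bug the review describes: the server defaults would silently override whatever the client sent.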
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of tests runs automatically, and you can ask your reviewers to trigger select CI tests on top of that. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Hi @effortprogrammer, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then commit the changes and push to your branch. Once installed, the pre-commit hook will run automatically on future commits.
Force-pushed from 8731e9a to cce2494
I made the requested changes based on the review. Please check whether there are any more issues!
@effortprogrammer You need to sign off your commits to pass the DCO check.
Force-pushed from 7386263 to 413cd1b
Documentation preview: https://vllm--31343.org.readthedocs.build/en/31343/
Add server-level default chat_template_kwargs to control reasoning model behavior at deployment time. Request-level kwargs override these defaults. Fixes vllm-project#28070 Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
…te args Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
Force-pushed from 61f6f35 to dda72c8
chaunceyjiang left a comment:
Thanks~ @effortprogrammer
@chaunceyjiang @DarkLight1337 It seems the current CI failure is unrelated to my changes. Is there anything I should change?
…-project#31343) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
…-project#31343) Signed-off-by: effortprogrammer <yhjhoward7@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Fixes #28070
Purpose
Add server-level default chat_template_kwargs to control reasoning model behavior at deployment time. Request-level kwargs override these defaults.
Test Plan
This PR allows explicit control of reasoning/non-reasoning mode at the vllm serve command level using
--default-chat-template-kwargs. For reasoning models like Qwen3, you can now disable thinking mode server-wide by setting {"enable_thinking": false} as a default, eliminating the need to specify it in every request. Request-level chat_template_kwargs will override these server defaults when provided.
Manual test command:
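One plausible invocation for the manual test, per the description above (the model name is an assumption for illustration, not taken from the PR):

```shell
# Disable thinking mode server-wide via the new flag; the value is a
# JSON object passed to the chat template renderer as defaults.
vllm serve Qwen/Qwen3-8B \
  --default-chat-template-kwargs '{"enable_thinking": false}'
```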
Minimal python code for test:
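A minimal client sketch against the OpenAI-compatible endpoint, assuming a server started as above on localhost:8000 (the helper names, model name, and URL are illustrative assumptions):

```python
import json
import urllib.request


def build_chat_request(model, prompt, chat_template_kwargs=None):
    """Build an OpenAI-compatible /v1/chat/completions payload."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if chat_template_kwargs is not None:
        # Per-request kwargs override the server-level defaults.
        body["chat_template_kwargs"] = chat_template_kwargs
    return body


def send(payload, base_url="http://localhost:8000"):
    """POST the payload to a running vllm serve instance."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# With the server default {"enable_thinking": false}, this request
# re-enables thinking only for itself; send(payload) would submit it.
payload = build_chat_request(
    "Qwen/Qwen3-8B", "What is 2+2?", {"enable_thinking": True}
)
print(payload["chat_template_kwargs"])  # {'enable_thinking': True}
```

Omitting the third argument exercises the server default instead, which is the "thinking disabled" case shown in the test results below.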
Test Result
WITHOUT --default-chat-template-kwargs (thinking enabled):
Result:
Okay, the user is asking "What is 2+2?" That seems straightforward, but maybe they want a detailed explanation. Let me think. First, I should confirm the basic arithmetic. 2 plus 2 is 4. But maybe they're testing if I know the answer or if there's a trick. Sometimes people ask simple questions to see if the AI is reliable.
Wait, could there be a different interpretation? Like in some contexts, 2+2 might not be 4? For example, in modular arithmetic, if we're working modulo 3, 2+2 would be 1. But the question doesn't specify any context, so the default is standard arithmetic.
Also, maybe they want to know the steps involved. Let me break it down. Starting with two units and adding another two units. So 2 + 2 equals 4. But perhaps they want a more detailed explanation, like using number lines or visual aids.
WITH --default-chat-template-kwargs (thinking disabled):
Result: 2 + 2 equals 4.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.