[Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser by cjackal · Pull Request #31581 · vllm-project/vllm

cjackal · 2025-12-31T15:57:14Z

Purpose

#31343 introduced server-wide default chat template keyword arguments (--default-chat-template-kwargs). These default kwargs are merged with the user-provided (request-level) kwargs and then passed to the tokenizer.
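A minimal sketch of the merge described above, not the actual vLLM code. The function name is hypothetical, and the precedence rule (a request-level value overrides the server default on a key conflict) is an assumption that the description does not spell out.

```python
from typing import Any, Optional


def merge_chat_template_kwargs(
    server_defaults: Optional[dict[str, Any]],
    request_kwargs: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Combine server-wide defaults with per-request kwargs.

    Assumption: on a key conflict the request value overrides the default.
    """
    merged: dict[str, Any] = dict(server_defaults or {})  # copy, never mutate the defaults
    merged.update(request_kwargs or {})
    return merged


defaults = {"enable_thinking": True}

# Request without kwargs: the server default applies.
print(merge_chat_template_kwargs(defaults, None))  # {'enable_thinking': True}

# Request with an explicit value: the request wins under the assumption above.
print(merge_chat_template_kwargs(defaults, {"enable_thinking": False}))  # {'enable_thinking': False}
```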

This logic works well for chat template rendering, but the vLLM frontend has another consumer of the chat template kwargs: the reasoning parser.

Currently, some reasoning models (e.g. DeepSeek V3.1, Holo2) support a thinking on/off switch controlled by the "enable_thinking" keyword in the request. This keyword should be passed to the reasoning parser as well; otherwise a hybrid reasoning model may process a reasoning request with the non-reasoning parser, or vice versa.

This PR fixes the mismatch by applying the same keyword merge before the kwargs are passed to the reasoning parser.
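The mismatch can be illustrated with a toy model. Everything here is hypothetical: `parser_expects_reasoning` stands in for a hybrid model's reasoning parser switch, and the "thinking is off unless enable_thinking is truthy" default is an assumption made for the example, not a statement about any particular parser.

```python
from typing import Any, Optional


def parser_expects_reasoning(chat_template_kwargs: dict[str, Any]) -> bool:
    """Toy stand-in for the parser's thinking switch.

    Assumption: thinking is off unless "enable_thinking" is truthy.
    """
    return bool(chat_template_kwargs.get("enable_thinking", False))


def kwargs_for_parser(
    server_defaults: Optional[dict[str, Any]],
    request_kwargs: Optional[dict[str, Any]],
    *,
    merge: bool,
) -> dict[str, Any]:
    """Return the kwargs the parser sees, with or without the merge."""
    request_kwargs = request_kwargs or {}
    if not merge:
        return dict(request_kwargs)  # parser sees request kwargs only
    return {**(server_defaults or {}), **request_kwargs}  # parser sees the merged view


server_defaults = {"enable_thinking": True}
request_kwargs = None  # the request omits chat template kwargs entirely

# The chat template is rendered from the merged kwargs, so the model thinks.
template_thinks = parser_expects_reasoning(dict(server_defaults))

before = parser_expects_reasoning(kwargs_for_parser(server_defaults, request_kwargs, merge=False))
after = parser_expects_reasoning(kwargs_for_parser(server_defaults, request_kwargs, merge=True))

print(template_thinks, before, after)  # True False True
```

Without the merge, the template and the parser disagree about whether the model is thinking; with it, both see the same value.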

Note

While writing this PR, I noticed that the Responses API does not honor the chat template kwargs at all. This is out of scope for this PR, but both the user-side and server-side chat template kwargs probably need to be passed to the responses processor as well for the Responses API to work properly.

Test Plan

Launch DeepSeek V3.1 with --reasoning-parser deepseek_v3 --default-chat-template-kwargs '{"enable_thinking":true}'. Verify that the model's response to the following request is handled correctly by the reasoning parser.

curl -XPOST -H 'Content-Type: application/json' \
  http://localhost:8000/v1/chat/completions \
  -d '{"messages":[{"role":"user","content":"안녕?"}]}' | jq .
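For scripting the same check, a rough Python equivalent of the curl command using only the standard library is sketched below. The helper names are hypothetical. The optional `chat_template_kwargs` body field is assumed to be the request-level counterpart of the server default; it is not used in the test plan above and is included only to show how a per-request override could be sent.

```python
import json
import urllib.request
from typing import Any, Optional

URL = "http://localhost:8000/v1/chat/completions"  # same endpoint as the curl command


def build_chat_request(
    content: str,
    chat_template_kwargs: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a chat completion POST request."""
    body: dict[str, Any] = {"messages": [{"role": "user", "content": content}]}
    if chat_template_kwargs is not None:
        # Assumed request-level field; omit it to rely on the server default.
        body["chat_template_kwargs"] = chat_template_kwargs
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict[str, Any]:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Mirrors the curl command: no per-request kwargs, so the server default applies.
default_request = build_chat_request("안녕?")

# Hypothetical per-request override.
override_request = build_chat_request("안녕?", {"enable_thinking": False})
```

Calling `send(default_request)` against the server launched as described should return a response of the shape shown under Test Result.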

Test Result

{
  "id":"chatcmpl-f9219592-21f1-9e5a-b92b-124ac8a8c1dc",
  "object":"chat.completion",
  "create":1767195458,
  "model":"DEEPSEEK-V3.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "안녕하세요! 반갑습니다. 오늘 어떻게 도와드릴까요?",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": "The user initiated with \"안녕?\", a Korean greeting coupled with a question mark, indicating a welcoming and potentially inquisitive tone. Since this translates to \"Hello?\" in English, reflecting a standard conversational opening, my priority is to craft a reply that mirrors the warmth and language for effective rapport.  \n\nOpting for a natural response in Korean not only aligns with linguistic consistency but also fosters comfort for the user. Reciprocating with \"안녕하세요!\" ensures a friendly and polite acknowledgment, while appending an offer of assistant line \"오늘 어떻게 도와드릴까요?\" expands the interaction to address their needs proactively. This approach maintains engagement and encourages further dialogue.",
        "reasoning_content": "The user initiated with \"안녕?\", a Korean greeting coupled with a question mark, indicating a welcoming and potentially inquisitive tone. Since this translates to \"Hello?\" in English, reflecting a standard conversational opening, my priority is to craft a reply that mirrors the warmth and language for effective rapport.  \n\nOpting for a natural response in Korean not only aligns with linguistic consistency but also fosters comfort for the user. Reciprocating with \"안녕하세요!\" ensures a friendly and polite acknowledgment, while appending an offer of assistant line \"오늘 어떻게 도와드릴까요?\" expands the interaction to address their needs proactively. This approach maintains engagement and encourages further dialogue."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": 151336,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 183,
    "completion_tokens": 174,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
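One way to check a response like the one above programmatically is sketched here. The function name is hypothetical; it relies only on the "reasoning", "reasoning_content", and "content" fields that appear in the pasted result, and treats a response as correctly parsed when every choice has both reasoning and a final answer.

```python
from typing import Any


def reasoning_was_parsed(response: dict[str, Any]) -> bool:
    """Return True if every choice carries non-empty reasoning and content."""
    choices = response.get("choices") or []
    if not choices:
        return False
    for choice in choices:
        message = choice.get("message") or {}
        # The pasted result carries both fields; accept either one.
        reasoning = message.get("reasoning") or message.get("reasoning_content")
        if not reasoning or not message.get("content"):
            return False
    return True


# Abbreviated stand-in for the response above (texts shortened for the example).
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "안녕하세요!",
                "reasoning": "The user initiated with a Korean greeting.",
            },
        }
    ]
}

print(reasoning_was_parsed(sample))  # True
```

If the reasoning parser were bypassed, the reasoning fields would be expected to be empty, and the check would return False.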

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>

gemini-code-assist

Code Review

This pull request correctly fixes a bug where server-level default chat template keyword arguments were not being passed to the reasoning parser. The change ensures that both server-level and request-level keyword arguments are merged and used, which is crucial for features like the 'enable_thinking' switch in some reasoning models. The fix is applied to both streaming and non-streaming chat completion paths. However, this introduces code duplication. I've added a comment suggesting a refactoring to a helper method to improve maintainability and prevent future inconsistencies, which I consider a high-priority improvement to prevent similar bugs from recurring.

vllm/entrypoints/openai/serving_chat.py

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>

chaunceyjiang

thanks~

vllm/entrypoints/openai/serving_engine.py

Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>

cjackal · 2026-01-05T00:06:59Z

It seems CI failed due to an infra-related (rate exceeded) issue.

… in reasoning parser (vllm-project#31581) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

… in reasoning parser (vllm-project#31581) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

respect server-level default chat template kwargs in reasoning parser

8332c67

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>

cjackal requested review from aarnphm and chaunceyjiang as code owners December 31, 2025 15:57

mergify bot added the frontend label Dec 31, 2025

gemini-code-assist bot reviewed Dec 31, 2025

vllm/entrypoints/openai/serving_chat.py (outdated, resolved)

refactor following gemini comment

310e1c4

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>

chaunceyjiang approved these changes Jan 4, 2026

chaunceyjiang self-assigned this Jan 4, 2026

chaunceyjiang added the ready label ("ONLY add when PR is ready to merge/full CI is needed") Jan 4, 2026

chaunceyjiang reviewed Jan 4, 2026

vllm/entrypoints/openai/serving_engine.py (resolved)

Update vllm/entrypoints/openai/serving_engine.py

9a72ac9

Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>

chaunceyjiang enabled auto-merge (squash) January 4, 2026 03:05

Merge branch 'main' into respec-default-chat-template-kwargs

e88515d

chaunceyjiang merged commit e2701cc into vllm-project:main Jan 5, 2026
47 checks passed

cjackal mentioned this pull request Jan 6, 2026

[Frontend] Support GLM-4.5 / GLM-4.7 with enable_thinking: false #31788

Merged

cjackal deleted the respec-default-chat-template-kwargs branch January 11, 2026 16:43

chaunceyjiang merged 4 commits into vllm-project:main from cjackal:respec-default-chat-template-kwargs
