[Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser #31581

Merged
chaunceyjiang merged 4 commits into vllm-project:main from cjackal:respec-default-chat-template-kwargs
Jan 5, 2026

@cjackal cjackal commented Dec 31, 2025

Purpose

#31343 introduced server-wide default chat template keyword arguments (--default-chat-template-kwargs). These default kwargs are merged with the user-provided (per-request) kwargs and then passed to the tokenizer.
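The merge described above can be pictured with a small sketch. This is not vLLM's implementation: `merge_chat_template_kwargs` is a hypothetical name, and the precedence rule (request-level values win over server defaults on conflict) is an assumption made for illustration.

```python
from typing import Any, Optional


def merge_chat_template_kwargs(
    server_defaults: Optional[dict[str, Any]],
    request_kwargs: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Combine server-wide defaults with per-request kwargs.

    Assumption: a request-level value overrides the server default
    when both define the same key.
    """
    merged = dict(server_defaults or {})
    merged.update(request_kwargs or {})
    return merged


server_defaults = {"enable_thinking": True}

# No request-level kwargs: the server default applies.
print(merge_chat_template_kwargs(server_defaults, None))

# The request overrides the server default.
print(merge_chat_template_kwargs(server_defaults, {"enable_thinking": False}))
```

Copying the defaults before updating keeps the server-wide dictionary from being mutated by individual requests.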

This works well for rendering the chat template, but the vLLM frontend has another consumer of the chat template kwargs: the reasoning parser.

Currently some reasoning models (e.g. DeepSeek V3.1, Holo2) support a thinking on/off switch through the "enable_thinking" keyword in the request. This keyword must be passed to the reasoning parser as well; otherwise a hybrid reasoning model can end up processing a reasoning request with the non-reasoning parser, or vice versa.

This PR fixes the mismatch by applying the same keyword merge before the kwargs are passed to the reasoning parser.
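To make the mismatch concrete, here is a toy sketch. `ToyReasoningParser` is a hypothetical stand-in, not vLLM's parser; the `</think>` marker and the "off unless enabled" default are assumptions chosen only to illustrate the behaviour.

```python
from typing import Any, Optional


class ToyReasoningParser:
    """Hypothetical stand-in for a hybrid-model reasoning parser."""

    def __init__(self, chat_template_kwargs: Optional[dict[str, Any]] = None):
        kwargs = chat_template_kwargs or {}
        # Assumption: thinking is off unless "enable_thinking" is truthy.
        self.thinking = bool(kwargs.get("enable_thinking", False))

    def parse(self, text: str) -> tuple[Optional[str], str]:
        """Split model output into (reasoning, content)."""
        if not self.thinking:
            # Non-reasoning mode: the whole output is treated as the answer.
            return None, text
        # Reasoning mode: text before the assumed "</think>" marker is reasoning.
        reasoning, marker, content = text.partition("</think>")
        if not marker:
            # No marker found: treat everything as reasoning.
            return text, ""
        return reasoning.strip(), content.strip()


server_defaults = {"enable_thinking": True}
request_kwargs: dict[str, Any] = {}  # the client sent no chat_template_kwargs
output = "The user greets me.</think>Hello!"

# Only request-level kwargs reach the parser: reasoning leaks into the content.
print(ToyReasoningParser(request_kwargs).parse(output))

# Merged kwargs reach the parser: reasoning and content are separated.
print(ToyReasoningParser({**server_defaults, **request_kwargs}).parse(output))
```

In the first call the template was rendered with thinking enabled (via the server default) while the parser assumed it was disabled, which is the kind of inconsistency the PR targets.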

Note

While writing this PR I noticed that the responses API does not take the chat template kwargs into account at all. This is out of scope for this PR, but the chat template kwargs (both user-side and server-side) likely need to be passed to the responses processor as well for the responses API to work properly.

Test Plan

Launch DeepSeek V3.1 with --reasoning-parser deepseek_v3 --default-chat-template-kwargs '{"enable_thinking":true}'. Confirm that the response to the following request is handled correctly by the reasoning parser.

curl -XPOST -H 'Content-Type: application/json' \
  http://localhost:8000/v1/chat/completions \
  -d '{"messages":[{"role":"user","content":"안녕?"}]}' | jq .
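For scripted checks, the same request can be sent from Python using only the standard library. This is a sketch: `build_payload` and `post_chat` are hypothetical helper names, and the URL is the one from the curl command. The optional `chat_template_kwargs` field is the request-level counterpart of the server default, which makes it easy to test an override as well.

```python
import json
import urllib.request
from typing import Any, Optional

# Endpoint taken from the curl command above.
URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt: str, enable_thinking: Optional[bool] = None) -> dict[str, Any]:
    """Build a chat completion body, optionally with a request-level kwarg."""
    payload: dict[str, Any] = {"messages": [{"role": "user", "content": prompt}]}
    if enable_thinking is not None:
        # Request-level kwargs, merged by the server with its defaults.
        payload["chat_template_kwargs"] = {"enable_thinking": enable_thinking}
    return payload


def post_chat(payload: dict[str, Any], url: str = URL) -> dict[str, Any]:
    """POST the payload as JSON and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
#   response = post_chat(build_payload("안녕?"))
#   print(response["choices"][0]["message"].get("reasoning"))
```

Calling `build_payload("안녕?")` without the second argument reproduces the curl body exactly, so the server-level default is the only source of `enable_thinking`.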

Test Result

{
  "id": "chatcmpl-f9219592-21f1-9e5a-b92b-124ac8a8c1dc",
  "object": "chat.completion",
  "created": 1767195458,
  "model": "DEEPSEEK-V3.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "안녕하세요! 반갑습니다. 오늘 어떻게 도와드릴까요?",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": "The user initiated with \"안녕?\", a Korean greeting coupled with a question mark, indicating a welcoming and potentially inquisitive tone. Since this translates to \"Hello?\" in English, reflecting a standard conversational opening, my priority is to craft a reply that mirrors the warmth and language for effective rapport.  \n\nOpting for a natural response in Korean not only aligns with linguistic consistency but also fosters comfort for the user. Reciprocating with \"안녕하세요!\" ensures a friendly and polite acknowledgment, while appending an offer of assistant line \"오늘 어떻게 도와드릴까요?\" expands the interaction to address their needs proactively. This approach maintains engagement and encourages further dialogue.",
        "reasoning_content": "The user initiated with \"안녕?\", a Korean greeting coupled with a question mark, indicating a welcoming and potentially inquisitive tone. Since this translates to \"Hello?\" in English, reflecting a standard conversational opening, my priority is to craft a reply that mirrors the warmth and language for effective rapport.  \n\nOpting for a natural response in Korean not only aligns with linguistic consistency but also fosters comfort for the user. Reciprocating with \"안녕하세요!\" ensures a friendly and polite acknowledgment, while appending an offer of assistant line \"오늘 어떻게 도와드릴까요?\" expands the interaction to address their needs proactively. This approach maintains engagement and encourages further dialogue."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": 151336,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 183,
    "completion_tokens": 174,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}
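The response above carries the parsed reasoning in both "reasoning" and "reasoning_content". A minimal sketch for checking the result programmatically follows; `extract_reasoning` is a hypothetical helper, the order of preference between the two fields is an assumption, and the sample dictionary is a shortened illustration rather than real output.

```python
from typing import Any, Optional


def extract_reasoning(response: dict[str, Any]) -> tuple[Optional[str], Optional[str]]:
    """Return (reasoning, content) from the first choice of a chat completion."""
    message = response["choices"][0]["message"]
    # Assumption: prefer "reasoning" and fall back to "reasoning_content".
    reasoning = message.get("reasoning") or message.get("reasoning_content")
    return reasoning, message.get("content")


# Shortened, illustrative response with the same shape as the test result.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "final answer",
                "reasoning": "step-by-step thoughts",
                "reasoning_content": "step-by-step thoughts",
            },
        }
    ]
}

reasoning, content = extract_reasoning(sample)
# A non-empty reasoning value indicates the reasoning parser was applied.
print(bool(reasoning), content)
```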

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly fixes a bug where server-level default chat template keyword arguments were not being passed to the reasoning parser. The change ensures that both server-level and request-level keyword arguments are merged and used, which is crucial for features like the 'enable_thinking' switch in some reasoning models. The fix is applied to both streaming and non-streaming chat completion paths. However, this introduces code duplication. I've added a comment suggesting a refactoring to a helper method to improve maintainability and prevent future inconsistencies, which I consider a high-priority improvement to prevent similar bugs from recurring.
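The refactoring suggested in the review could look roughly like the following sketch. `ToyChatServing` and its method names are hypothetical and do not mirror vLLM's actual class layout; the point is only that the streaming and non-streaming paths call one shared helper instead of each repeating the merge.

```python
from typing import Any, Optional


class ToyChatServing:
    """Hypothetical stand-in for a chat serving class; names are illustrative."""

    def __init__(self, default_chat_template_kwargs: Optional[dict[str, Any]] = None):
        self.default_chat_template_kwargs = default_chat_template_kwargs or {}

    def _effective_chat_template_kwargs(
        self, request_kwargs: Optional[dict[str, Any]]
    ) -> dict[str, Any]:
        # Single place where server defaults and request kwargs are merged
        # (assumption: request-level values take precedence).
        return {**self.default_chat_template_kwargs, **(request_kwargs or {})}

    def parser_kwargs_for_full_response(
        self, request_kwargs: Optional[dict[str, Any]] = None
    ) -> dict[str, Any]:
        # Non-streaming path: kwargs handed to the reasoning parser.
        return self._effective_chat_template_kwargs(request_kwargs)

    def parser_kwargs_for_stream(
        self, request_kwargs: Optional[dict[str, Any]] = None
    ) -> dict[str, Any]:
        # Streaming path: reuses the same helper, so the two cannot drift apart.
        return self._effective_chat_template_kwargs(request_kwargs)


serving = ToyChatServing({"enable_thinking": True})
print(serving.parser_kwargs_for_full_response())
print(serving.parser_kwargs_for_stream({"enable_thinking": False}))
```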

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>

@chaunceyjiang chaunceyjiang left a comment

thanks~

@chaunceyjiang chaunceyjiang self-assigned this Jan 4, 2026
@chaunceyjiang chaunceyjiang added the ready label Jan 4, 2026
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
@chaunceyjiang chaunceyjiang enabled auto-merge (squash) January 4, 2026 03:05

cjackal commented Jan 5, 2026

It seems CI failed due to an infra-related (rate exceeded) issue.

@chaunceyjiang chaunceyjiang merged commit e2701cc into vllm-project:main Jan 5, 2026
47 checks passed
LucasWilkinson pushed a commit to neuralmagic/vllm that referenced this pull request Jan 6, 2026
… in reasoning parser (vllm-project#31581)

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Jan 9, 2026
… in reasoning parser (vllm-project#31581)

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
@cjackal cjackal deleted the respec-default-chat-template-kwargs branch January 11, 2026 16:43
akh64bit pushed a commit to akh64bit/vllm that referenced this pull request Jan 16, 2026
… in reasoning parser (vllm-project#31581)

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
dsuhinin pushed a commit to dsuhinin/vllm that referenced this pull request Jan 21, 2026
… in reasoning parser (vllm-project#31581)

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
… in reasoning parser (vllm-project#31581)

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

Labels

frontend, ready
