[Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser #31581
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Code Review
This pull request correctly fixes a bug where server-level default chat template keyword arguments were not being passed to the reasoning parser. The change ensures that both server-level and request-level keyword arguments are merged and used, which is crucial for features like the 'enable_thinking' switch in some reasoning models. The fix is applied to both streaming and non-streaming chat completion paths. However, this introduces code duplication. I've added a comment suggesting a refactoring to a helper method to improve maintainability and prevent future inconsistencies, which I consider a high-priority improvement to prevent similar bugs from recurring.
Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
It seems CI failed due to an infra-related (rate exceeded) issue.
… in reasoning parser (vllm-project#31581) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
… in reasoning parser (vllm-project#31581) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Purpose
After #31343, the concept of server-wide default chat template keyword arguments (--default-chat-template-kwargs) was introduced. These default kwargs are merged with the user-provided (request-level) kwargs and then passed to the tokenizer.
This logic works well for chat template rendering, but the vLLM frontend has another consumer of the chat template kwargs: the reasoning parser.
Currently, some reasoning models (e.g. DeepSeek V3.1, Holo2) support a thinking on/off switch via the "enable_thinking" keyword in the request. This keyword should be passed to the reasoning parser as well; otherwise a hybrid reasoning model may process a reasoning request with the non-reasoning parser, or vice versa.
This PR fixes this mismatch by applying the same keyword merge before passing to the reasoning parser.
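The merge described above can be sketched roughly as follows. This is a minimal illustration, not the actual vLLM code: `merge_chat_template_kwargs` and `ToyReasoningParser` are hypothetical names, the `</think>` delimiter is only a stand-in, and the rule that request-level values override server-level defaults on key conflicts is an assumption of this sketch.

```python
from typing import Any, Optional


def merge_chat_template_kwargs(
    server_defaults: Optional[dict[str, Any]],
    request_kwargs: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical helper: merge server-level defaults with request kwargs.

    Assumption: request-level values win when both sides set the same key.
    """
    merged = dict(server_defaults or {})
    merged.update(request_kwargs or {})
    return merged


class ToyReasoningParser:
    """Toy stand-in for a hybrid reasoning parser (not a vLLM class)."""

    def __init__(self, chat_template_kwargs: dict[str, Any]) -> None:
        # The parser decides its mode from the same kwargs the template sees.
        self.thinking = bool(chat_template_kwargs.get("enable_thinking", False))

    def parse(self, text: str) -> tuple[Optional[str], str]:
        """Return (reasoning, content); reasoning is None when thinking is off."""
        if not self.thinking or "</think>" not in text:
            return None, text
        reasoning, _, content = text.partition("</think>")
        return reasoning.strip(), content.strip()


# Server started with enable_thinking=true; the request sets nothing.
server_defaults = {"enable_thinking": True}
output = "The user greeted me.</think>Hello!"

# Without the merge, the parser only sees the (empty) request kwargs.
unmerged = ToyReasoningParser({}).parse(output)

# With the merge, the parser sees the same kwargs as the chat template.
merged_kwargs = merge_chat_template_kwargs(server_defaults, None)
merged = ToyReasoningParser(merged_kwargs).parse(output)
```

Using one helper for both the streaming and non-streaming paths, as the review suggests, keeps the template renderer and the reasoning parser from drifting apart again.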
Note
While writing this PR I noticed that the responses API does not reflect the chat template kwargs at all. This is out of scope for this PR, but the chat template kwargs (both user-side and server-side) likely need to be passed to the responses processor as well for the responses API to work properly.
Test Plan
Launch DeepSeek V3.1 with
--reasoning-parser deepseek_v3 --default-chat-template-kwargs '{"enable_thinking":true}'
Verify that the model responds correctly with the reasoning parser for the following request.
Test Result
{
  "id": "chatcmpl-f9219592-21f1-9e5a-b92b-124ac8a8c1dc",
  "object": "chat.completion",
  "create": 1767195458,
  "model": "DEEPSEEK-V3.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "안녕하세요! 반갑습니다. 오늘 어떻게 도와드릴까요?",
        "refusal": null,
        "annotations": null,
        "audio": null,
        "function_call": null,
        "tool_calls": [],
        "reasoning": "The user initiated with \"안녕?\", a Korean greeting coupled with a question mark, indicating a welcoming and potentially inquisitive tone. Since this translates to \"Hello?\" in English, reflecting a standard conversational opening, my priority is to craft a reply that mirrors the warmth and language for effective rapport. \n\nOpting for a natural response in Korean not only aligns with linguistic consistency but also fosters comfort for the user. Reciprocating with \"안녕하세요!\" ensures a friendly and polite acknowledgment, while appending an offer of assistant line \"오늘 어떻게 도와드릴까요?\" expands the interaction to address their needs proactively. This approach maintains engagement and encourages further dialogue.",
        "reasoning_content": "The user initiated with \"안녕?\", a Korean greeting coupled with a question mark, indicating a welcoming and potentially inquisitive tone. Since this translates to \"Hello?\" in English, reflecting a standard conversational opening, my priority is to craft a reply that mirrors the warmth and language for effective rapport. \n\nOpting for a natural response in Korean not only aligns with linguistic consistency but also fosters comfort for the user. Reciprocating with \"안녕하세요!\" ensures a friendly and polite acknowledgment, while appending an offer of assistant line \"오늘 어떻게 도와드릴까요?\" expands the interaction to address their needs proactively. This approach maintains engagement and encourages further dialogue."
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": 151336,
      "token_ids": null
    }
  ],
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 183,
    "completion_tokens": 174,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null,
  "prompt_token_ids": null,
  "kv_transfer_params": null
}