[Frontend] Fix default_chat_template_kwargs handling in Responses API #37739

sidsaha-ai wants to merge 2 commits into vllm-project:main
Conversation
Signed-off-by: Sid Saha <siddharthsaha@Siddharths-MacBook-Pro.local>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀
Documentation preview: https://vllm--37739.org.readthedocs.build/en/37739/
Code Review
This pull request correctly addresses the issue of default_chat_template_kwargs not being handled in the /v1/responses API. The changes effectively propagate both server-level defaults and per-request chat_template_kwargs to the prompt rendering and reasoning parser logic. The added unit and end-to-end tests provide good coverage for the new functionality. I found one minor issue in the documentation that needs to be addressed.
docs/features/reasoning_outputs.md (Outdated)

```markdown
## Limitations

- The reasoning content is only available for online serving's chat completion endpoint (`/v1/chat/completions`).
```
This documentation appears to be outdated with the changes in this PR. While this PR adds support for reasoning outputs in the /v1/responses endpoint, this line still states that reasoning content is only available for /v1/chat/completions. This should be updated to include /v1/responses to reflect the new capability.
Fixed in fac98b1 by updating the limitations section to include /v1/responses alongside /v1/chat/completions.
Signed-off-by: Sid Saha <siddharthsaha@Siddharths-MacBook-Pro.local>
```python
        "and vLLM will ignore it."
    ),
)
chat_template_kwargs: dict[str, Any] | None = Field(
```
Thanks~ @sidsaha-ai
This is a known issue. The reason we haven’t implemented it so far is that we wanted to wait and see whether OpenAI would introduce a similar field.
Otherwise, introducing these fields would cause the Responses API to overlap with chat completions.
Cool. Should we then wait and close this PR, or should I go ahead with rebasing and getting approval?
This pull request has merge conflicts that must be resolved before it can be merged.
Summary
`--default-chat-template-kwargs` was already available in the shared render stack, but the `/v1/responses` serving path still dropped those defaults when building prompts and when instantiating the reasoning parser used to post-process non-streaming responses. This meant Responses API requests could still behave as if Qwen3 thinking was enabled even when the server was started with `--default-chat-template-kwargs '{"enable_thinking": false}'`, which in turn could leave `output_text` empty and move all generated text into reasoning output.

Changes
- Plumb `default_chat_template_kwargs` into `OpenAIServingResponses`
- Add `chat_template_kwargs` to `ResponsesRequest`
- Merge server defaults with per-request `chat_template_kwargs` for responses prompt rendering
- Add test coverage for `chat_template_kwargs` support for `/v1/responses`

Testing
```shell
PATH="/Users/siddharthsaha/python_envs/vllm-pr/bin:$PATH" /Users/siddharthsaha/python_envs/vllm-pr/bin/python -m pytest tests/entrypoints/openai/responses/test_protocol.py -q
PATH="/Users/siddharthsaha/python_envs/vllm-pr/bin:$PATH" /Users/siddharthsaha/python_envs/vllm-pr/bin/python -m pytest tests/entrypoints/openai/responses/test_serving_responses.py -q
PATH="/Users/siddharthsaha/python_envs/vllm-pr/bin:$PATH" /Users/siddharthsaha/python_envs/vllm-pr/bin/python -m pytest tests/entrypoints/openai/responses/test_chat_template_kwargs.py -q
PATH="/Users/siddharthsaha/python_envs/vllm-pr/bin:$PATH" /Users/siddharthsaha/python_envs/vllm-pr/bin/pre-commit run --files docs/features/reasoning_outputs.md tests/entrypoints/openai/responses/test_protocol.py tests/entrypoints/openai/responses/test_serving_responses.py vllm/entrypoints/openai/generate/api_router.py vllm/entrypoints/openai/parser/responses_parser.py vllm/entrypoints/openai/responses/context.py vllm/entrypoints/openai/responses/protocol.py vllm/entrypoints/openai/responses/serving.py vllm/parser/abstract_parser.py
```

Related
This addresses the `/v1/responses` side of default/per-request chat template kwargs handling for reasoning models.
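For illustration, a `/v1/responses` request that opts out of thinking per call might carry a body like the one below; the model name and input text are made-up placeholders, and only the `chat_template_kwargs` field comes from this PR:

```python
import json

# Illustrative request body for POST /v1/responses; "my-reasoning-model"
# and the input text are hypothetical placeholders.
request_body = {
    "model": "my-reasoning-model",
    "input": "What is 2 + 2?",
    "chat_template_kwargs": {"enable_thinking": False},
}
payload = json.dumps(request_body)
print(payload)
```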