[Frontend] Fix default_chat_template_kwargs handling in Responses API#37739

Open
sidsaha-ai wants to merge 2 commits into vllm-project:main from sidsaha-ai:fix/responses-default-chat-template-kwargs
Conversation

@sidsaha-ai

Summary

--default-chat-template-kwargs was already wired into the shared render stack, but the /v1/responses serving path still dropped those defaults in two places: when building prompts and when instantiating the reasoning parser used to post-process non-streaming responses.

This meant Responses API requests could still behave as if Qwen3 thinking was enabled even when the server was started with --default-chat-template-kwargs '{"enable_thinking": false}', which in turn could leave output_text empty and move all generated text into reasoning output.

Changes

  • pass default_chat_template_kwargs into OpenAIServingResponses
  • add chat_template_kwargs to ResponsesRequest
  • merge server defaults with per-request chat_template_kwargs for responses prompt rendering
  • pass the merged template kwargs through all responses-side reasoning parser paths, including the unified non-streaming parser wrapper
  • document chat_template_kwargs support for /v1/responses
  • add unit coverage plus an end-to-end Responses API regression test for server defaults and per-request override

Testing

  • PATH="/Users/siddharthsaha/python_envs/vllm-pr/bin:$PATH" /Users/siddharthsaha/python_envs/vllm-pr/bin/python -m pytest tests/entrypoints/openai/responses/test_protocol.py -q
  • PATH="/Users/siddharthsaha/python_envs/vllm-pr/bin:$PATH" /Users/siddharthsaha/python_envs/vllm-pr/bin/python -m pytest tests/entrypoints/openai/responses/test_serving_responses.py -q
  • PATH="/Users/siddharthsaha/python_envs/vllm-pr/bin:$PATH" /Users/siddharthsaha/python_envs/vllm-pr/bin/python -m pytest tests/entrypoints/openai/responses/test_chat_template_kwargs.py -q
  • PATH="/Users/siddharthsaha/python_envs/vllm-pr/bin:$PATH" /Users/siddharthsaha/python_envs/vllm-pr/bin/pre-commit run --files docs/features/reasoning_outputs.md tests/entrypoints/openai/responses/test_protocol.py tests/entrypoints/openai/responses/test_serving_responses.py vllm/entrypoints/openai/generate/api_router.py vllm/entrypoints/openai/parser/responses_parser.py vllm/entrypoints/openai/responses/context.py vllm/entrypoints/openai/responses/protocol.py vllm/entrypoints/openai/responses/serving.py vllm/parser/abstract_parser.py

Related

This addresses the /v1/responses side of default/per-request chat template kwargs handling for reasoning models.
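A per-request override against the /v1/responses endpoint could look like the sketch below. The field name `chat_template_kwargs` follows this PR's description; the model name and server URL are placeholders, not part of the change.

```python
import json

payload = {
    "model": "Qwen/Qwen3-8B",          # placeholder model name
    "input": "What is 2 + 2?",
    # Overrides the server's --default-chat-template-kwargs for this request.
    "chat_template_kwargs": {"enable_thinking": True},
}

# POST this body to e.g. http://localhost:8000/v1/responses
print(json.dumps(payload, indent=2))
```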

Signed-off-by: Sid Saha <siddharthsaha@Siddharths-MacBook-Pro.local>
@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify

mergify bot commented Mar 21, 2026

Documentation preview: https://vllm--37739.org.readthedocs.build/en/37739/

@mergify mergify bot added the documentation (Improvements or additions to documentation) and frontend labels Mar 21, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly addresses the issue of default_chat_template_kwargs not being handled in the /v1/responses API. The changes effectively propagate both server-level defaults and per-request chat_template_kwargs to the prompt rendering and reasoning parser logic. The added unit and end-to-end tests provide good coverage for the new functionality. I found one minor issue in the documentation that needs to be addressed.


## Limitations

- The reasoning content is only available for online serving's chat completion endpoint (`/v1/chat/completions`).
Contributor


high

This documentation appears to be outdated with the changes in this PR. While this PR adds support for reasoning outputs in the /v1/responses endpoint, this line still states that reasoning content is only available for /v1/chat/completions. This should be updated to include /v1/responses to reflect the new capability.

Author


Fixed in fac98b1 by updating the limitations section to include /v1/responses alongside /v1/chat/completions.

Signed-off-by: Sid Saha <siddharthsaha@Siddharths-MacBook-Pro.local>
"and vLLM will ignore it."
),
)
chat_template_kwargs: dict[str, Any] | None = Field(
Collaborator


Thanks~ @sidsaha-ai

This is a known issue. The reason we haven’t implemented it so far is that we wanted to wait and see whether OpenAI would introduce a similar field.

Otherwise, introducing these fields would cause the Responses API to overlap with chat completions.

Author


Cool. Should we wait and close this PR, or should I go ahead with rebasing and get approval?

@mergify

mergify bot commented Mar 24, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sidsaha-ai.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


Labels

documentation (Improvements or additions to documentation), frontend, needs-rebase
