[Frontend] Add sampling parameters to Responses API #32609
chaunceyjiang merged 5 commits into vllm-project:main
Conversation
Hi @DanielMe, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.
Code Review
This pull request adds several sampling parameters to the /v1/responses API to align it with the /v1/chat/completions endpoint, which is a great step towards feature parity. The changes are well-tested with new unit and integration tests. I've found one high-severity issue related to inconsistent handling of default sampling parameters, which could cause model-wide configurations to be ignored. My review includes a suggestion to address this.
Force-pushed from 4bd4bc4 to c3d4c7e
This is a user-facing change. Happy to provide a snippet for the release notes draft in the Google Doc. As it stands, I do not have access to that doc to see what the structure is like and what it already contains for the Responses API.
https://www.openresponses.org/reference Actually, the Responses API has a public specification now. Many of the parameters in this PR come from ChatCompletion, so I'd suggest using the ChatCompletion interface directly instead. My main concern is that introducing these parameters could lead to conflicts with the public Responses API specification in the future.
Thanks, that spec is a good pointer. I'm actually leaning on that public spec as a justification for this PR, not an argument against it. The Open Responses spec explicitly allows implementations to extend existing schemas with additional, implementation-specific fields, as long as core semantics stay intact, fields are optional, and extensions are documented (and treated as vendor-specific). That's exactly what this PR is doing: it adds optional request fields to expose vLLM-specific capabilities without changing the required behavior of the Responses API. Also, vLLM in general, and the Responses API in particular, already has an established pattern of supporting "extra parameters" beyond the vanilla OpenAI surface (documented in the OpenAI-compatible server docs). Re: future conflicts with the spec: since these are optional extensions, the blast radius is low.
If you'd prefer to minimize collision risk up front, we could also rename/namespace the fields (e.g., `vllm_*`, or pushing them under a dedicated key). Happy to go whichever way the maintainers prefer.
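To make the "optional extensions, low blast radius" argument concrete, here is a sketch of what such a request payload could look like. The model name is a placeholder and the exact field names follow this PR's additions; a spec-compliant server that ignores unknown optional fields still handles the core request, while one that supports the extensions applies them.

```python
import json

# Hypothetical request payload showing how vendor-specific extensions ride
# alongside the standard Responses API fields.
payload = {
    "model": "my-model",               # placeholder model name
    "input": "Hello",
    "seed": 42,                        # optional extension: reproducible sampling
    "repetition_penalty": 1.1,         # optional extension: vLLM-specific knob
    "vllm_xargs": {"custom_arg": 1},   # escape hatch for engine-level options
}

# The extensions serialize alongside the core fields in the same JSON body.
body = json.dumps(payload)
```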
Hi @DanielMe, thanks for putting this together! I'm curious: is someone asking for all these extra params to reach parity with chat completions? In general I am happy to add more features to the Responses API if there is a need for it, since maintenance of these extra params will need to happen.
Thanks @qandrew! This PR comes out of a concrete need. While working on this proposal, I figured that other people may find use for the other sampling params already supported by Chat Completions, and they turned out to be easy to port. That being said, I can see the case for going on a case-by-case basis depending on concrete need, to avoid committing to future maintenance of an attribute nobody asked for. I could reduce the PR to only the six fields for which we have concrete use cases, or only the two fields which are the most important ones?
Port essential sampling parameters from ChatCompletionRequest to ResponsesRequest to provide basic generation control for /v1/responses users.

Added parameters:
- stop: stop sequences for generation control
- seed: random seed for reproducible generation
- repetition_penalty: control repetition in generated text
- ignore_eos: whether to ignore end-of-sequence tokens
- vllm_xargs: custom extension arguments for advanced use cases

Changes:
- Add 5 parameter fields to ResponsesRequest in protocol.py
- Update to_sampling_params() with default value handling for repetition_penalty
- Add unit tests for parameter mapping (5 tests covering all parameters)
- Add integration test for end-to-end validation

Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Force-pushed from aa4c7fc to 8eb2f24
@qandrew, I have reduced the PR to only those fields for which we have a validated immediate need. Does this look better?
```python
# TODO: consider supporting non harmony messages as well
previous_input_messages: list[OpenAIHarmonyMessage | dict] | None = None
...
repetition_penalty: float | None = None
```
Compared to the previous version, these additions make it much more reasonable.
Let's invite others to review this.
cc @chaunceyjiang, can we merge this in?

@qandrew ok
Thanks. If I understand the merge policy correctly, this will also require review from @NickLucche, @aarnphm, @DarkLight1337, and @robertgshaw2-redhat.
Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com> Signed-off-by: Pai <416932041@qq.com>
Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com> Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
Purpose
Port essential sampling parameters from `/v1/chat/completions` to the `/v1/responses` API to provide basic generation control for users of the Responses API.

Added Parameters:

- `presence_penalty`, `frequency_penalty`, `repetition_penalty` - repetition control
- `min_p`, `seed`, `stop`, `ignore_eos`, `min_tokens` - sampling control
- `prompt_logprobs`, `spaces_between_special_tokens` - output control
- `include_stop_str_in_output`, `truncate_prompt_tokens` - formatting control
- `logits_processors`, `allowed_token_ids`, `vllm_xargs` - advanced control

Key Changes:

- Extended the `ResponsesRequest` protocol with 5 core sampling parameters
- Added the `logits_processor_pattern` parameter to the `to_sampling_params()` method

Test Plan
Unit Tests:

Added unit tests to validate the API mapping.

Integration Test:

Added an integration test to prove that the end-to-end call works with the new parameters.
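The kind of mapping these unit tests validate can be sketched with a toy stand-in for the PR's `to_sampling_params()` conversion. The helper below is hypothetical, not vLLM's real code: it copies each newly added request field onto a sampling-params dict and applies a default for `repetition_penalty`, mirroring the commit's note about default handling.

```python
# Toy stand-in for the PR's to_sampling_params() conversion (hypothetical
# helper, not vLLM's real code).
def to_sampling_params_sketch(request: dict) -> dict:
    params = {
        "stop": request.get("stop"),
        "seed": request.get("seed"),
        "ignore_eos": request.get("ignore_eos", False),
        # Default handling mirrors the commit note about repetition_penalty:
        # an unset value falls back to 1.0 (no penalty).
        "repetition_penalty": request.get("repetition_penalty") or 1.0,
    }
    params.update(request.get("vllm_xargs", {}))  # engine-specific passthrough
    return params
```

A mapping test then asserts that each field survives the conversion unchanged (or picks up its default when omitted).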
Test Result
Essential Elements of an Effective PR Description Checklist
- (If applicable) Update `supported_models.md` and `examples` for a new model.