
[Frontend] Add sampling parameters to Responses API#32609

Merged
chaunceyjiang merged 5 commits into vllm-project:main from DanielMe:add-sampling-parameters-to-responses
Feb 3, 2026

Conversation

@DanielMe (Contributor) commented Jan 19, 2026

Purpose

Port essential sampling parameters from /v1/chat/completions to /v1/responses API to provide basic generation control for users of the Responses API.

Added Parameters:

  • presence_penalty, frequency_penalty, repetition_penalty - repetition control
  • min_p, seed, stop, ignore_eos, min_tokens - sampling control
  • prompt_logprobs, spaces_between_special_tokens - output control
  • include_stop_str_in_output, truncate_prompt_tokens - formatting control
  • logits_processors, allowed_token_ids, vllm_xargs - advanced control
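Since these are plain optional JSON fields, a request exercising a few of them is easy to sketch. A minimal sketch follows; the parameter names come from this PR, while the model name and concrete values are illustrative assumptions:

```python
import json

# Sketch of a /v1/responses request body using some of the new fields.
# Parameter names come from this PR; model name and values are illustrative.
payload = {
    "model": "Qwen/Qwen3-8B",
    "input": "Write a haiku about GPUs.",
    "temperature": 0.7,                 # standard Responses API field
    "seed": 42,                         # new: reproducible sampling
    "stop": ["\n\n"],                   # new: stop sequences
    "repetition_penalty": 1.1,          # new: repetition control
    "ignore_eos": False,                # new: keep generating past EOS
    "vllm_xargs": {"custom_knob": 1},   # new: vendor escape hatch
}

body = json.dumps(payload)
```

With the official `openai` Python client, non-standard fields like these would typically be passed via `extra_body`, since the client's typed request models don't include vendor extensions.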

Key Changes:

  • Extended ResponsesRequest protocol with 5 core sampling parameters
  • Added logits_processor_pattern parameter to to_sampling_params() method

Test Plan

Unit Tests:

Add unit tests validating the API mapping:

pytest tests/entrypoints/openai/responses/test_sampling_params.py -v

Integration Test:

Add an integration test verifying that the end-to-end call works with the new parameters:

pytest tests/entrypoints/openai/responses/test_simple.py::test_extra_sampling_params -v

Test Result

tests/entrypoints/openai/responses/test_sampling_params.py::TestResponsesRequestSamplingParams::test_basic_sampling_params PASSED                   [ 20%]
tests/entrypoints/openai/responses/test_sampling_params.py::TestResponsesRequestSamplingParams::test_extra_sampling_params PASSED                   [ 40%]
tests/entrypoints/openai/responses/test_sampling_params.py::TestResponsesRequestSamplingParams::test_stop_string_conversion PASSED                  [ 60%]
tests/entrypoints/openai/responses/test_sampling_params.py::TestResponsesRequestSamplingParams::test_default_values PASSED                          [ 80%]
tests/entrypoints/openai/responses/test_sampling_params.py::TestResponsesRequestSamplingParams::test_seed_bounds_validation PASSED                  [100%] 
tests/entrypoints/openai/responses/test_simple.py::test_basic[Qwen/Qwen3-8B] PASSED                                                                                                 [ 16%]
tests/entrypoints/openai/responses/test_simple.py::test_enable_response_messages[Qwen/Qwen3-8B] PASSED                                                                              [ 33%]
tests/entrypoints/openai/responses/test_simple.py::test_reasoning_item[Qwen/Qwen3-8B] PASSED                                                                                        [ 50%]
tests/entrypoints/openai/responses/test_simple.py::test_streaming_output_consistency[Qwen/Qwen3-8B] PASSED                                                                          [ 66%]
tests/entrypoints/openai/responses/test_simple.py::test_max_tokens[Qwen/Qwen3-8B] PASSED                                                                                            [ 83%]
tests/entrypoints/openai/responses/test_simple.py::test_extra_sampling_params[Qwen/Qwen3-8B] PASSED                                                                                 [100%]

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify (bot) commented Jan 19, 2026

Hi @DanielMe, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds several sampling parameters to the /v1/responses API to align it with the /v1/chat/completions endpoint, which is a great step towards feature parity. The changes are well-tested with new unit and integration tests. I've found one high-severity issue related to inconsistent handling of default sampling parameters, which could cause model-wide configurations to be ignored. My review includes a suggestion to address this.

@DanielMe force-pushed the add-sampling-parameters-to-responses branch 2 times, most recently from 4bd4bc4 to c3d4c7e on January 19, 2026 at 19:10
@DanielMe (Contributor, Author)

This is a user-facing change. Happy to provide a snippet for the release notes draft in the Google Doc. As it stands, I do not have access to that doc to see what its structure is like and what it already contains for the Responses API.

@chaunceyjiang (Collaborator)

https://www.openresponses.org/reference

Actually, the Responses API has a public specification now. Many of the parameters in this PR come from ChatCompletion, so I’d suggest using the ChatCompletion interface directly instead.

My main concern is that introducing these parameters could lead to conflicts with the public Responses API specification in the future.

@DanielMe (Contributor, Author) commented Jan 21, 2026

Thanks, that spec is a good pointer. I’m actually leaning on that public spec as a justification for this PR, not an argument against it.

The Open Responses spec explicitly allows implementations to extend existing schemas with additional, implementation-specific fields, as long as core semantics stay intact, fields are optional, and extensions are documented (and treated as vendor-specific). That’s exactly what this PR is doing: it adds optional request fields to expose vLLM-specific capabilities without changing the required behavior of the Responses API.

Also, vLLM in general, and the Responses API in particular, already has an established pattern of supporting "extra parameters" beyond the vanilla OpenAI surface (documented in the OpenAI-compatible server docs), and /v1/responses already supports extra request fields today (e.g., request_id, priority, cache_salt, top_k, etc.). So extending /responses is consistent with existing vLLM behavior rather than a new precedent.

On the "just use ChatCompletions instead" point: the main goal here is parity, so users who adopt /v1/responses (and the Open Responses ecosystem built around it) don't have to fall back to /chat/completions just to access basic sampling controls. This PR is narrowly scoped to port sampling knobs that already exist in vLLM's ChatCompletions surface over to Responses.

Re: future conflicts with the spec - since these are optional extensions, the blast radius is low:

  • If the spec later standardizes some of these fields with the same semantics, we’re already aligned.
  • If the spec standardizes something different, we can alias/deprecate with a compatibility window.

If you’d prefer to minimize collision risk up front, we could also rename/namespace the fields (e.g., vllm_* or push them under vllm_xargs) while keeping the same functionality.

If the maintainers’ preference is “keep /responses minimal and push advanced control to /chat/completions”, I get the trade-off - but I think it undercuts the intent of adopting /responses as the forward-looking surface when those controls already exist and are safe to expose as vendor extensions.
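The namespacing alternative mentioned above is easy to picture. A hedged sketch of the two request shapes, with field names purely illustrative:

```python
# Two shapes the same vendor-specific controls could take on a
# /v1/responses request. Field names here are illustrative only.

# Option A: top-level optional extension fields (this PR's approach).
request_top_level = {
    "model": "my-model",
    "input": "hello",
    "seed": 42,
    "repetition_penalty": 1.1,
}

# Option B: everything namespaced under vllm_xargs, minimizing any
# chance of colliding with fields a future spec revision standardizes.
request_namespaced = {
    "model": "my-model",
    "input": "hello",
    "vllm_xargs": {"seed": 42, "repetition_penalty": 1.1},
}
```

Option A reads more naturally for clients and matches ChatCompletions; Option B trades that ergonomics for forward-compatibility with the public spec.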

@qandrew (Contributor) commented Jan 22, 2026

hi @DanielMe, thanks for putting this together! I'm curious: is someone asking for all these extra params to reach parity with chat completions?

In general I am happy to add more features to the Responses API if there is a need for it, since maintenance of these extra params will need to happen.

@DanielMe (Contributor, Author)

Thanks @qandrew! This PR comes out of a concrete need primarily for stop and vllm_xargs, and then, less important for our use case, seed, repetition_penalty, prompt_logprobs, and ignore_eos for running a custom model. We prefer the Responses API over Chat Completions because of its better structured output and the momentum behind this API.

While working on this proposal I figured that other people may find use for the other sampling params already supported by Chat Completions and they turned out easy to port.

That being said, I can see the case for going on a case-by-case basis depending on concrete need to avoid committing to future maintenance of an attribute nobody asked for.

I could reduce the PR to only the six fields for which we have concrete use cases or only the two fields which are the most important ones?

Port essential sampling parameters from ChatCompletionRequest to ResponsesRequest
to provide basic generation control for /v1/responses users.

Added parameters:
- stop: Stop sequences for generation control
- seed: Random seed for reproducible generation
- repetition_penalty: Control repetition in generated text
- ignore_eos: Whether to ignore end-of-sequence tokens
- vllm_xargs: Custom extension arguments for advanced use cases

Changes:
- Add 5 parameter fields to ResponsesRequest in protocol.py
- Update to_sampling_params() with default value handling for repetition_penalty
- Add unit tests for parameter mapping (5 tests covering all parameters)
- Add integration test for end-to-end validation

Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
@DanielMe force-pushed the add-sampling-parameters-to-responses branch from aa4c7fc to 8eb2f24 on January 26, 2026 at 21:08
@DanielMe (Contributor, Author)

@qandrew , I have reduced the PR to only those fields for which we have a validated immediate need. Does this look better?

# TODO: consider supporting non harmony messages as well
previous_input_messages: list[OpenAIHarmonyMessage | dict] | None = None

repetition_penalty: float | None = None
Collaborator


Compared to the previous version, these additions make it much more reasonable.

Let's invite others to review this.

/cc @qandrew @yeqcharlotte @DarkLight1337

@qandrew (Contributor) left a comment

lgtm, thanks!

@qandrew (Contributor) commented Jan 30, 2026

cc @chaunceyjiang we can merge in?

@chaunceyjiang (Collaborator)

@qandrew ok

@chaunceyjiang added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) on Jan 30, 2026
@chaunceyjiang enabled auto-merge (squash) on January 30, 2026 at 07:33
@DanielMe (Contributor, Author) commented Feb 2, 2026

Thanks. If I understand the merge policy correctly, this will also require review from @NickLucche, @aarnphm, @DarkLight1337, and @robertgshaw2-redhat.

@chaunceyjiang merged commit 4c4b6f7 into vllm-project:main on Feb 3, 2026
43 checks passed
@DanielMe deleted the add-sampling-parameters-to-responses branch on February 3, 2026 at 09:11
PiratePai pushed a commit to PiratePai/epd_shm that referenced this pull request Feb 3, 2026
Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
Signed-off-by: Pai <416932041@qq.com>
gameofdimension pushed a commit to gameofdimension/vllm that referenced this pull request Feb 5, 2026
Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
ItzDEXX pushed a commit to ItzDEXX/vllm that referenced this pull request Feb 19, 2026
Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Co-authored-by: Daniel Mescheder <dmesch@amazon.com>

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed)

4 participants