[Frontend] Add sampling parameters to Responses API #32609
chaunceyjiang merged 5 commits into vllm-project:main
Conversation
Hi @DanielMe, the pre-commit checks have failed. Please run:

    uv pip install pre-commit
    pre-commit install
    pre-commit run --all-files

Then, commit the changes and push to your branch.
Code Review
This pull request adds several sampling parameters to the /v1/responses API to align it with the /v1/chat/completions endpoint, which is a great step towards feature parity. The changes are well-tested with new unit and integration tests. I've found one high-severity issue related to inconsistent handling of default sampling parameters, which could cause model-wide configurations to be ignored. My review includes a suggestion to address this.
Force-pushed from 4bd4bc4 to c3d4c7e
This is a user-facing change. Happy to provide a snippet for the release notes draft in the Google Doc. As it stands, I do not have access to that doc to see what the structure is like and what it already contains for the Responses API.
https://www.openresponses.org/reference Actually, the Responses API has a public specification now. Many of the parameters in this PR come from ChatCompletion, so I'd suggest using the ChatCompletion interface directly instead. My main concern is that introducing these parameters could lead to conflicts with the public Responses API specification in the future.
Thanks, that spec is a good pointer. I'm actually leaning on that public spec as a justification for this PR, not an argument against it. The Open Responses spec explicitly allows implementations to extend existing schemas with additional, implementation-specific fields, as long as core semantics stay intact, fields are optional, and extensions are documented (and treated as vendor-specific). That's exactly what this PR is doing: it adds optional request fields to expose vLLM-specific capabilities without changing the required behavior of the Responses API. Also, vLLM in general, and the Responses API in particular, already has an established pattern of supporting "extra parameters" beyond the vanilla OpenAI surface (documented in the OpenAI-compatible server docs). Re: future conflicts with the spec: since these are optional extensions, the blast radius is low.
If you'd prefer to minimize collision risk up front, we could also rename/namespace the fields (e.g., `vllm_*`, or pushing them under a dedicated key). Happy to go whichever way the maintainers prefer.
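To make the "optional extensions, low blast radius" argument concrete, here is a sketch of what such a request payload could look like. The model name is a placeholder and the exact field names follow this PR's additions; a spec-compliant server that ignores unknown optional fields still handles the core request, while one that supports the extensions applies them.

```python
import json

# Hypothetical request payload showing how vendor-specific extensions ride
# alongside the standard Responses API fields.
payload = {
    "model": "my-model",               # placeholder model name
    "input": "Hello",
    "seed": 42,                        # optional extension: reproducible sampling
    "repetition_penalty": 1.1,         # optional extension: vLLM-specific knob
    "vllm_xargs": {"custom_arg": 1},   # escape hatch for engine-level options
}

# The extensions serialize alongside the core fields in the same JSON body.
body = json.dumps(payload)
```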
Hi @DanielMe, thanks for putting this together! I'm curious: is someone asking for all these extra params to reach parity with chat completions? In general I am happy to add more features to the Responses API if there is a need for it, since maintenance of these extra params will need to happen.
Thanks @qandrew! This PR comes out of a concrete need. While working on this proposal, I figured that other people may find use for the other sampling params already supported by Chat Completions, and they turned out to be easy to port. That being said, I can see the case for going on a case-by-case basis depending on concrete need, to avoid committing to future maintenance of an attribute nobody asked for. I could reduce the PR to only the six fields for which we have concrete use cases, or only the two fields which are the most important ones?
Port essential sampling parameters from ChatCompletionRequest to ResponsesRequest to provide basic generation control for /v1/responses users.

Added parameters:
- stop: stop sequences for generation control
- seed: random seed for reproducible generation
- repetition_penalty: control repetition in generated text
- ignore_eos: whether to ignore end-of-sequence tokens
- vllm_xargs: custom extension arguments for advanced use cases

Changes:
- Add 5 parameter fields to ResponsesRequest in protocol.py
- Update to_sampling_params() with default value handling for repetition_penalty
- Add unit tests for parameter mapping (5 tests covering all parameters)
- Add integration test for end-to-end validation

Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Force-pushed from aa4c7fc to 8eb2f24
@qandrew, I have reduced the PR to only those fields for which we have a validated immediate need. Does this look better?
```python
# TODO: consider supporting non harmony messages as well
previous_input_messages: list[OpenAIHarmonyMessage | dict] | None = None
...
repetition_penalty: float | None = None
```
Compared to the previous version, these additions make it much more reasonable.
Let's invite others to review this.
cc @chaunceyjiang, can we merge this in?

@qandrew ok
Thanks. If I understand the merge policy correctly, this will also require review from @NickLucche, @aarnphm, @DarkLight1337, and @robertgshaw2-redhat.
Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com> Signed-off-by: Pai <416932041@qq.com>
Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com> Signed-off-by: felix01.yu <felix01.yu@vipshop.com>
Signed-off-by: Daniel Mescheder <dmesch@amazon.com> Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
Purpose
Port essential sampling parameters from `/v1/chat/completions` to the `/v1/responses` API to provide basic generation control for users of the Responses API.

Added Parameters:

- `presence_penalty`, `frequency_penalty`, `repetition_penalty` - repetition control
- `min_p`, `seed`, `stop`, `ignore_eos`, `min_tokens` - sampling control
- `prompt_logprobs`, `spaces_between_special_tokens` - output control
- `include_stop_str_in_output`, `truncate_prompt_tokens` - formatting control
- `logits_processors`, `allowed_token_ids`, `vllm_xargs` - advanced control

Key Changes:

- Extended the `ResponsesRequest` protocol with 5 core sampling parameters
- Added the `logits_processor_pattern` parameter to the `to_sampling_params()` method

Test Plan
Unit Tests:

Added unit tests to validate the API mapping.

Integration Test:

Added an integration test to prove that the end-to-end call works with the new parameters.
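The kind of mapping these unit tests validate can be sketched with a toy stand-in for the PR's `to_sampling_params()` conversion. The helper below is hypothetical, not vLLM's real code: it copies each newly added request field onto a sampling-params dict and applies a default for `repetition_penalty`, mirroring the commit's note about default handling.

```python
# Toy stand-in for the PR's to_sampling_params() conversion (hypothetical
# helper, not vLLM's real code).
def to_sampling_params_sketch(request: dict) -> dict:
    params = {
        "stop": request.get("stop"),
        "seed": request.get("seed"),
        "ignore_eos": request.get("ignore_eos", False),
        # Default handling mirrors the commit note about repetition_penalty:
        # an unset value falls back to 1.0 (no penalty).
        "repetition_penalty": request.get("repetition_penalty") or 1.0,
    }
    params.update(request.get("vllm_xargs", {}))  # engine-specific passthrough
    return params
```

A mapping test then asserts that each field survives the conversion unchanged (or picks up its default when omitted).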
Test Result
Essential Elements of an Effective PR Description Checklist
- (If applicable) Update `supported_models.md` and `examples` for a new model.