[BugFix][Frontend] pass kv_transfer_params through to sampling_params#38094
[BugFix][Frontend] pass kv_transfer_params through to sampling_params#38094hhk7734 wants to merge 1 commit intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces support for kv_transfer_params in disaggregated serving requests and makes sampling_params optional by providing a default. New tests have been added to cover these scenarios. A high-severity issue was identified in the to_sampling_params method, which currently mutates the original sampling_params object instead of working on a copy, potentially leading to unexpected side effects.
| ) | ||
|
|
||
| def to_sampling_params(self) -> SamplingParams: | ||
| params = self.sampling_params |
There was a problem hiding this comment.
This method mutates self.sampling_params in place because params is a reference, not a copy. This can lead to unexpected side effects, as the state of the GenerateRequest object is modified. A method with a to_... naming convention should not have side effects on the instance it's called on.
The caller in serving.py also modifies the returned object. This reinforces the need to work on a copy to avoid altering the original request object's state.
Please create a deep copy of self.sampling_params. Assuming it's a Pydantic model, model_copy(deep=True) is the idiomatic way to do this.
| params = self.sampling_params | |
| params = self.sampling_params.model_copy(deep=True) |
…ake sampling_params optional kv_transfer_params in GenerateRequest were not being forwarded to the engine. Add to_sampling_params() that merges kv_transfer_params into extra_args, and default sampling_params so callers can omit it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io>
c6c5190 to
75ace00
Compare
Purpose
Fix two issues in the disaggregated serving
GenerateRequest:kv_transfer_paramssilently dropped: When a client sendskv_transfer_paramsin a generate request, the field is parsed but never forwarded to the engine'sSamplingParams. This means disaggregated prefill/decode scheduling flags (do_remote_decode,do_remote_prefill, etc.) have no effect via the token-level serving endpoint.sampling_paramsunnecessarily required: The field has no default, forcing every caller to provide it even when default sampling is acceptable.Changes
GenerateRequest.to_sampling_params()that mergeskv_transfer_paramsintoSamplingParams.extra_argsbefore handing off to the engine.sampling_paramstoSamplingParams()viaField(default_factory=...).request.to_sampling_params()inServingTokensinstead of accessing the raw field.Test Plan
Test Result
{ "request_id": "905c6d76ee0a7603", "choices": [ { "index": 0, "logprobs": null, "finish_reason": "length", "token_ids": [ 29882, 29906, 29900, 29906, 29906, 31054, 31093, 29871, 237, 181, 131, 239, 134, 140, 240, 152, 163, 29871, 30970, 29871 ] } ], "prompt_logprobs": null, "kv_transfer_params": { "do_remote_prefill": true, "do_remote_decode": false, "remote_block_ids": [ [ 1, 2 ] ], "remote_engine_id": "0ce78dd0-7144-4375-86ed-71dee6d9a81c_dp0", "remote_request_id": "generate-tokens-bcc9f408bb55b99e-9446e206", "remote_host": "<prefillerIP>", "remote_port": 5600, "tp_size": 1 } }{ "request_id": "b7fe3643234c27e3", "choices": [ { "index": 0, "logprobs": null, "finish_reason": "length", "token_ids": [ 29871, 239, 161, 139, 31137, 31054, 31136, 29871, 237, 181, 194, 239, 188, 30393, 240, 152, 163, 239, 159, 191 ] } ], "prompt_logprobs": null, "kv_transfer_params": null }Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.