[BugFix][Frontend] pass kv_transfer_params through to sampling_params by hhk7734 · Pull Request #38094 · vllm-project/vllm

hhk7734 · 2026-03-25T10:16:54Z

Purpose

Fix two issues in the disaggregated serving GenerateRequest:

kv_transfer_params silently dropped: When a client sends kv_transfer_params in a generate request, the field is parsed but never forwarded to the engine's SamplingParams. This means disaggregated prefill/decode scheduling flags (do_remote_decode, do_remote_prefill, etc.) have no effect via the token-level serving endpoint.
sampling_params unnecessarily required: The field has no default, forcing every caller to provide it even when default sampling is acceptable.

Changes

Add GenerateRequest.to_sampling_params() that merges kv_transfer_params into SamplingParams.extra_args before handing off to the engine.
Default sampling_params to SamplingParams() via Field(default_factory=...).
Call request.to_sampling_params() in ServingTokens instead of accessing the raw field.
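The merge described above can be sketched with stand-in classes. This is a minimal illustration, not the vLLM code: the two dataclasses below are hypothetical substitutes for the real `SamplingParams` and `GenerateRequest`, and the `"kv_transfer_params"` key used inside `extra_args` is an assumption.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SamplingParams:
    """Stand-in for the engine's SamplingParams (assumption, not the vLLM class)."""
    max_tokens: int = 16
    extra_args: Optional[dict] = None


@dataclass
class GenerateRequest:
    """Stand-in request model holding only the fields named in this PR."""
    token_ids: list
    # Defaulted via a factory so callers may omit sampling_params entirely.
    sampling_params: SamplingParams = field(default_factory=SamplingParams)
    kv_transfer_params: Optional[dict] = None

    def to_sampling_params(self) -> SamplingParams:
        if self.kv_transfer_params is None:
            return dataclasses.replace(self.sampling_params)
        # Build a fresh extra_args dict so the request's own params stay untouched.
        extra: dict[str, Any] = dict(self.sampling_params.extra_args or {})
        extra["kv_transfer_params"] = self.kv_transfer_params  # assumed key name
        return dataclasses.replace(self.sampling_params, extra_args=extra)


request = GenerateRequest(
    token_ids=[15339, 11, 1268],
    kv_transfer_params={"do_remote_decode": True, "do_remote_prefill": False},
)
params = request.to_sampling_params()
```

Here `dataclasses.replace` returns a new object, so the request's own `sampling_params` is not modified by the conversion.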

Test Plan

curl -XPOST http://<prefillerIP>:8000/inference/v1/generate \
    -H "Content-Type: application/json" \
    -d '{
      "token_ids": [15339, 11, 1268, 527, 499, 30],
      "sampling_params": {
        "max_tokens": 20
      },
      "kv_transfer_params": {
        "do_remote_decode": true,
        "do_remote_prefill": false
      }
    }' | jq

curl -XPOST http://<decoderIP>:8000/inference/v1/generate \
    -H "Content-Type: application/json" \
    -d '{
      "token_ids": [15339, 11, 1268, 527, 499, 30],
      "sampling_params": {
        "max_tokens": 20
      },
      "kv_transfer_params": {
        "do_remote_prefill": true,
        "do_remote_decode": false,
        "remote_block_ids": [
          [
            1,
            2
          ]
        ],
        "remote_engine_id": "0ce78dd0-7144-4375-86ed-71dee6d9a81c_dp0",
        "remote_request_id": "generate-tokens-bcc9f408bb55b99e-9446e206",
        "remote_host": "<prefillerIP>",
        "remote_port": 5600,
        "tp_size": 1
      }
    }' | jq
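In this test, the decoder request's kv_transfer_params match the values that appear in the prefiller response under Test Result. A client might assemble the second request along the following lines. This is only a sketch: `build_decode_payload` is a hypothetical helper, not part of vLLM, and it assumes the prefiller response is already parsed into a dict.

```python
from typing import Any


def build_decode_payload(
    token_ids: list,
    prefill_response: dict,
    max_tokens: int = 20,
) -> dict:
    """Build the decoder request body from a parsed prefiller response.

    Hypothetical helper: it forwards the response's kv_transfer_params
    unchanged, which is what the two curl commands do by hand.
    """
    kv_params: Any = prefill_response.get("kv_transfer_params")
    if kv_params is None:
        raise ValueError("prefill response carries no kv_transfer_params")
    return {
        "token_ids": list(token_ids),
        "sampling_params": {"max_tokens": max_tokens},
        "kv_transfer_params": dict(kv_params),
    }


# Trimmed-down response using field names from the test result in this PR.
prefill_response = {
    "request_id": "905c6d76ee0a7603",
    "kv_transfer_params": {
        "do_remote_prefill": True,
        "do_remote_decode": False,
        "remote_block_ids": [[1, 2]],
        "remote_port": 5600,
        "tp_size": 1,
    },
}
payload = build_decode_payload([15339, 11, 1268, 527, 499, 30], prefill_response)
```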

Test Result

{
  "request_id": "905c6d76ee0a7603",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "length",
      "token_ids": [
        29882,
        29906,
        29900,
        29906,
        29906,
        31054,
        31093,
        29871,
        237,
        181,
        131,
        239,
        134,
        140,
        240,
        152,
        163,
        29871,
        30970,
        29871
      ]
    }
  ],
  "prompt_logprobs": null,
  "kv_transfer_params": {
    "do_remote_prefill": true,
    "do_remote_decode": false,
    "remote_block_ids": [
      [
        1,
        2
      ]
    ],
    "remote_engine_id": "0ce78dd0-7144-4375-86ed-71dee6d9a81c_dp0",
    "remote_request_id": "generate-tokens-bcc9f408bb55b99e-9446e206",
    "remote_host": "<prefillerIP>",
    "remote_port": 5600,
    "tp_size": 1
  }
}

{
  "request_id": "b7fe3643234c27e3",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "length",
      "token_ids": [
        29871,
        239,
        161,
        139,
        31137,
        31054,
        31136,
        29871,
        237,
        181,
        194,
        239,
        188,
        30393,
        240,
        152,
        163,
        239,
        159,
        191
      ]
    }
  ],
  "prompt_logprobs": null,
  "kv_transfer_params": null
}

main (APIServer pid=8) INFO 03-26 14:20:00 [loggers.py:259] Engine 000: Avg prompt throughput: 0.1 tokens/s, Avg generation throughput: 2.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, External prefix cache hit rate: 100.0%
main (APIServer pid=8) INFO 03-26 14:20:00 [metrics.py:103] KV Transfer metrics: Num successful transfers=1, Avg xfer time (ms)=11.411, P90 xfer time (ms)=11.411, Avg post time (ms)=0.33, P90 post time (ms)=0.33, Avg MB per transfer=2.0, Throughput (MB/s)=175.269, Avg number of descriptors=32.0
main (APIServer pid=8) INFO 03-26 14:20:10 [loggers.py:259] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%, External prefix cache hit rate: 100.0%

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request introduces support for kv_transfer_params in disaggregated serving requests and makes sampling_params optional by providing a default. New tests have been added to cover these scenarios. A high-severity issue was identified in the to_sampling_params method, which currently mutates the original sampling_params object instead of working on a copy, potentially leading to unexpected side effects.

gemini-code-assist · 2026-03-25T10:22:31Z

    )

+    def to_sampling_params(self) -> SamplingParams:
+        params = self.sampling_params


This method mutates self.sampling_params in place because params is a reference, not a copy. This can lead to unexpected side effects, as the state of the GenerateRequest object is modified. A method with a to_... naming convention should not have side effects on the instance it's called on.

The caller in serving.py also modifies the returned object. This reinforces the need to work on a copy to avoid altering the original request object's state.

Please create a deep copy of self.sampling_params. Assuming it's a Pydantic model, model_copy(deep=True) is the idiomatic way to do this.

Suggested change

-        params = self.sampling_params
+        params = self.sampling_params.model_copy(deep=True)
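The aliasing concern in this review comment can be reproduced with plain dataclasses. The classes below are hypothetical stand-ins, and `copy.deepcopy` stands in for whichever copy mechanism the real class offers. `model_copy(deep=True)` applies only if the type is a Pydantic model, which the review comment itself only assumes.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Params:
    """Stand-in for sampling parameters (not the vLLM class)."""
    extra_args: dict = field(default_factory=dict)


@dataclass
class Request:
    """Stand-in for the request object that owns the parameters."""
    params: Params = field(default_factory=Params)

    def to_params_aliased(self) -> Params:
        p = self.params  # same object, so the write below changes the request
        p.extra_args["kv_transfer_params"] = {"do_remote_decode": True}
        return p

    def to_params_copied(self) -> Params:
        p = copy.deepcopy(self.params)  # independent object
        p.extra_args["kv_transfer_params"] = {"do_remote_decode": True}
        return p


aliased_request = Request()
aliased_result = aliased_request.to_params_aliased()

copied_request = Request()
copied_result = copied_request.to_params_copied()
```

After the aliased call, the request's own `params.extra_args` contains the injected key. After the copied call, the request's `params.extra_args` is still empty.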

…ake sampling_params optional

kv_transfer_params in GenerateRequest were not being forwarded to the engine. Add to_sampling_params() that merges kv_transfer_params into extra_args, and default sampling_params so callers can omit it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io>

hhk7734 requested review from DarkLight1337, NickLucche, aarnphm, njhill and robertgshaw2-redhat as code owners March 25, 2026 10:16

mergify bot added the frontend and bug labels Mar 25, 2026

gemini-code-assist bot reviewed Mar 25, 2026


hhk7734 force-pushed the fix/tito_kv_transfer_params branch from c6c5190 to 75ace00 Compare March 26, 2026 07:20

NickLucche mentioned this pull request Mar 26, 2026

[P/D] Prefill compute optimizations with bi-directional KV cache transfers between P and D nodes #32553

Open

5 tasks

hhk7734 mentioned this pull request Apr 8, 2026

Add Moreh as a contributor to the adopters list llm-d/llm-d#1111

Merged

hhk7734 wants to merge 1 commit into vllm-project:main from moreh-dev:fix/tito_kv_transfer_params


1 participant
