[NIXL] delay req_id clean for _req_to_save to finished_request by xuechendi · Pull Request #31048 · vllm-project/vllm

xuechendi · 2025-12-19T22:35:21Z

Purpose

Follow-up to #30419 (comment): clean up _reqs_need_save in request_finished instead of build_connector_meta.
@orozery @NickLucche, would like to know your thoughts.

Accuracy is verified locally. I also verified that all requests get cleaned up by printing each req_id locally.

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

gemini-code-assist

Code Review

This pull request refactors the cleanup logic for _reqs_need_save in the NixlConnectorScheduler. The change moves the responsibility of removing a request from _reqs_need_save from build_connector_meta to request_finished. Previously, the cleanup was tied to the completion of prefill scheduling. Now, it happens when the request's lifecycle on the current instance ends, regardless of whether it finished successfully, was aborted, or capped. This simplifies the logic by centralizing the cleanup and removing the need to track partial prefill states for this purpose. The changes appear correct and improve the code's clarity and maintainability. I don't see any critical or high-severity issues.
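To make the summarized change concrete, here is a minimal sketch of lifecycle-based cleanup. It is not the code in nixl_connector.py: `ToyRequest`, `ToyConnectorScheduler`, and `track` are hypothetical names used only for illustration, and only the `pop(request.request_id, None)` call mirrors the diff in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    # Hypothetical stand-in for a scheduler request.
    request_id: str


@dataclass
class ToyConnectorScheduler:
    # Maps request_id -> request for requests whose KV still needs saving.
    _reqs_need_save: dict = field(default_factory=dict)

    def track(self, request: ToyRequest) -> None:
        # Hypothetical helper: remember that this request needs a save.
        self._reqs_need_save[request.request_id] = request

    def request_finished(self, request: ToyRequest) -> None:
        # Cleanup happens once, when the request's lifecycle ends.
        # pop() with a default is a no-op for requests never tracked.
        self._reqs_need_save.pop(request.request_id, None)


sched = ToyConnectorScheduler()
req = ToyRequest("req-0")
sched.track(req)
sched.request_finished(req)
sched.request_finished(ToyRequest("never-tracked"))  # does not raise
```

Using `pop` with a default keeps the cleanup idempotent, so it does not matter whether the request finished normally or was aborted before it was ever tracked.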

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


chatgpt-codex-connector · 2025-12-19T22:39:47Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

+        # request is finished, now remove it from self._reqs_need_save
+        self._reqs_need_save.pop(request.request_id, None)


Stop re-saving host buffer KV on every scheduler tick

When use_host_buffer remote decode is enabled, _reqs_need_save is now only cleared in request_finished, so the entry survives into every subsequent scheduling step. The loop in build_connector_meta (lines 688‑699) iterates over scheduled cached requests each tick and will keep adding the same request to reqs_to_save; wait_for_save then calls save_kv_to_host, resulting in the same KV blocks being copied to host on every decode iteration rather than once after prefill. This introduces repeated blocking D2H transfers and a significant performance regression for any long-running request using host-buffered remote decode.
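The concern can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not the scheduler's real control flow: `count_host_saves` and its parameters are hypothetical, and it assumes that every tick queues each entry still present in the dict for a host save.

```python
def count_host_saves(num_ticks: int, clear_after_first_save: bool) -> int:
    """Count how often one request is queued for a host save.

    Toy model: each scheduler tick queues every request that is still
    present in reqs_need_save.
    """
    reqs_need_save = {"req-0": object()}
    saves = 0
    for _ in range(num_ticks):
        # Every entry still in the dict is queued again on this tick.
        reqs_to_save = list(reqs_need_save)
        saves += len(reqs_to_save)
        if clear_after_first_save:
            # Cleanup tied to scheduling: drop the entry once it is queued.
            for req_id in reqs_to_save:
                reqs_need_save.pop(req_id, None)
    # Cleanup at request_finished: runs once at the end in both variants.
    reqs_need_save.pop("req-0", None)
    return saves


saves_cleared_early = count_host_saves(num_ticks=5, clear_after_first_save=True)
saves_cleared_at_finish = count_host_saves(num_ticks=5, clear_after_first_save=False)
```

In this model the request is queued once when the entry is dropped after its first save, and once per tick when the entry is only removed at the end of the request. Whether the real loop behaves this way depends on the conditions it checks before adding to reqs_to_save.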


orozery · 2025-12-22T08:44:32Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

+        # request is finished, now remove it from self._reqs_need_save
+        self._reqs_need_save.pop(request.request_id, None)


I've been trying to think of a possible future flow in which request.kv_transfer_params is cleared during the life cycle of a request.
For example, a future flow where the request transfer is handled by some other path, which clears kv_transfer_params to indicate that no further transfer-related action is needed (including delaying the freeing of blocks).
In that case, we would never remove the request from self._reqs_need_save.

I think it would be safer to move this pop before the if not params check.
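A minimal sketch of the ordering difference, assuming request_finished returns early when the params are empty. `ToyScheduler`, both method names, and the boolean return values are hypothetical and only serve to contrast the two orderings.

```python
from types import SimpleNamespace


class ToyScheduler:
    def __init__(self):
        self._reqs_need_save = {}

    def request_finished_pop_late(self, request) -> bool:
        params = request.kv_transfer_params
        if not params:
            # Early return: the cleanup below is never reached.
            return False
        self._reqs_need_save.pop(request.request_id, None)
        return True

    def request_finished_pop_first(self, request) -> bool:
        # Cleanup runs unconditionally, before any early return.
        self._reqs_need_save.pop(request.request_id, None)
        params = request.kv_transfer_params
        if not params:
            return False
        return True


# A request whose kv_transfer_params were cleared by some other flow.
req = SimpleNamespace(request_id="req-0", kv_transfer_params=None)

late = ToyScheduler()
late._reqs_need_save[req.request_id] = req
late.request_finished_pop_late(req)

first = ToyScheduler()
first._reqs_need_save[req.request_id] = req
first.request_finished_pop_first(req)
```

With the pop placed after the check, the entry for `req-0` stays in the dict indefinitely; with the pop placed first, it is removed even though the params are empty.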

delayed req_id clean for _req_to_save to finished_request

40f71e8

Signed-off-by: Chendi Xue <chendi.xue@intel.com>

xuechendi requested review from ApostaC and NickLucche as code owners December 19, 2025 22:35

mergify bot added the kv-connector label Dec 19, 2025

gemini-code-assist bot reviewed Dec 19, 2025

chatgpt-codex-connector bot reviewed Dec 19, 2025

orozery reviewed Dec 22, 2025

xuechendi wants to merge 1 commit into vllm-project:main from xuechendi:dev/clean_nixl_reqs_queue
