Skip to content

[NIXL] delay req_id clean for _req_to_save to finished_request #31048

Open
xuechendi wants to merge 1 commit intovllm-project:mainfrom
xuechendi:dev/clean_nixl_reqs_queue
Open

[NIXL] delay req_id clean for _req_to_save to finished_request #31048
xuechendi wants to merge 1 commit intovllm-project:mainfrom
xuechendi:dev/clean_nixl_reqs_queue

Conversation

@xuechendi
Copy link
Copy Markdown
Contributor

@xuechendi xuechendi commented Dec 19, 2025

Purpose

Follow up on #30419 (comment) to clean _req_to_save in request_finished instead of build_connecter_meta
@orozery @NickLucche, would like to know your thoughts.

accuracy is verified locally. I verified all requests gets cleaned by adding req_id print out locally.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the cleanup logic for _reqs_need_save in the NixlConnectorScheduler. The change moves the responsibility of removing a request from _reqs_need_save from build_connector_meta to request_finished. Previously, the cleanup was tied to the completion of prefill scheduling. Now, it happens when the request's lifecycle on the current instance ends, regardless of whether it finished successfully, was aborted, or capped. This simplifies the logic by centralizing the cleanup and removing the need to track partial prefill states for this purpose. The changes appear correct and improve the code's clarity and maintainability. I don't see any critical or high-severity issues.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +735 to +736
# request is finished, now remove it from self._reqs_need_save
self._reqs_need_save.pop(request.request_id, None)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop re-saving host buffer KV on every scheduler tick

When use_host_buffer remote decode is enabled, _reqs_need_save is now only cleared in request_finished, so the entry survives into every subsequent scheduling step. The loop in build_connector_meta (lines 688‑699) iterates over scheduled cached requests each tick and will keep adding the same request to reqs_to_save; wait_for_save then calls save_kv_to_host, resulting in the same KV blocks being copied to host on every decode iteration rather than once after prefill. This introduces repeated blocking D2H transfers and a significant performance regression for any long-running request using host-buffered remote decode.

Useful? React with 👍 / 👎.

Comment on lines +735 to +736
# request is finished, now remove it from self._reqs_need_save
self._reqs_need_save.pop(request.request_id, None)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to think if a possible future flow where request.kv_transfer_params are cleared during the life-cycle of a request.
For example, a future flow where the request transfer is handled by some other flow, which clears kv_transfer_params to indicate no further transfer-related action is needed (including delaying freeing of blocks).
In that case, we will not clear the request of self._reqs_need_save.

I think it would be safer to move this pop before the if not params check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants