[NIXL] delay req_id clean for _req_to_save to finished_request #31048
xuechendi wants to merge 1 commit into vllm-project:main
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Code Review
This pull request refactors the cleanup logic for _reqs_need_save in the NixlConnectorScheduler. The change moves the responsibility of removing a request from _reqs_need_save from build_connector_meta to request_finished. Previously, the cleanup was tied to the completion of prefill scheduling. Now, it happens when the request's lifecycle on the current instance ends, regardless of whether it finished successfully, was aborted, or capped. This simplifies the logic by centralizing the cleanup and removing the need to track partial prefill states for this purpose. The changes appear correct and improve the code's clarity and maintainability. I don't see any critical or high-severity issues.
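The refactor described above can be pictured with a minimal sketch. This is a toy model rather than the actual `NixlConnectorScheduler`: `ToyRequest`, `ToyNixlScheduler`, and `track` are hypothetical names introduced for illustration, and only `_reqs_need_save`, `build_connector_meta`, and `request_finished` come from the PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyRequest:
    # Hypothetical stand-in for a vLLM request object.
    request_id: str
    kv_transfer_params: Optional[dict] = None


class ToyNixlScheduler:
    # Hypothetical, heavily simplified stand-in for the scheduler side.
    def __init__(self) -> None:
        self._reqs_need_save: Dict[str, ToyRequest] = {}

    def track(self, request: ToyRequest) -> None:
        # Hypothetical helper: register a request whose KV must be saved.
        self._reqs_need_save[request.request_id] = request

    def build_connector_meta(self) -> List[str]:
        # After the change this method only reads the dict;
        # it no longer removes entries once prefill is scheduled.
        return sorted(self._reqs_need_save)

    def request_finished(self, request: ToyRequest) -> None:
        # Cleanup is centralized here. pop() with a default is a no-op
        # for requests that were never tracked or were already removed.
        self._reqs_need_save.pop(request.request_id, None)
```

In this sketch an entry stays visible to every `build_connector_meta` call until `request_finished` runs, which is the behavior the later review comments discuss.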
💡 Codex Review
Here are some automated review suggestions for this pull request.
    # request is finished, now remove it from self._reqs_need_save
    self._reqs_need_save.pop(request.request_id, None)
Stop re-saving host buffer KV on every scheduler tick
When use_host_buffer remote decode is enabled, _reqs_need_save is now only cleared in request_finished, so the entry survives into every subsequent scheduling step. The loop in build_connector_meta (lines 688‑699) iterates over scheduled cached requests each tick and will keep adding the same request to reqs_to_save; wait_for_save then calls save_kv_to_host, resulting in the same KV blocks being copied to host on every decode iteration rather than once after prefill. This introduces repeated blocking D2H transfers and a significant performance regression for any long-running request using host-buffered remote decode.
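The concern above can be illustrated with a toy simulation. It is a sketch under stated assumptions, not the vLLM code: `count_host_saves` is a hypothetical function, a single request is assumed to be scheduled on every tick, prefill is assumed to complete on the first tick, and each entry added to `reqs_to_save` is counted as one `save_kv_to_host` call.

```python
def count_host_saves(num_ticks: int, pop_in_build_meta: bool) -> int:
    """Count simulated host saves for one request over num_ticks steps."""
    reqs_need_save = {"req-0"}  # stands in for self._reqs_need_save
    saves = 0
    for _ in range(num_ticks):
        # Assumption: the request shows up as a scheduled cached
        # request on every scheduler tick.
        scheduled_cached = ["req-0"]
        reqs_to_save = [r for r in scheduled_cached if r in reqs_need_save]
        if pop_in_build_meta:
            # Old behavior (simplified): drop the entry as soon as the
            # save has been scheduled.
            for r in reqs_to_save:
                reqs_need_save.discard(r)
        # Each entry is treated as one blocking device-to-host copy.
        saves += len(reqs_to_save)
    # New behavior: the entry is only removed when the request finishes.
    reqs_need_save.discard("req-0")
    return saves
```

Under these assumptions, `count_host_saves(5, True)` yields one save while `count_host_saves(5, False)` yields one save per tick. Whether the real loop behaves this way depends on the filtering inside `build_connector_meta`, which this sketch does not reproduce.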
    # request is finished, now remove it from self._reqs_need_save
    self._reqs_need_save.pop(request.request_id, None)
I've been trying to think of a possible future flow where request.kv_transfer_params is cleared during the life cycle of a request.
For example, a future flow where the request transfer is handled by some other flow, which clears kv_transfer_params to indicate that no further transfer-related action is needed (including delaying the freeing of blocks).
In that case, the request would never be removed from self._reqs_need_save.
I think it would be safer to move this pop before the if not params check.
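A small sketch of the ordering issue, assuming `request_finished` returns early when `kv_transfer_params` is empty. Both functions are hypothetical simplifications that take the dict and parameters directly; they are not the real method signature.

```python
from typing import Dict, Optional


def finished_pop_after_check(
    reqs_need_save: Dict[str, object],
    request_id: str,
    params: Optional[dict],
) -> None:
    # Assumed current ordering: the early return skips the cleanup.
    if not params:
        return
    reqs_need_save.pop(request_id, None)


def finished_pop_before_check(
    reqs_need_save: Dict[str, object],
    request_id: str,
    params: Optional[dict],
) -> None:
    # Suggested ordering: always clean up, then decide whether any
    # transfer-related work remains.
    reqs_need_save.pop(request_id, None)
    if not params:
        return
```

If some other flow cleared `kv_transfer_params` before the request finished, the first variant would leave a stale entry behind, while the second removes it unconditionally.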
Purpose
Follow up on #30419 (comment) to clean _reqs_need_save in request_finished instead of build_connector_meta.
@orozery @NickLucche, I would like to know your thoughts.
Accuracy is verified locally. I also verified that all requests get cleaned up by printing the req_id locally.
Test Plan
Test Result