[Feat][KVConnector] Add bind_gpu_block_pool() to KVConnectorBase_V1#39654
[Feat][KVConnector] Add bind_gpu_block_pool() to KVConnectorBase_V1#39654ivanium wants to merge 1 commit intovllm-project:mainfrom
bind_gpu_block_pool() to KVConnectorBase_V1#39654Conversation
Signed-off-by: Yifan Qiao <yifanqiao@inferact.ai>
There was a problem hiding this comment.
Code Review
This pull request refactors the bind_gpu_block_pool method by moving it from the SimpleCPUOffloadConnector to the base KVConnectorBase class. This change allows for a more consistent interface across different connector implementations and simplifies the logic in the scheduler by removing the need for hasattr checks. Feedback was provided regarding the docstring in the base class, which should be updated to advise against direct reference count manipulation in favor of using the BlockPool's touch() and free_blocks() methods to ensure internal consistency.
jonathanc-n
left a comment
There was a problem hiding this comment.
Looks good to me. cc @orozery
NickLucche
left a comment
There was a problem hiding this comment.
I think this is fine, although I have to admit it's a bit specific.
Overall we're missing a "post-init" hook. At the same time, I am trying to consider what other alternatives we might have, like:
- broadening the semantics of this method to an actual "post-init", but we may as well just do this in the v2 interface
- postponing the connector allocation and allowing kwargs to be passed to the connector (for gpu blocks). This way we should be able to get the same result.
However I don't feel strongly that any of the above is better than what we have here.
|
I fed claude with my concerns about making Define a narrow Then ==== Also, I agree with @NickLucche we should try to generalize into a single "scheduler-side initialization" hook. |
|
IMO, what's being proposed here is to expose See also in the design doc
In that vein, I much prefer @orozery's suggestion:
And I'd view that as a post-merge cleanup of #37160 (Honestly, it feels like there is also a lurking, unresolved question about request-oriented vs block-oriented connectors, the potential for a redesign of the connectors API, and whether out-of-tree connectors are worth supporting ... but all that aside, I think we should be more thoughtful about how we expose the block pool to connectors) |
Purpose
Promote
bind_gpu_block_pool()from aSimpleCPUOffloadConnector-only method to a first-class API onKVConnectorBase_V1. This API is OPT-in and remains compatible with all existing connectors; it additionally allows any connector to access the GPU block pool for per-GPU-block status tracking (e.g., ref count management, prefix cache block iteration).Test Plan
Existing CI tests.
Test Result
Passed.
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.