[KVConnector] Introduce bind_scheduler_state API#41011
[KVConnector] Introduce bind_scheduler_state API#41011NickLucche wants to merge 2 commits intovllm-project:mainfrom
bind_scheduler_state API#41011Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the KV connector interface by introducing a SchedulerState dataclass and a bind_scheduler_state method, replacing the previous bind_gpu_block_pool approach to improve extensibility. Feedback indicates that the SchedulerState docstring is misleading, as it claims to provide read-only access while the internal kv_cache_manager remains mutable.
| class SchedulerState: | ||
| """ | ||
| State of the scheduler that the connector can access, scheduler-side. | ||
| This dataclass ensures read-only access to scheduler state, while enabling |
There was a problem hiding this comment.
The docstring states that this dataclass "ensures read-only access to scheduler state". While the dataclass itself is frozen=True (preventing reassignment of its fields), the kv_cache_manager object it contains is mutable. A connector could technically call mutating methods on the manager. This is more of a design guideline than a technical enforcement, but the docstring might be slightly misleading in its current phrasing.
Signed-off-by: NickLucche <nlucches@redhat.com>
|
@NickLucche The approach here is pretty similar to #39654, but somewhat more combersome. I was rather thinking that |
|
This PR looks okay to me and I think having a Regarding @orozery 's proposal, I am hesitating because of the coupling. In the long run, I actually think we should try to decouple the scheduler and kv connector as much as possible, and perhaps moving all kv_connector stuff inside the KV cache manager. |
I agree that KV connector fits better inside the KV cache manager. However, I think that KV connectors would benefit from being exposed to some scheduler-specific state. To summarize, I agree the connector hooks should be moved inside KV cache manager. |
@ivanium Another way to achieve what I described is creating a two-level coupling:
We can start off (1). I think this is somewhat better than my previous suggestion, as it will allow the KV cache manager (from a GPU prefix cache POV) to benefit from possible optimizations thanks to access to the scheduler waiting list. |
Yes that is because I was mostly fine with @ivanium PR. I just generalized the hook to a post-init one so we don't have to add another one if the next change requires something that isn't strictly block_pool.
Not sure about this, although this encapsulation would be nice to achieve, I think having the KVConnector be at the scheduler level allows for a better request-level interface.
same situation and reasoning for me above |
I am not sure how happy I would be to pipe this state through the manager just to expose it to the connector tbh. |
|
See #39654 (comment) - I think we should be cautious about the API surface we expose to out-of-tree connectors, and this PR opens up even more than |
Alternative approach to #39654.
This PR introduces a generic "post-init hook", scheduler-side, that allows a connector to peek into the state of the scheduler:
while avoiding initialization pattern that are too narrow to one connector's use case (note this is also the driving motivation behind a potential ConnectorV2 API overhaul).
SchedulerStateallows for expanding the kind of data that we may want to pipe-in through this API, while ensuring access is read-only: scheduler-state is not meant to be consumed by the scheduler, it's a scheduler->connector relation.