[KVConnector] Introduce bind_scheduler_state API#41011

Open
NickLucche wants to merge 2 commits into vllm-project:main from NickLucche:conn-bind-scheduler-state

Conversation

@NickLucche
Collaborator

Alternative approach to #39654.

This PR introduces a generic "post-init hook", scheduler-side, that allows a connector to peek into the state of the scheduler:

    # Scheduler-side methods
    # ==============================

    def bind_scheduler_state(self, scheduler_state: SchedulerState) -> None:
        """
        Bind the scheduler state to the connector.

        This function is called by the scheduler after initialization
        and before the first model execution.

        Args:
            scheduler_state (SchedulerState): the scheduler state.
        """
        return

while avoiding initialization patterns that are too narrowly tailored to a single connector's use case (note this is also the driving motivation behind a potential ConnectorV2 API overhaul).

SchedulerState allows for expanding the kind of data that we may want to pipe in through this API, while ensuring access is read-only: the state is not meant to be consumed by the scheduler itself; it's a one-way scheduler->connector relation.
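A minimal sketch of what such a frozen SchedulerState container could look like (the `kv_cache_manager` field is taken from the review discussion in this thread; everything else is an illustrative assumption, not the PR's exact definition):

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SchedulerState:
    """Read-only view of scheduler internals handed to the connector.

    frozen=True blocks reassignment of the fields themselves; it does
    not deep-freeze the objects the fields reference.
    """

    kv_cache_manager: Any  # piped through per this PR's discussion


# A connector receives this via bind_scheduler_state(...) and can read,
# but not rebind, its fields.
state = SchedulerState(kv_cache_manager=object())
```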

Signed-off-by: NickLucche <nlucches@redhat.com>

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@NickLucche
Collaborator Author

cc @orozery @ivanium @ApostaC

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request refactors the KV connector interface by introducing a SchedulerState dataclass and a bind_scheduler_state method, replacing the previous bind_gpu_block_pool approach to improve extensibility. Feedback indicates that the SchedulerState docstring is misleading, as it claims to provide read-only access while the internal kv_cache_manager remains mutable.

    class SchedulerState:
        """
        State of the scheduler that the connector can access, scheduler-side.
        This dataclass ensures read-only access to scheduler state, while enabling

Severity: high

The docstring states that this dataclass "ensures read-only access to scheduler state". While the dataclass itself is frozen=True (preventing reassignment of its fields), the kv_cache_manager object it contains is mutable. A connector could technically call mutating methods on the manager. This is more of a design guideline than a technical enforcement, but the docstring might be slightly misleading in its current phrasing.
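The reviewer's point reproduces in a few lines: frozen=True only blocks rebinding the dataclass's own fields, while a mutable manager held by the state can still be mutated through its methods (FakeKVCacheManager below is a stand-in for demonstration, not vLLM's class):

```python
from dataclasses import dataclass, FrozenInstanceError


class FakeKVCacheManager:
    """Stand-in for the real manager, just to demonstrate mutability."""

    def __init__(self):
        self.blocks_freed = 0

    def free_blocks(self, n):  # a mutating method
        self.blocks_freed += n


@dataclass(frozen=True)
class SchedulerState:
    kv_cache_manager: FakeKVCacheManager


state = SchedulerState(kv_cache_manager=FakeKVCacheManager())

# Rebinding the field is blocked by frozen=True...
try:
    state.kv_cache_manager = FakeKVCacheManager()
except FrozenInstanceError:
    print("field reassignment blocked")

# ...but mutating the contained object is not.
state.kv_cache_manager.free_blocks(4)
print(state.kv_cache_manager.blocks_freed)  # → 4
```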

Signed-off-by: NickLucche <nlucches@redhat.com>
@orozery
Collaborator

orozery commented Apr 27, 2026

@NickLucche The approach here is pretty similar to #39654, but somewhat more cumbersome.
It still lets the connectors access the block pool, and even the kv cache manager.
It still adds a dependency between base.py and vLLM internals.

I was rather thinking that SchedulerState would actually be an abstract class.
Its implementation would live in some new SchedulerConnectorMixin class, which would be inherited by Scheduler.
I can prepare a prototype if you prefer.
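One possible reading of that proposal, sketched with guessed names (neither class below is code from any vLLM PR; `waiting` is an assumed scheduler field):

```python
from abc import ABC, abstractmethod


class SchedulerState(ABC):
    """Abstract, read-only interface the connector would program against."""

    @abstractmethod
    def num_waiting_requests(self) -> int:
        ...


class SchedulerConnectorMixin(SchedulerState):
    """Implements SchedulerState on top of the scheduler's own fields."""

    def num_waiting_requests(self) -> int:
        # Assumes the inheriting Scheduler keeps a `waiting` queue.
        return len(self.waiting)


class Scheduler(SchedulerConnectorMixin):
    def __init__(self):
        self.waiting = []
```

The connector then only sees the abstract SchedulerState type, keeping base.py free of concrete scheduler internals.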

@ivanium
Contributor

ivanium commented Apr 27, 2026

This PR looks okay to me and I think having a SchedulerState is indeed more general than merely block_pool. But I wonder if we need anything other than block_pool so far. I know currently block_pool is hidden inside the KV cache manager so you cannot expose that in the SchedulerState directly, so I am okay with the current status.

Regarding @orozery 's proposal, I am hesitating because of the coupling. In the long run, I actually think we should try to decouple the scheduler and kv connector as much as possible, and perhaps moving all kv_connector stuff inside the KV cache manager.

@orozery
Collaborator

orozery commented Apr 28, 2026

Regarding @orozery 's proposal, I am hesitating because of the coupling. In the long run, I actually think we should try to decouple the scheduler and kv connector as much as possible, and perhaps moving all kv_connector stuff inside the KV cache manager.

I agree that KV connector fits better inside the KV cache manager.
i.e. the KV cache manager should be the one querying the KV connector for matched blocks.
I can downgrade my proposal one level down to the KV cache manager.

However, I think that KV connectors would benefit from being exposed to some scheduler-specific state.
Specifically, the list of waiting requests (and their tokens/block hashes, as well as their kv_transfer_params).
This will be useful for connectors planning ahead eviction and pre-fetching.

To summarize, I agree the connector hooks should be moved inside KV cache manager.
But I think we want to allow the connector a read-only view on the Scheduler state.

@orozery
Collaborator

orozery commented Apr 28, 2026

To summarize, I agree the connector hooks should be moved inside KV cache manager. But I think we want to allow the connector a read-only view on the Scheduler state.

@ivanium Another way to achieve what I described is creating a two-level coupling:

  1. Couple the KV connector with a read-only view of the KV cache manager
  2. Couple the KV cache manager with a read-only view of the Scheduler state

We can start off with (1).
Once we have (2), we can extend the KV-connector view to include the Scheduler state view.

I think this is somewhat better than my previous suggestion, as it will allow the KV cache manager (from a GPU prefix cache POV) to benefit from possible optimizations thanks to access to the scheduler waiting list.
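The two-level coupling could be prototyped with a small attribute-whitelisting wrapper (everything here is an illustrative sketch under assumed names, not vLLM code):

```python
class ReadOnlyView:
    """Expose a whitelisted set of attributes of `target`, read-only."""

    def __init__(self, target, fields):
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_fields", frozenset(fields))

    def __getattr__(self, name):
        # Called only for attributes not found normally; forward
        # whitelisted names to the wrapped target.
        if name in self._fields:
            return getattr(self._target, name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        raise AttributeError("read-only view")


class Scheduler:
    def __init__(self):
        self.waiting = []  # level (2): exposed to the manager read-only


class KVCacheManager:
    def __init__(self, scheduler_view):
        # Level (1): the connector would in turn receive a read-only
        # view of this manager.
        self.scheduler_view = scheduler_view


sched = Scheduler()
manager = KVCacheManager(ReadOnlyView(sched, {"waiting"}))
print(len(manager.scheduler_view.waiting))  # → 0
```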

@NickLucche
Collaborator Author

The approach here is pretty similar to #39654, but somewhat more cumbersome

Yes, that is because I was mostly fine with @ivanium's PR. I just generalized the hook to a post-init one so we don't have to add another one if the next change requires something that isn't strictly block_pool.

perhaps moving all kv_connector stuff inside the KV cache manager

Not sure about this: although this encapsulation would be nice to achieve, I think having the KVConnector at the scheduler level allows for a better request-level interface.
I.e. I also find myself needing a way to peek into the waiting request queue for PD here: https://docs.google.com/document/d/1i-O6kqY7WfF1lPyyftRpCQt5fwnFYIEDZKCxyB51Sjg/edit?usp=sharing.

However, I think that KV connectors would benefit from being exposed to some scheduler-specific state.

Same situation and reasoning as above for me.

@NickLucche
Collaborator Author

NickLucche commented Apr 28, 2026

Couple the KV cache manager with a read-only view of the Scheduler state

I am not sure how happy I would be to pipe this state through the manager just to expose it to the connector, tbh.
I think the current manager abstraction is quite nice and designed for internal use above all.
While KV connectors definitely share a lot of the semantics here, they're also designed to allow out-of-tree (OOT) implementations, not just internal ones.
Therefore I would still be slightly more in favor of the most flexible/powerful coupling, that is scheduler <> connector.

@markmc
Member

markmc commented Apr 28, 2026

See #39654 (comment) - I think we should be cautious about the API surface we expose to out-of-tree connectors, and this PR opens up even more than BlockPool which is probably already too wide a surface
