[KVConnector] Support worker -> scheduler metadata#31964
[KVConnector] Support worker -> scheduler metadata#31964NickLucche merged 7 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new API for workers to send metadata to the scheduler, which is a valuable addition for features like the OffloadingConnector. The implementation is well-structured and includes comprehensive tests. However, I've identified a critical correctness issue in MultiConnector.update_connector_output where state mutation is not handled safely, potentially leaving objects in an inconsistent state if an error occurs. My feedback includes a code suggestion to make this part of the implementation more robust.
vllm/distributed/kv_transfer/kv_connector/v1/multi_connector.py
Outdated
Show resolved
Hide resolved
50468f2 to
bc8ade9
Compare
|
Hi @orozery, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
bc8ade9 to
d1de052
Compare
|
cc @NickLucche |
There was a problem hiding this comment.
Agreed with @orozery offline to override this new pipe to coalesce worker->scheduler data propagation for a future Connector v2 interface.
I also believe we're already in partial agreement with @njhill (cc to check out PR if I missed something here).
=>For the time being, I think this is a generic channel that unlocks some development and the impl looks good on my side.
| c.update_connector_output(connector_output) | ||
| finally: | ||
| # restore kv_connector_worker_meta | ||
| connector_output.kv_connector_worker_meta = multi_connector_worker_meta |
| if metadata_list is None and kv_connector_worker_meta is not None: | ||
| metadata_list = [None] * i | ||
| if metadata_list is not None: | ||
| metadata_list.append(kv_connector_worker_meta) |
There was a problem hiding this comment.
I think this is not immediate but I don't have a cleaner solution other than doing an if any() check at the end.
There was a problem hiding this comment.
This is very nice code - pythonic and terse but makes great use of indices of connectors with the tuple.
| """ | ||
| return None | ||
|
|
||
| def build_connector_worker_meta(self) -> KVConnectorWorkerMetadata | None: |
There was a problem hiding this comment.
Would this read better as get_connector_worker_meta(self)?
There was a problem hiding this comment.
I thought to follow the convention being used for the other direction (build_connector_meta).
| output.kv_connector_stats = kv_connector.get_kv_connector_stats() | ||
| output.kv_cache_events = kv_connector.get_kv_connector_kv_cache_events() | ||
| output.kv_connector_worker_meta = kv_connector.build_connector_worker_meta() |
There was a problem hiding this comment.
Does this mean that output.kv_connector_stats and output.kv_cache_events will be integrated into output.kv_connector_worker_meta at some stage? Hence all future data from workers will be returned in output.kv_connector_worker_meta?
| if metadata_list is None and kv_connector_worker_meta is not None: | ||
| metadata_list = [None] * i | ||
| if metadata_list is not None: | ||
| metadata_list.append(kv_connector_worker_meta) |
There was a problem hiding this comment.
This is very nice code - pythonic and terse but makes great use of indices of connectors with the tuple.
My thinking: To ease the common use-case of aggregating across all workers (as in the case of |
hickeyma
left a comment
There was a problem hiding this comment.
My thinking:
KVConnectorWorkerMetadata will remain abstract (just like the respective KVConnectorMetadata), and will basically replace KVConnectorOutput.
The consumer of KVConnectorOutput is the scheduler (an exception to that is KVConnectorKVEvents which is not consumed by the scheduler, but only by the LMCache scheduler-side connector)
Instead of consuming of ModelRunnerOutput.kv_connector_output, the scheduler will consume its required output (stats, finished requests, block errors) via a single new scheduler-side API.
To ease the common use-case of aggregating across all workers (as in the case of finished_sending, finished_recving, etc), we can create a utility class inhering from KVConnectorWorkerMetadata that will include a generic aggregation logic (sort of a generalization of today's KVOutputAggregator.
Sounds good but would like to get a better picture of the overall direction. Do you have a RFC or design doc that I could look at?
I'm working on an RFC for connector API refactoring. It will include other proposed changes as well. |
Can you explain what you mean by removed? How will events from connector be processed by vLLM then? |
The same way they are processed today, using the |
How do they get from workers to scheduler process? |
|
NickLucche
left a comment
There was a problem hiding this comment.
LGTM thanks for the work @orozery , only left one nit on the test, double check to make sure I haven't overlooked that case.
|
|
||
| # ----------------------------- test aggregate ---------------------------- | ||
|
|
||
| # aggregate ({"0a"}, None) and (None, {"1a"}) -> ({"0a"}, {"1a"}) |
There was a problem hiding this comment.
nit missing cae
aggregate (None, {"1a"}) and (None, {"1b"}) -> (None, {. . .} )
There was a problem hiding this comment.
Thanks! I've now added:
# aggregate ({"0a"}, None) and ({"0b"}, None) -> ({"0a", "0b"}, None)
d1de052 to
4068559
Compare
This change removes kv_cache_events as a top-level field on KVConnectorOutput and instead carries KV cache events inside a connector-specific KVConnectorWorkerMetadata subclass (LMCacheWorkerMetadata). This aligns KV event transport with the generic worker-to-scheduler metadata mechanism introduced in PR vllm-project#31964, eliminating redundant aggregation code paths. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
This commit introduces a new build_worker_connector_meta KV connector API allowing workers to send back arbitrary metadata back to the scheduler-side connector. This aligns with the already existing API build_connector_metadata which allows for the same on the opposite direction (scheduler -> worker). Signed-off-by: Or Ozeri <oro@il.ibm.com>
7db7632 to
263ba3f
Compare
This change removes kv_cache_events as a top-level field on KVConnectorOutput and instead carries KV cache events inside a connector-specific KVConnectorWorkerMetadata subclass (LMCacheWorkerMetadata). This aligns KV event transport with the generic worker-to-scheduler metadata mechanism introduced in PR vllm-project#31964, eliminating redundant aggregation code paths. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
This PR introduces a new build_worker_connector_meta KV connector API,
allowing workers to send back arbitrary metadata back to the scheduler-side connector.
This aligns with the already existing API build_connector_metadata which allows for the same on the opposite direction (scheduler -> worker).
In particular, this API is needed for the OffloadingConnector to be able to notify the scheduler-side on offloaded blocks, even before a request is finished.