[KVConnector] Enable get_block_ids_with_load_errors() in LMCache connector #27978
NickLucche merged 9 commits into vllm-project:main
Conversation
Signed-off-by: Zirui Liu <ziliu@ddn.com>
Code Review
This pull request enables error handling for failed KV cache loads from LMCache by implementing the get_block_ids_with_load_errors method in the LMCacheConnectorV1. The implementation correctly provides a fallback for older versions of LMCache that do not support this feature. My review includes a suggestion to improve the robustness of the implementation by checking if the method is callable, which prevents potential TypeError exceptions if the attribute exists but is not a method. This is particularly important for ensuring stability when interacting with versioned dependencies.
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: ziruiliu <ziliu@ddn.com>
vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py
Signed-off-by: Zirui Liu <ziliu@ddn.com>
Head branch was pushed to by a user without write access
Signed-off-by: Zirui Liu <ziliu@ddn.com>
Signed-off-by: Zirui Liu <ziliu@ddn.com>
Hi @NickLucche, could you please take a look at this change? It's quite straightforward: it enables LMCache to report block ids that failed during retrieval.
Thanks for contributing @ziruiliu!
…ector (vllm-project#27978) Signed-off-by: Zirui Liu <ziliu@ddn.com> Signed-off-by: ziruiliu <ziliu@ddn.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Signed-off-by: George D. Torres <gdavtor@gmail.com>
…ector (vllm-project#27978) Signed-off-by: Zirui Liu <ziliu@ddn.com> Signed-off-by: ziruiliu <ziliu@ddn.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Purpose
In the LMCache engine, PR LMCache/LMCache#1835 implements the interface get_block_ids_with_load_errors, which was introduced in #19330. With this change, vLLM is able to handle failed cache retrievals from LMCache.
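As the review above suggests, the connector should degrade gracefully when it runs against an older LMCache release that predates this interface, and should verify the attribute is actually callable. A minimal sketch of that pattern (class and attribute names here are illustrative, not the exact vLLM implementation):

```python
# Hypothetical sketch: a connector method that forwards
# get_block_ids_with_load_errors() to the underlying LMCache engine,
# falling back to "no errors" on older LMCache versions that lack it.
class LMCacheConnectorSketch:
    def __init__(self, lmcache_engine):
        # lmcache_engine is whatever engine object the connector wraps.
        self._lmcache_engine = lmcache_engine

    def get_block_ids_with_load_errors(self) -> set:
        fn = getattr(self._lmcache_engine,
                     "get_block_ids_with_load_errors", None)
        # Checking callable() avoids a TypeError if the attribute
        # exists but is not a method (the reviewer's suggestion).
        if callable(fn):
            return fn()
        # Older LMCache versions cannot report load errors.
        return set()
```

With this fallback, the scheduler sees an empty error set on old LMCache versions and the failed-block rescheduling path is simply never triggered there.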
Test Plan
See PR's comment at LMCache/LMCache#1835 (comment)
In the test, we inject a fault into cache retrieval after the cache lookup returns OK. vLLM is expected to receive the failed block ids from LMCache and reschedule the request.
Test Result
Rescheduling happened as expected.
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.