[KV Offload] Return None from lookup() for in-flight blocks #41795
Code Review
This pull request updates the lookup method in the CPU manager to distinguish between a missing block and an in-flight write. Previously, the method returned a boolean indicating readiness; it now returns False if the block is not found, None if a write is in-flight, and True if the block is ready. The unit tests have been updated to reflect these new return values. I have no feedback to provide as there were no review comments to evaluate.
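The three-way contract summarized above can be illustrated with a toy sketch. `ToyManager`, `start_write`, and `complete_write` are hypothetical names used only for illustration; this is not vLLM's `CPUOffloadingManager` implementation, and only the `False` / `None` / `True` return convention is taken from this PR.

```python
from typing import Optional


class ToyManager:
    """Hypothetical stand-in illustrating the tri-state lookup contract."""

    def __init__(self) -> None:
        # Maps block key -> True once the write finished, False while in flight.
        self._ready: dict[str, bool] = {}

    def start_write(self, key: str) -> None:
        self._ready[key] = False

    def complete_write(self, key: str) -> None:
        self._ready[key] = True

    def lookup(self, key: str) -> Optional[bool]:
        if key not in self._ready:
            return False  # block is absent: a real miss
        if not self._ready[key]:
            return None  # write in flight: caller should retry later
        return True  # block is stored and readable


manager = ToyManager()
first = manager.lookup("a")    # nothing stored yet
manager.start_write("a")
second = manager.lookup("a")   # write started, not complete
manager.complete_write("a")
third = manager.lookup("a")    # write complete
```

In this sketch the three lookups walk through the absent, in-flight, and ready states in order.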
# lookup [1, 2] -> not ready
assert cpu_manager.lookup(to_key(1), _EMPTY_REQ_CTX) is False
assert cpu_manager.lookup(to_key(2), _EMPTY_REQ_CTX) is False
# lookup [1, 2] -> write in-flight, not yet ready
Now we don't test the False flow.
Can you add one assert for a lookup returning False?
A few lines later in the test there are already several assertions covering the False case. Should I add another assertion here as well?
vllm/tests/v1/kv_offload/cpu/test_manager.py
Line 180 in 29609a8
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
…ject#41795) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
…ject#41795) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com> Signed-off-by: Libin Tang <libin.tang@intel.com>
…ject#41795) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
…ject#41795) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Purpose
`CPUOffloadingManager.lookup()` was returning `False` for both absent blocks and in-flight blocks (write started but not yet complete). The base class defines `None` to mean "retry later", which is the right semantic for in-flight blocks. This makes the scheduler defer instead of treating an in-flight block as a miss.
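A minimal sketch of why the distinction matters on the caller side. The `decide` and `naive_decide` helpers are hypothetical and do not reflect the actual scheduler code, which is not shown in this PR. Because `None` and `False` are both falsy in Python, a caller presumably needs identity checks rather than a truthiness test to tell "retry later" apart from a miss.

```python
from typing import Optional


def decide(lookup_result: Optional[bool]) -> str:
    """Hypothetical caller-side handling of the tri-state lookup result."""
    if lookup_result is None:
        return "defer"  # in-flight write: retry on a later scheduling step
    if lookup_result is False:
        return "miss"  # block is absent: treat as a miss
    return "hit"  # block is ready


def naive_decide(lookup_result: Optional[bool]) -> str:
    # A truthiness check collapses None and False into the same branch.
    return "hit" if lookup_result else "miss"
```

The same reasoning applies to the tests in this PR, which assert with `is False` and would not distinguish the two cases with a plain `assert not ...`.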
Test Plan
Test Result