[Test] Add worker tests for hidden-state prefix cache #2881
Validate GPU hidden-state update/merge (pytest markers core_model+cuda+L4 to match Buildkite CUDA unit test selection) and dual prefix-hit block table sharing vs isolation on CPU. Complements tests/core/test_prefix_cache.py; relates to hidden-state prefix caching (vllm-project#2164). Signed-off-by: hongzhigao <761417898@qq.com>
BLOCKER scan (test-only PR):
The test coverage looks good:
VERDICT: COMMENT (no blockers)
Minor nit: The test file path uses but the test is in . Consider if this should be or if is appropriate here.
Thank you very much for your contribution. @yenuo26 PTAL
Align test placement with core-layer ownership and reviewer feedback. Keep coverage unchanged for GPU->CPU merge path and dual prefix-hit scenarios. Signed-off-by: hongzhigao <761417898@qq.com>
Thanks for the nit! I moved the test from tests/worker/test_hidden_state_prefix_cache.py to tests/core/test_hidden_state_prefix_cache.py to align with layer ownership.
Thanks for the review and coordination.
Pull request overview
Adds new correctness tests covering hidden-state prefix caching merge behavior in OmniTensorPrefixCache, focusing on the “cached prefix + newly computed tail” merge path and multi-request prefix-hit scenarios.
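As a rough illustration of the "cached prefix + newly computed tail" merge described above, here is a minimal pure-Python sketch. It does not use the real OmniTensorPrefixCache API: `merge_hidden_states`, `block_store`, and `block_table` are hypothetical names, and plain lists stand in for tensors.

```python
from typing import Dict, List

Vector = List[float]


def merge_hidden_states(
    block_store: Dict[int, List[Vector]],
    block_table: List[int],
    num_cached_tokens: int,
    new_tail: List[Vector],
) -> List[Vector]:
    """Return cached-prefix hidden states followed by the newly computed tail.

    Hypothetical helper for illustration only; not part of vllm_omni.
    """
    prefix: List[Vector] = []
    # Walk the request's block table and gather the cached rows in order.
    for block_id in block_table:
        prefix.extend(block_store[block_id])
    # Keep only the positions that were actually a prefix hit.
    prefix = prefix[:num_cached_tokens]
    return prefix + new_tail


# Toy data: block size 2, hidden size 1.
block_store = {0: [[0.0], [1.0]], 1: [[2.0], [3.0]]}
merged = merge_hidden_states(block_store, [0, 1], 4, [[4.0], [5.0]])
```

In this toy setup `merged` holds six rows: four read back from the cached blocks and two from the new tail, in sequence order.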
Changes:
- Add a CUDA-focused test that exercises GPU→CPU coercion during cache update/read and validates merged hidden-state sequence correctness.
- Add a CPU test that validates dual prefix-hit behavior for requests sharing the same cached blocks vs using distinct cached blocks.
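The dual prefix-hit case in the list above can be pictured with a small pure-Python sketch. The names here (`read_prefix`, `block_store`, the block tables) are assumptions made for illustration and are not taken from the PR or from vllm_omni.

```python
from typing import Dict, List


def read_prefix(block_store: Dict[int, List[float]], block_table: List[int]) -> List[float]:
    """Gather cached values for one request by following its block table."""
    out: List[float] = []
    for block_id in block_table:
        out.extend(block_store[block_id])
    return out


# Hypothetical cache contents: three blocks of two values each.
block_store = {0: [10.0, 11.0], 1: [20.0, 21.0], 2: [30.0, 31.0]}

# Sharing: both requests in the batch hit the same cached blocks,
# so they should read back identical prefixes.
shared_a = read_prefix(block_store, [0, 1])
shared_b = read_prefix(block_store, [0, 1])

# Isolation: the requests hit distinct cached blocks,
# so neither should see the other's prefix.
isolated_a = read_prefix(block_store, [0, 1])
isolated_b = read_prefix(block_store, [2])
```

A test in this spirit would assert equality in the sharing case and check that no rows leak between requests in the isolation case.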
"""Hidden-state prefix cache correctness tests for OmniTensorPrefixCache merge paths."""

from types import SimpleNamespace

import pytest
import torch

from vllm_omni.core.prefix_cache import OmniTensorPrefixCache

pytestmark = [pytest.mark.core_model]
Purpose
Add worker-level correctness tests for hidden-state prefix caching (OmniTensorPrefixCache), aligned with the RFC goal of returning full hidden-state sequences (cached prefix + newly computed tokens) for downstream use when prefix caching is enabled (#2164).
This PR does not change runtime behavior; it reduces regression risk for the merge path (GPU → CPU coercion, block-table lookup) and for dual prefix-hit batches, which are not fully covered by tests/core/test_prefix_cache.py.
Test Plan
tests/worker/test_hidden_state_prefix_cache.py