[Test] Add worker tests for hidden-state prefix cache#2881

Open
hongzhi-gao wants to merge 5 commits into vllm-project:main from hongzhi-gao:test/hidden-state-prefix-cache-worker

Conversation

@hongzhi-gao

Purpose

Add worker-level correctness tests for hidden-state prefix caching (OmniTensorPrefixCache), aligned with the RFC goal of returning full hidden-state sequences (cached prefix + newly computed tokens) for downstream use when prefix caching is enabled (#2164).

This PR does not change runtime behavior; it reduces regression risk for the merge path (GPU → CPU coercion, block-table lookup) and for dual prefix-hit batches, which are not fully covered by tests/core/test_prefix_cache.py.
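The merge path under test can be illustrated with a minimal, self-contained sketch. Plain Python lists stand in for hidden-state tensors, and `merge_hidden_states` is a hypothetical helper for illustration, not the actual OmniTensorPrefixCache API:

```python
# Conceptual sketch of the hidden-state merge behavior the tests cover.
# Hidden states are modeled as lists of per-token vectors; the real
# implementation operates on torch tensors inside OmniTensorPrefixCache.

def merge_hidden_states(cached_prefix, new_tail):
    """Return the full hidden-state sequence: cached prefix + new tokens."""
    return cached_prefix + new_tail

# A 3-token cached prefix and a 2-token newly computed tail.
cached_prefix = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # from the prefix cache
new_tail = [[0.7, 0.8], [0.9, 1.0]]                    # freshly computed

full = merge_hidden_states(cached_prefix, new_tail)
assert len(full) == len(cached_prefix) + len(new_tail)
assert full[:3] == cached_prefix and full[3:] == new_tail
```

The worker tests assert the same two properties on real tensors: the merged sequence has the combined length, and both the cached and the newly computed segments survive the merge unchanged.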

Test Plan

  • Files added: tests/worker/test_hidden_state_prefix_cache.py
  • Commands (local):
    pytest -q tests/worker/test_hidden_state_prefix_cache.py
    # Optional: CPU-only (skip CUDA test), e.g. environments without a GPU:
    pytest -q tests/worker/test_hidden_state_prefix_cache.py -m "not cuda"

The tests validate the GPU hidden-state update/merge path (pytest markers core_model + cuda + L4, matching Buildkite's CUDA unit-test selection) and dual prefix-hit block-table sharing vs. isolation on CPU.
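The dual prefix-hit scenario can be sketched with a hypothetical miniature block-keyed cache (`MiniPrefixCache` is illustrative only, not the real cache class): two requests whose block tables reference the same cached blocks must see the same hidden states, while distinct blocks stay isolated.

```python
# Hypothetical miniature of a block-keyed hidden-state cache, illustrating
# the shared-vs-distinct dual prefix-hit scenario (not the real cache API).

class MiniPrefixCache:
    def __init__(self):
        self._blocks = {}  # block_id -> list of per-token hidden vectors

    def put(self, block_id, hidden):
        self._blocks[block_id] = hidden

    def get(self, block_table):
        # Concatenate cached hidden states for every block in the table.
        out = []
        for block_id in block_table:
            out.extend(self._blocks[block_id])
        return out

cache = MiniPrefixCache()
cache.put(0, [[0.1], [0.2]])
cache.put(1, [[0.3], [0.4]])

# Two requests sharing the same block table read identical cached data...
assert cache.get([0]) == cache.get([0])
# ...while requests with distinct block tables stay isolated.
assert cache.get([0]) != cache.get([1])
```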

Complements tests/core/test_prefix_cache.py; relates to hidden-state prefix caching (vllm-project#2164).

Signed-off-by: hongzhigao <761417898@qq.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

@hsliuustc0106
Collaborator

BLOCKER scan (test-only PR):

  • Correctness of test logic: PASS

The test coverage looks good:

  • Proper mocking of vLLM's InputBatch.block_table
  • Tests GPU → CPU coercion for cache operations
  • Tests dual prefix-hit scenarios (shared and distinct block tables)
  • Appropriate pytest markers (cuda, cpu, L4)
  • Clear assertions for both shape and value correctness

VERDICT: COMMENT (no blockers)
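The mocking pattern the checklist refers to can be sketched as follows. `FakeTensor`, `coerce_for_cache`, and the attribute names are illustrative stand-ins (not vLLM's actual `InputBatch` or torch API); the real tests build the stand-in batch with `types.SimpleNamespace` in the same spirit:

```python
# Sketch of the review's mocking pattern: a stand-in InputBatch.block_table
# built from SimpleNamespace, plus a fake tensor that records whether the
# cache coerced it from "GPU" to CPU. All names here are hypothetical.

from types import SimpleNamespace

class FakeTensor:
    def __init__(self, device="cuda"):
        self.device = device

    def cpu(self):
        # Mimic torch.Tensor.cpu(): return a CPU-resident copy.
        return FakeTensor(device="cpu")

def coerce_for_cache(tensor):
    """Hypothetical helper: cache entries must live on CPU."""
    return tensor if tensor.device == "cpu" else tensor.cpu()

# Mocked input batch with per-request block tables sharing a cached prefix.
input_batch = SimpleNamespace(block_table=[[0, 1, 2], [0, 1, 3]])

hidden = FakeTensor(device="cuda")
cached = coerce_for_cache(hidden)
assert cached.device == "cpu"                                  # GPU -> CPU coercion
assert input_batch.block_table[0][:2] == input_batch.block_table[1][:2]  # shared prefix
```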


Minor nit: The test file path uses but the test is in . Consider if this should be or if is appropriate here.

@amy-why-3459
Contributor

Thank you very much for your contribution. @yenuo26 PTAL

Align test placement with core-layer ownership and reviewer feedback.
Keep coverage unchanged for GPU->CPU merge path and dual prefix-hit scenarios.

Signed-off-by: hongzhigao <761417898@qq.com>
@hongzhi-gao
Author

hongzhi-gao commented Apr 20, 2026

> BLOCKER scan (test-only PR):
>
>   • Correctness of test logic: PASS
>
> The test coverage looks good:
>
>   • Proper mocking of vLLM's InputBatch.block_table
>   • Tests GPU → CPU coercion for cache operations
>   • Tests dual prefix-hit scenarios (shared and distinct block tables)
>   • Appropriate pytest markers (cuda, cpu, L4)
>   • Clear assertions for both shape and value correctness
>
> VERDICT: COMMENT (no blockers)
>
> Minor nit: The test file path uses but the test is in . Consider if this should be or if is appropriate here.

Thanks for the nit! I moved the test from tests/worker/test_hidden_state_prefix_cache.py to tests/core/test_hidden_state_prefix_cache.py to align with layer ownership.
Test logic/coverage remains unchanged. PTAL again. @hsliuustc0106

@hongzhi-gao
Author

> Thank you very much for your contribution. @yenuo26 PTAL

Thanks for the review and coordination.

@yenuo26 added the ready label (to trigger Buildkite CI) on Apr 20, 2026
@hsliuustc0106 removed the ready label (to trigger Buildkite CI) on Apr 29, 2026
@hsliuustc0106 requested a review from Copilot on April 29, 2026 at 01:04
Contributor

Copilot AI left a comment


Pull request overview

Adds new correctness tests covering hidden-state prefix caching merge behavior in OmniTensorPrefixCache, focusing on the “cached prefix + newly computed tail” merge path and multi-request prefix-hit scenarios.

Changes:

  • Add a CUDA-focused test that exercises GPU→CPU coercion during cache update/read and validates merged hidden-state sequence correctness.
  • Add a CPU test that validates dual prefix-hit behavior for requests sharing the same cached blocks vs using distinct cached blocks.


Comment on lines +1 to +10
"""Hidden-state prefix cache correctness tests for OmniTensorPrefixCache merge paths."""

from types import SimpleNamespace

import pytest
import torch

from vllm_omni.core.prefix_cache import OmniTensorPrefixCache

pytestmark = [pytest.mark.core_model]
