[Test] Add worker tests for hidden-state prefix cache#2881

Open
hongzhi-gao wants to merge 5 commits into vllm-project:main from hongzhi-gao:test/hidden-state-prefix-cache-worker

Conversation

@hongzhi-gao

Purpose

Add worker-level correctness tests for hidden-state prefix caching (OmniTensorPrefixCache), aligned with the RFC goal of returning full hidden-state sequences (cached prefix + newly computed tokens) for downstream use when prefix caching is enabled (#2164).

This PR does not change runtime behavior; it reduces regression risk for the merge path (GPU → CPU coercion, block-table lookup) and for dual prefix-hit batches, which are not fully covered by tests/core/test_prefix_cache.py.
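The merge path under test can be illustrated with a minimal, self-contained sketch. Plain Python lists stand in for hidden-state tensors, and `merge_hidden_states` is a hypothetical helper for illustration, not the actual OmniTensorPrefixCache API:

```python
# Conceptual sketch of the hidden-state merge behavior the tests cover.
# Hidden states are modeled as lists of per-token vectors; the real
# implementation operates on torch tensors inside OmniTensorPrefixCache.

def merge_hidden_states(cached_prefix, new_tail):
    """Return the full hidden-state sequence: cached prefix + new tokens."""
    return cached_prefix + new_tail

# A 3-token cached prefix and a 2-token newly computed tail.
cached_prefix = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # from the prefix cache
new_tail = [[0.7, 0.8], [0.9, 1.0]]                    # freshly computed

full = merge_hidden_states(cached_prefix, new_tail)
assert len(full) == len(cached_prefix) + len(new_tail)
assert full[:3] == cached_prefix and full[3:] == new_tail
```

The worker tests assert the same two properties on real tensors: the merged sequence has the combined length, and both the cached and the newly computed segments survive the merge unchanged.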

Test Plan

  • Files added: tests/worker/test_hidden_state_prefix_cache.py
  • Commands (local):
    pytest -q tests/worker/test_hidden_state_prefix_cache.py
    # Optional: CPU-only (skip CUDA test), e.g. environments without a GPU:
    pytest -q tests/worker/test_hidden_state_prefix_cache.py -m "not cuda"

The tests validate the GPU hidden-state update/merge path (pytest markers core_model + cuda + L4, matching Buildkite's CUDA unit-test selection) and dual prefix-hit block-table sharing vs. isolation on CPU.
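The dual prefix-hit scenario can be sketched with a hypothetical miniature block-keyed cache (`MiniPrefixCache` is illustrative only, not the real cache class): two requests whose block tables reference the same cached blocks must see the same hidden states, while distinct blocks stay isolated.

```python
# Hypothetical miniature of a block-keyed hidden-state cache, illustrating
# the shared-vs-distinct dual prefix-hit scenario (not the real cache API).

class MiniPrefixCache:
    def __init__(self):
        self._blocks = {}  # block_id -> list of per-token hidden vectors

    def put(self, block_id, hidden):
        self._blocks[block_id] = hidden

    def get(self, block_table):
        # Concatenate cached hidden states for every block in the table.
        out = []
        for block_id in block_table:
            out.extend(self._blocks[block_id])
        return out

cache = MiniPrefixCache()
cache.put(0, [[0.1], [0.2]])
cache.put(1, [[0.3], [0.4]])

# Two requests sharing the same block table read identical cached data...
assert cache.get([0]) == cache.get([0])
# ...while requests with distinct block tables stay isolated.
assert cache.get([0]) != cache.get([1])
```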

Complements tests/core/test_prefix_cache.py; relates to hidden-state prefix caching (vllm-project#2164).

Signed-off-by: hongzhigao <761417898@qq.com>
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository-wide code reviews.

@hsliuustc0106
Collaborator

BLOCKER scan (test-only PR):

  • Correctness of test logic: PASS

The test coverage looks good:

  • Proper mocking of vLLM's InputBatch.block_table
  • Tests GPU → CPU coercion for cache operations
  • Tests dual prefix-hit scenarios (shared and distinct block tables)
  • Appropriate pytest markers (cuda, cpu, L4)
  • Clear assertions for both shape and value correctness

VERDICT: COMMENT (no blockers)
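The mocking pattern the checklist refers to can be sketched as follows. `FakeTensor`, `coerce_for_cache`, and the attribute names are illustrative stand-ins (not vLLM's actual `InputBatch` or torch API); the real tests build the stand-in batch with `types.SimpleNamespace` in the same spirit:

```python
# Sketch of the review's mocking pattern: a stand-in InputBatch.block_table
# built from SimpleNamespace, plus a fake tensor that records whether the
# cache coerced it from "GPU" to CPU. All names here are hypothetical.

from types import SimpleNamespace

class FakeTensor:
    def __init__(self, device="cuda"):
        self.device = device

    def cpu(self):
        # Mimic torch.Tensor.cpu(): return a CPU-resident copy.
        return FakeTensor(device="cpu")

def coerce_for_cache(tensor):
    """Hypothetical helper: cache entries must live on CPU."""
    return tensor if tensor.device == "cpu" else tensor.cpu()

# Mocked input batch with per-request block tables sharing a cached prefix.
input_batch = SimpleNamespace(block_table=[[0, 1, 2], [0, 1, 3]])

hidden = FakeTensor(device="cuda")
cached = coerce_for_cache(hidden)
assert cached.device == "cpu"                                  # GPU -> CPU coercion
assert input_batch.block_table[0][:2] == input_batch.block_table[1][:2]  # shared prefix
```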


Minor nit: The test file path uses but the test is in . Consider if this should be or if is appropriate here.

@amy-why-3459
Contributor

Thank you very much for your contribution. @yenuo26 PTAL

Align test placement with core-layer ownership and reviewer feedback.
Keep coverage unchanged for GPU->CPU merge path and dual prefix-hit scenarios.

Signed-off-by: hongzhigao <761417898@qq.com>
@hongzhi-gao
Author

hongzhi-gao commented Apr 20, 2026

> BLOCKER scan (test-only PR):
>
>   • Correctness of test logic: PASS
>
> The test coverage looks good:
>
>   • Proper mocking of vLLM's InputBatch.block_table
>   • Tests GPU → CPU coercion for cache operations
>   • Tests dual prefix-hit scenarios (shared and distinct block tables)
>   • Appropriate pytest markers (cuda, cpu, L4)
>   • Clear assertions for both shape and value correctness
>
> VERDICT: COMMENT (no blockers)
>
> Minor nit: The test file path uses but the test is in . Consider if this should be or if is appropriate here.

Thanks for the nit! I moved the test from tests/worker/test_hidden_state_prefix_cache.py to tests/core/test_hidden_state_prefix_cache.py to align with layer ownership.
Test logic/coverage remains unchanged. PTAL again. @hsliuustc0106

@hongzhi-gao
Author

> Thank you very much for your contribution. @yenuo26 PTAL

Thanks for the review and coordination.

@yenuo26 added the ready label (to trigger Buildkite CI) on Apr 20, 2026
@hsliuustc0106 removed the ready label (to trigger Buildkite CI) on Apr 29, 2026
@hsliuustc0106 requested a review from Copilot on April 29, 2026 at 01:04
Contributor

Copilot AI left a comment


Pull request overview

Adds new correctness tests covering hidden-state prefix caching merge behavior in OmniTensorPrefixCache, focusing on the “cached prefix + newly computed tail” merge path and multi-request prefix-hit scenarios.

Changes:

  • Add a CUDA-focused test that exercises GPU→CPU coercion during cache update/read and validates merged hidden-state sequence correctness.
  • Add a CPU test that validates dual prefix-hit behavior for requests sharing the same cached blocks vs using distinct cached blocks.


Comment on lines +1 to +10
"""Hidden-state prefix cache correctness tests for OmniTensorPrefixCache merge paths."""

from types import SimpleNamespace

import pytest
import torch

from vllm_omni.core.prefix_cache import OmniTensorPrefixCache

pytestmark = [pytest.mark.core_model]
