feat(kv-router): Python-first semantic KV cache provider interface by zbennett10 · Pull Request #7520 · ai-dynamo/dynamo

zbennett10 · 2026-03-19T13:23:49Z

Summary

The goal of this PR is to start defining the interface for semantic KV cache lookup, which should further speed up prefill.

Defines a minimal, provider-agnostic SemanticKvCacheProvider Protocol that lets any implementation (embeddings, learned routers, etc.) plug into Dynamo's KV routing. When RadixTree exact-prefix matching misses (e.g., same document, different instruction), the semantic provider finds cached prompts with similar content. The router then queries the RadixTree with the donor's tokens to locate the worker holding reusable KV blocks.

There is a lot more to do in this space. We have validated this up to the router level (we run our own "semantic router") using our in-house provider implementation.

Files

| File | Purpose |
|---|---|
| `lib/bindings/python/src/dynamo/llm/semantic_kv.py` | `SemanticKvCacheProvider` Protocol + `SemanticMatch` dataclass |
| `lib/bindings/python/src/dynamo/llm/semantic_kv_simple.py` | Reference impl (Jaccard similarity, thread-safe, time-bounded) |
| `lib/bindings/python/tests/test_semantic_kv.py` | 17 unit tests + RadixTree integration test |
| `lib/bindings/python/tests/test_semantic_kv_e2e.sh` | E2E smoke test script |
| `components/src/dynamo/router/semantic.py` | Provider factory with registry |
| `components/src/dynamo/router/__main__.py` | Semantic pre-routing + donor registration |
| `components/src/dynamo/common/configuration/groups/kv_router_args.py` | `--semantic-kv-provider` CLI arg |
| `lib/bindings/python/src/dynamo/llm/__init__.py` | Exports |
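The router-side wiring (a provider registry in `semantic.py`, selected by `--semantic-kv-provider`) might look roughly like this sketch. Only the flag name comes from this PR; `register_provider`, `create_provider`, and `SimpleProviderStub` are hypothetical. In this sketch, leaving the flag unset disables semantic lookup.

```python
import argparse
from typing import Callable, Dict, Optional

# Hypothetical registry: provider name -> zero-argument factory.
_PROVIDERS: Dict[str, Callable[[], object]] = {}


def register_provider(name: str):
    """Decorator that records a provider class/factory under `name`."""
    def decorator(factory):
        if name in _PROVIDERS:
            raise ValueError(f"semantic KV provider {name!r} is already registered")
        _PROVIDERS[name] = factory
        return factory
    return decorator


def create_provider(name: Optional[str]):
    """Instantiate the named provider, or return None when semantic lookup is off."""
    if name is None:
        return None
    if name not in _PROVIDERS:
        raise ValueError(f"unknown semantic KV provider {name!r}; known: {sorted(_PROVIDERS)}")
    return _PROVIDERS[name]()


@register_provider("simple")
class SimpleProviderStub:
    """Placeholder for a real provider such as a Jaccard-based reference impl."""


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="router (sketch)")
    parser.add_argument(
        "--semantic-kv-provider",
        default=None,
        help="name of a registered semantic KV provider; unset disables semantic lookup",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--semantic-kv-provider", "simple"])
    print(create_provider(args.semantic_kv_provider))
```

Failing fast on an unknown name keeps a typo in the flag from silently disabling semantic routing.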

Interface

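```python
from typing import Optional, Protocol, runtime_checkable

# SemanticMatch is the result dataclass defined alongside this Protocol in semantic_kv.py.
@runtime_checkable
class SemanticKvCacheProvider(Protocol):
    async def find_semantic_match(self, token_ids, prompt_text=None) -> Optional["SemanticMatch"]: ...
    async def register_donor(self, donor_id, token_ids, prompt_text=None) -> None: ...
    def on_eviction(self, donor_id) -> None: ...
```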
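To show the contract end to end, here is a toy provider that structurally satisfies the Protocol. It is a sketch under assumptions: the `SemanticMatch` fields are hypothetical, and `ExactTextProvider` only matches verbatim prompt text, so it illustrates the register / find / evict lifecycle rather than real semantic similarity. Because the Protocol is `@runtime_checkable`, `isinstance` checks only that the three methods exist, not their signatures or whether they are async.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class SemanticMatch:
    # Hypothetical fields; the PR's actual dataclass may differ.
    donor_id: str
    donor_token_ids: List[int]
    similarity: float


@runtime_checkable
class SemanticKvCacheProvider(Protocol):
    async def find_semantic_match(self, token_ids, prompt_text=None) -> Optional[SemanticMatch]: ...
    async def register_donor(self, donor_id, token_ids, prompt_text=None) -> None: ...
    def on_eviction(self, donor_id) -> None: ...


class ExactTextProvider:
    """Toy provider: 'matches' only prompts whose text was registered verbatim."""

    def __init__(self) -> None:
        self._donor_by_text: Dict[str, str] = {}
        self._tokens_by_donor: Dict[str, List[int]] = {}

    async def find_semantic_match(self, token_ids, prompt_text=None):
        donor_id = self._donor_by_text.get(prompt_text) if prompt_text else None
        if donor_id is None:
            return None
        return SemanticMatch(donor_id, self._tokens_by_donor[donor_id], 1.0)

    async def register_donor(self, donor_id, token_ids, prompt_text=None):
        self._tokens_by_donor[donor_id] = list(token_ids)
        if prompt_text:
            self._donor_by_text[prompt_text] = donor_id

    def on_eviction(self, donor_id):
        # Forget the donor so it is never returned after its KV blocks are gone.
        self._tokens_by_donor.pop(donor_id, None)
        self._donor_by_text = {t: d for t, d in self._donor_by_text.items() if d != donor_id}


async def demo() -> None:
    provider = ExactTextProvider()
    assert isinstance(provider, SemanticKvCacheProvider)  # structural check only
    await provider.register_donor("req-1", [11, 12, 13], "Summarize this report.")
    print(await provider.find_semantic_match([99], "Summarize this report."))
    provider.on_eviction("req-1")
    print(await provider.find_semantic_match([99], "Summarize this report."))


if __name__ == "__main__":
    asyncio.run(demo())
```

The key obligation is in `on_eviction`: once a donor's KV blocks are evicted, the provider must stop proposing it, or the router would chase blocks that no longer exist.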

Validation

Validated end to end on EKS with SemBlend (https://github.com/WorldFlowAI/semblend): 3.4x speedup at 8K context.

copy-pr-bot · 2026-03-19T13:23:54Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-03-19T13:23:59Z

👋 Hi zbennett10! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

zbennett10 · 2026-03-19T13:27:33Z

cc @kkranen / @harryskim, related to the question at the Dynamo talk at GTC about semantic KV cache reuse.

coderabbitai · 2026-03-19T13:30:28Z

Walkthrough

Adds semantic KV cache lookup capabilities by introducing new public modules, configuration structures, and a generic `SemanticKvIndexer` wrapper that augments the existing indexer with semantic matching logic using an injected `SemanticCacheLookupProvider`.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Module Exports `lib/kv-router/src/indexer/mod.rs` | Added three new public submodules: `semantic`, `semantic_config`, and `semantic_indexer`. |
| Semantic Abstractions `lib/kv-router/src/indexer/semantic.rs` | Introduced `SemanticMatchResult` struct, `SemanticCacheLookupProvider` trait with semantic cache lookup and registration methods, and a lightweight `SemanticConfig` struct with `enabled` and `overlap_threshold` fields. |
| Semantic Configuration `lib/kv-router/src/indexer/semantic_config.rs` | Added detailed `SemanticConfig` struct with `Serialize`/`Deserialize` support for configuring semantic search behavior, including thresholds, embedding endpoints, block sizes, and feature toggles. |
| Semantic Indexer Implementation `lib/kv-router/src/indexer/semantic_indexer.rs` | Implemented `SemanticKvIndexer<I>` generic wrapper around `KvIndexerInterface` with semantic-aware `find_matches_for_request` logic that performs exact lookups, conditionally augments with semantic matching, merges donor and exact scores, and tracks statistics across multiple branches. Includes unit tests with mock implementations. |
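Reading the walkthrough together with the review comment below (merge when `donor_best > best_overlap`), the score-merging step can be approximated in Python. This is an inferred sketch, not a port of `semantic_indexer.rs`; scores are modeled as hypothetical `{worker_id: matched_blocks}` dicts.

```python
from typing import Dict

Scores = Dict[str, int]  # hypothetical: worker_id -> matched KV blocks


def merge_overlap_scores(exact: Scores, donor: Scores) -> Scores:
    """Combine exact-prefix scores with scores obtained via a semantic donor.

    The donor scores are used only if their best worker beats the best exact
    match; then each worker keeps the larger of its two scores.
    """
    best_overlap = max(exact.values(), default=0)
    donor_best = max(donor.values(), default=0)
    if donor_best <= best_overlap:
        return dict(exact)  # the donor adds nothing over exact matching
    merged = dict(exact)
    for worker, score in donor.items():
        merged[worker] = max(merged.get(worker, 0), score)
    return merged
```

For example, `merge_overlap_scores({"w0": 2}, {"w1": 6})` keeps both workers, while a donor whose best score does not exceed the exact match is ignored.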

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.81% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ⚠️ Warning | Title describes Python-first semantic KV interface, but raw_summary and pr_objectives show Rust trait additions to lib/kv-router/src/indexer/, creating a significant mismatch between claimed intent and actual changeset. | Clarify whether the PR adds Rust traits (as shown in raw_summary) or pivots to Python-only as the title suggests. Update title to match actual changes. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | PR description provides clear overview, specific file changes, and implementation details, but lacks explicit Related Issues references and structured reviewer guidance. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

lib/kv-router/src/indexer/semantic_indexer.rs (2)
213-217: Test gap: MockProvider always returns None, leaving semantic-hit path untested.

The current tests verify passthrough, disabled config, short-token skip, and delegation. However, since MockProvider.find_semantic_match always returns None, the semantic hit path (lines 112-131) including the merge logic is never exercised.

Consider adding a test with a provider that returns `Some(SemanticMatchResult { ... })` to verify:

- donor verification via the inner indexer;
- score merging when `donor_best > best_overlap`;
- the `semantic_hits` counter increment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/kv-router/src/indexer/semantic_indexer.rs` around lines 213 - 217, Add a
new test that replaces the current MockProvider (which always returns None) with
a provider implementing SemanticCacheLookupProvider whose find_semantic_match
returns Some(SemanticMatchResult { id, tokens, score, ... }) so the semantic-hit
branch (lines handling donor verification, score merging with donor_best >
best_overlap, and incrementing semantic_hits) is exercised; in the test, stub or
mock the inner indexer to validate donor verification via inner indexer methods,
assert that when donor_best > best_overlap the merge logic updates the chosen
match and that the semantic_hits counter is incremented accordingly (refer to
MockProvider, find_semantic_match, SemanticMatchResult, donor_best,
best_overlap, semantic_hits, and the inner indexer used by the indexer under
test).
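A Python analogue of the missing semantic-hit test, as a sketch: the wrapper, fake inner indexer, and always-matching provider are hypothetical stand-ins, and counting a verified-but-not-better donor as a miss is an assumption. The point is that the provider has to return a match for the donor-verification, merge, and `semantic_hits` branches to run at all.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Scores = Dict[str, int]  # hypothetical: worker_id -> matched KV blocks


@dataclass(frozen=True)
class SemanticMatch:
    # Hypothetical fields; the Rust SemanticMatchResult may differ.
    donor_id: str
    donor_token_ids: List[int]
    similarity: float


@dataclass
class Stats:
    semantic_hits: int = 0
    semantic_misses: int = 0


class FakeIndexer:
    """Stand-in inner indexer: exact token tuple -> per-worker scores."""

    def __init__(self, table: Dict[Tuple[int, ...], Scores]) -> None:
        self.table = table

    def find_matches(self, tokens: Sequence[int]) -> Scores:
        return dict(self.table.get(tuple(tokens), {}))


class AlwaysMatchProvider:
    """Provider that always proposes the same donor, forcing the semantic-hit path."""

    def __init__(self, donor_tokens: Sequence[int]) -> None:
        self.donor_tokens = list(donor_tokens)

    async def find_semantic_match(self, token_ids, prompt_text=None) -> Optional[SemanticMatch]:
        return SemanticMatch("donor-1", self.donor_tokens, 0.95)


class SemanticIndexerSketch:
    def __init__(self, inner: FakeIndexer, provider) -> None:
        self.inner, self.provider, self.stats = inner, provider, Stats()

    async def find_matches(self, tokens: Sequence[int]) -> Scores:
        exact = self.inner.find_matches(tokens)
        match = await self.provider.find_semantic_match(tokens)
        if match is None:
            self.stats.semantic_misses += 1
            return exact
        # Verify the donor against the inner indexer: does any worker hold its blocks?
        donor = self.inner.find_matches(match.donor_token_ids)
        if max(donor.values(), default=0) <= max(exact.values(), default=0):
            self.stats.semantic_misses += 1  # assumption: a non-improving donor counts as a miss
            return exact
        self.stats.semantic_hits += 1
        return {w: max(exact.get(w, 0), donor.get(w, 0)) for w in exact.keys() | donor.keys()}


def test_semantic_hit_path() -> None:
    inner = FakeIndexer({(1, 2, 3, 4): {"worker-a": 4}})
    indexer = SemanticIndexerSketch(inner, AlwaysMatchProvider([1, 2, 3, 4]))
    scores = asyncio.run(indexer.find_matches([9, 9, 9, 9]))  # exact lookup misses
    assert scores == {"worker-a": 4}
    assert indexer.stats.semantic_hits == 1
    assert indexer.stats.semantic_misses == 0


if __name__ == "__main__":
    test_semantic_hit_path()
```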
98-102: Consider making the minimum token threshold configurable.

The 100-token minimum is hardcoded. If this threshold needs tuning for different workloads, consider adding it to SemanticConfig (similar to min_tokens_for_semantic in the unused semantic_config.rs).

This is a minor suggestion; the hardcoded value is reasonable for now.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/kv-router/src/indexer/semantic_indexer.rs` around lines 98 - 102, Replace
the hardcoded 100-token cutoff in the tokens.len() check inside
semantic_indexer.rs with a configurable threshold from SemanticConfig: add a
min_tokens_for_semantic (default 100) to the SemanticConfig struct (or reuse the
existing field in semantic_config.rs), update any constructors/builders to set a
default, and then reference config.min_tokens_for_semantic in the if
tokens.len() < ... check (preserving the self.stats.semantic_misses.fetch_add
call and returning exact_scores); ensure all places that construct
SemanticIndexer or read SemanticConfig are updated to supply or inherit the
default.
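Moving the cutoff into config could look like the minimal sketch below. It assumes a single consolidated `SemanticConfig` carrying `enabled` and `min_tokens_for_semantic` (default 100, matching the current hardcoded value). The `enabled=False` default is illustrative, `should_try_semantic` is a hypothetical helper, and other fields such as `overlap_threshold` are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticConfig:
    """Single consolidated config (sketch); defaults are illustrative."""

    enabled: bool = False
    # Prompts shorter than this skip semantic lookup; 100 mirrors the
    # value currently hardcoded in semantic_indexer.rs.
    min_tokens_for_semantic: int = 100

    def __post_init__(self) -> None:
        if self.min_tokens_for_semantic < 0:
            raise ValueError("min_tokens_for_semantic must be >= 0")


def should_try_semantic(config: SemanticConfig, num_tokens: int) -> bool:
    """Hypothetical gate: run the semantic path only when enabled and long enough."""
    return config.enabled and num_tokens >= config.min_tokens_for_semantic
```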

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/kv-router/src/indexer/semantic_config.rs`:
- Line 1: The file semantic_config.rs is missing the standard SPDX copyright
header used across the crate; add the same top-of-file header present in other
files in this crate (the SPDX identifier and copyright notice) to the very
beginning of lib/kv-router/src/indexer/semantic_config.rs so it matches the
project's header format and lints will pass.
- Around line 5-39: There are two conflicting SemanticConfig definitions (the
8-field struct in semantic_config.rs and the 2-field struct in semantic.rs)
causing ambiguity and an unused public export; remove the duplicate by
consolidating into a single canonical config used by the indexer (or rename
semantic_config.rs's struct to SemanticProviderConfig if it truly represents a
different concept), then update all references/imports (notably the
semantic_indexer.rs import of super::semantic::{SemanticCacheLookupProvider,
SemanticConfig}) to point to the unified/renamed type and ensure the default
values (enabled default and overlap/min_similarity fields) are reconciled so
only one authoritative config and default set exist.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8bfe83e5-31e9-4025-bd1c-2f7463fd6fdd

📥 Commits

Reviewing files that changed from the base of the PR and between 967ba9a and f4436fb.

📒 Files selected for processing (4)

lib/kv-router/src/indexer/mod.rs
lib/kv-router/src/indexer/semantic.rs
lib/kv-router/src/indexer/semantic_config.rs
lib/kv-router/src/indexer/semantic_indexer.rs

PeaBrane · 2026-03-20T19:54:29Z

Thanks for your contribution!

A quick thought on direction here: before pushing this further into Rust, it may be worth checking the CodeRabbit comments and the CI feedback first, then considering whether the right first step is to prototype this with our Python-bound KvRouter instead.

If the semantic-match idea really needs extra provider-side plumbing and heuristics anyway, a Python-side example implementation could be a cleaner way to prove the value and UX end to end before downstreaming a more fixed interface into Rust core.

zbennett10 · 2026-03-22T01:31:51Z

@PeaBrane Thank you for the feedback! That makes sense. Yes, I believe it needs a provider-like interface for things like an embedding donor store (for semantic similarity search, etc.). Great, I will look at that. The general idea is that this interface will get the ball rolling on future advancements in fleet-level semantic KV cache reuse at scale.

zbennett10 · 2026-03-27T19:57:48Z

I've updated this, @PeaBrane. Let me know your thoughts. Thanks!

zbennett10 · 2026-04-03T18:41:23Z

@PeaBrane anything else you need from me on this? Just checking in. Thanks

Replace the Rust-only semantic KV cache lookup from the previous iteration with a Python-first approach, as requested by reviewers.

Adds:
- `SemanticKvCacheProvider` Protocol (provider-agnostic interface)
- `SemanticMatch` frozen dataclass (lookup result)
- `SimpleSemanticProvider` reference implementation (Jaccard similarity, thread-safe, time-bounded O(N) scan, LRU via `OrderedDict`)
- 17 unit tests + RadixTree integration test + e2e smoke script
- `--semantic-kv-provider` CLI arg (Python-only; does not flow to Rust)
- Router integration: semantic pre-routing + donor registration

The interface is intentionally minimal: no embedding model, no vector store, no similarity thresholds prescribed. Implementations provide their own matching strategy. The reference impl demonstrates the contract without external dependencies.

Zero Rust changes. The existing `KvIndexerInterface` trait and RadixTree are unchanged; semantic lookup augments routing at the Python level.

Signed-off-by: Zach Bennett <zach@worldflowai.com>
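A rough sketch of a reference provider with those properties (Jaccard similarity, thread-safe, time-bounded O(N) scan, LRU via `OrderedDict`). Assumptions: Jaccard is computed over token-ID sets (ignoring order and repeats), `min_similarity` and `time_budget_s` are hypothetical knobs, and `JaccardProviderSketch` is not the PR's `SimpleSemanticProvider`. The scan visits most-recently-used donors first and stops at the deadline, so it is O(N) in the worst case but bounded in wall-clock time.

```python
import asyncio
import threading
import time
from collections import OrderedDict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SemanticMatch:
    # Hypothetical fields; the PR's actual dataclass may differ.
    donor_id: str
    donor_token_ids: List[int]
    similarity: float


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A ∩ B| / |A ∪ B| over token-ID sets; 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


class JaccardProviderSketch:
    """Thread-safe, LRU-bounded donor store with a time-bounded linear scan."""

    def __init__(self, max_donors: int = 1024, min_similarity: float = 0.5,
                 time_budget_s: float = 0.005) -> None:
        self.max_donors = max_donors
        self.min_similarity = min_similarity
        self.time_budget_s = time_budget_s
        # donor_id -> (token_ids, token set); insertion order doubles as LRU order.
        self._donors = OrderedDict()
        self._lock = threading.Lock()

    async def find_semantic_match(self, token_ids, prompt_text=None) -> Optional[SemanticMatch]:
        # prompt_text is ignored in this sketch; matching uses token IDs only.
        query = frozenset(token_ids)
        deadline = time.monotonic() + self.time_budget_s
        best: Optional[SemanticMatch] = None
        with self._lock:
            # Most recently used donors first, so a timeout skips the coldest ones.
            for donor_id, (tokens, token_set) in reversed(self._donors.items()):
                if time.monotonic() > deadline:
                    break
                score = jaccard(query, token_set)
                if score >= self.min_similarity and (best is None or score > best.similarity):
                    best = SemanticMatch(donor_id, list(tokens), score)
            if best is not None:
                self._donors.move_to_end(best.donor_id)  # mark as recently used
        return best

    async def register_donor(self, donor_id, token_ids, prompt_text=None) -> None:
        tokens = tuple(token_ids)
        with self._lock:
            self._donors[donor_id] = (tokens, frozenset(tokens))
            self._donors.move_to_end(donor_id)
            while len(self._donors) > self.max_donors:
                self._donors.popitem(last=False)  # drop the least recently used donor

    def on_eviction(self, donor_id) -> None:
        with self._lock:
            self._donors.pop(donor_id, None)
```

The lock is a plain `threading.Lock` held only across non-awaiting code, so it guards the store whether the router calls in from one event loop or several threads.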

zbennett10 requested a review from a team as a code owner March 19, 2026 13:23

pull-request-size Bot added the size/L label Mar 19, 2026

github-actions Bot added feat external-contribution Pull request is from an external contributor labels Mar 19, 2026

coderabbitai Bot reviewed Mar 19, 2026


Comment thread lib/kv-router/src/indexer/semantic_config.rs Outdated

Comment thread lib/kv-router/src/indexer/semantic_config.rs Outdated

zbennett10 mentioned this pull request Mar 19, 2026

[Core] Add register_model() to KVConnectorBase_V1 for CacheBlend vllm-project/vllm#37339

Open

zbennett10 changed the title ~~feat(kv-router): Add SemanticCacheLookupProvider trait for approximate KV cache matching~~ feat(kv-router): Python-first semantic KV cache provider interface Mar 27, 2026

zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch from f4436fb to e96c7c4 Compare March 27, 2026 19:54

zbennett10 requested a review from a team as a code owner March 27, 2026 19:54

zbennett10 requested a review from a team March 27, 2026 19:54

pull-request-size Bot added size/XL and removed size/L labels Mar 27, 2026

github-actions Bot added the router Relates to routing, KV-aware routing, etc. label Mar 27, 2026

zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch from e96c7c4 to 31929cd Compare March 27, 2026 19:56

zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch 2 times, most recently from cea1c25 to d6a393c Compare March 27, 2026 20:02

pull-request-size Bot added size/XXL and removed size/XL labels Mar 27, 2026

zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch from d6a393c to 46d1298 Compare March 27, 2026 20:04

pull-request-size Bot added size/XL and removed size/XXL labels Mar 27, 2026

zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch from 46d1298 to ed16953 Compare April 14, 2026 12:50

zbennett10 wants to merge 1 commit into ai-dynamo:main from WorldFlowAI:feat/semantic-kv-cache-lookup

2 participants
