Skip to content

feat(kv-router): Python-first semantic KV cache provider interface#7520

Open
zbennett10 wants to merge 1 commit intoai-dynamo:mainfrom
WorldFlowAI:feat/semantic-kv-cache-lookup
Open

feat(kv-router): Python-first semantic KV cache provider interface#7520
zbennett10 wants to merge 1 commit intoai-dynamo:mainfrom
WorldFlowAI:feat/semantic-kv-cache-lookup

Conversation

@zbennett10
Copy link
Copy Markdown

@zbennett10 zbennett10 commented Mar 19, 2026

Summary

Whole point of this is to start to define the interface for semantic KV cache lookup which will help provider further speeds to prefill.

Defines a minimal, provider-agnostic SemanticKvCacheProvider Protocol that lets any implementation (embeddings, learned routers, etc.) plug into Dynamo's KV routing. When RadixTree exact-prefix matching misses (e.g., same document, different instruction), the semantic provider finds cached prompts with similar content. The router then queries the RadixTree with the donor's tokens to locate the worker holding reusable KV blocks.

There is a lot more to do in this space. We have proven this up to the router level (we have our own "semantic router") with our own implementation of a provider that we use.

Files

File Purpose
lib/bindings/python/src/dynamo/llm/semantic_kv.py SemanticKvCacheProvider Protocol + SemanticMatch dataclass
lib/bindings/python/src/dynamo/llm/semantic_kv_simple.py Reference impl (Jaccard similarity, thread-safe, time-bounded)
lib/bindings/python/tests/test_semantic_kv.py 17 unit tests + RadixTree integration test
lib/bindings/python/tests/test_semantic_kv_e2e.sh E2E smoke test script
components/src/dynamo/router/semantic.py Provider factory with registry
components/src/dynamo/router/__main__.py Semantic pre-routing + donor registration
components/src/dynamo/common/configuration/groups/kv_router_args.py --semantic-kv-provider CLI arg
lib/bindings/python/src/dynamo/llm/__init__.py Exports

Interface

@runtime_checkable
class SemanticKvCacheProvider(Protocol):
    async def find_semantic_match(self, token_ids, prompt_text=None) -> Optional[SemanticMatch]: ...
    async def register_donor(self, donor_id, token_ids, prompt_text=None) -> None: ...
    def on_eviction(self, donor_id) -> None: ...

Validation

@zbennett10 zbennett10 requested a review from a team as a code owner March 19, 2026 13:23
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Mar 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown
Contributor

👋 Hi zbennett10! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions Bot added feat external-contribution Pull request is from an external contributor labels Mar 19, 2026
@zbennett10
Copy link
Copy Markdown
Author

cc @kkranen / @harryskim related to question at Dynamo talk at GTC regarding semantic kv-cache reuse.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 19, 2026

Walkthrough

Adding semantic KV cache lookup capabilities by introducing new public modules, configuration structures, and a generic SemanticKvIndexer wrapper that augments the existing indexer with semantic matching logic using an injected SemanticCacheLookupProvider.

Changes

Cohort / File(s) Summary
Module Exports
lib/kv-router/src/indexer/mod.rs
Added three new public submodules: semantic, semantic_config, and semantic_indexer.
Semantic Abstractions
lib/kv-router/src/indexer/semantic.rs
Introduced SemanticMatchResult struct, SemanticCacheLookupProvider trait with semantic cache lookup and registration methods, and a lightweight SemanticConfig struct with enabled and overlap_threshold fields.
Semantic Configuration
lib/kv-router/src/indexer/semantic_config.rs
Added detailed SemanticConfig struct with Serialize/Deserialize support for configuring semantic search behavior, including thresholds, embedding endpoints, block sizes, and feature toggles.
Semantic Indexer Implementation
lib/kv-router/src/indexer/semantic_indexer.rs
Implemented SemanticKvIndexer<I> generic wrapper around KvIndexerInterface with semantic-aware find_matches_for_request logic that performs exact lookups, conditionally augments with semantic matching, merges donor and exact scores, and tracks statistics across multiple branches. Includes unit tests with mock implementations.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.81% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ⚠️ Warning Title describes Python-first semantic KV interface, but raw_summary and pr_objectives show Rust trait additions to lib/kv-router/src/indexer/, creating a significant mismatch between claimed intent and actual changeset. Clarify whether the PR adds Rust traits (as shown in raw_summary) or pivots to Python-only as the title suggests. Update title to match actual changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed PR description provides clear overview, specific file changes, and implementation details, but lacks explicit Related Issues references and structured reviewer guidance.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
lib/kv-router/src/indexer/semantic_indexer.rs (2)

213-217: Test gap: MockProvider always returns None, leaving semantic-hit path untested.

The current tests verify passthrough, disabled config, short-token skip, and delegation. However, since MockProvider.find_semantic_match always returns None, the semantic hit path (lines 112-131) including the merge logic is never exercised.

Consider adding a test with a provider that returns Some(SemanticMatchResult { ... }) to verify:

  1. Donor verification via inner indexer
  2. Score merging when donor_best > best_overlap
  3. semantic_hits counter increment
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/kv-router/src/indexer/semantic_indexer.rs` around lines 213 - 217, Add a
new test that replaces the current MockProvider (which always returns None) with
a provider implementing SemanticCacheLookupProvider whose find_semantic_match
returns Some(SemanticMatchResult { id, tokens, score, ... }) so the semantic-hit
branch (lines handling donor verification, score merging with donor_best >
best_overlap, and incrementing semantic_hits) is exercised; in the test, stub or
mock the inner indexer to validate donor verification via inner indexer methods,
assert that when donor_best > best_overlap the merge logic updates the chosen
match and that the semantic_hits counter is incremented accordingly (refer to
MockProvider, find_semantic_match, SemanticMatchResult, donor_best,
best_overlap, semantic_hits, and the inner indexer used by the indexer under
test).

98-102: Consider making the minimum token threshold configurable.

The 100-token minimum is hardcoded. If this threshold needs tuning for different workloads, consider adding it to SemanticConfig (similar to min_tokens_for_semantic in the unused semantic_config.rs).

This is a minor suggestion; the hardcoded value is reasonable for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/kv-router/src/indexer/semantic_indexer.rs` around lines 98 - 102, Replace
the hardcoded 100-token cutoff in the tokens.len() check inside
semantic_indexer.rs with a configurable threshold from SemanticConfig: add a
min_tokens_for_semantic (default 100) to the SemanticConfig struct (or reuse the
existing field in semantic_config.rs), update any constructors/builders to set a
default, and then reference config.min_tokens_for_semantic in the if
tokens.len() < ... check (preserving the self.stats.semantic_misses.fetch_add
call and returning exact_scores); ensure all places that construct
SemanticIndexer or read SemanticConfig are updated to supply or inherit the
default.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/kv-router/src/indexer/semantic_config.rs`:
- Line 1: The file semantic_config.rs is missing the standard SPDX copyright
header used across the crate; add the same top-of-file header present in other
files in this crate (the SPDX identifier and copyright notice) to the very
beginning of lib/kv-router/src/indexer/semantic_config.rs so it matches the
project's header format and lints will pass.
- Around line 5-39: There are two conflicting SemanticConfig definitions (the
8-field struct in semantic_config.rs and the 2-field struct in semantic.rs)
causing ambiguity and an unused public export; remove the duplicate by
consolidating into a single canonical config used by the indexer (or rename
semantic_config.rs's struct to SemanticProviderConfig if it truly represents a
different concept), then update all references/imports (notably the
semantic_indexer.rs import of super::semantic::{SemanticCacheLookupProvider,
SemanticConfig}) to point to the unified/renamed type and ensure the default
values (enabled default and overlap/min_similarity fields) are reconciled so
only one authoritative config and default set exist.

---

Nitpick comments:
In `@lib/kv-router/src/indexer/semantic_indexer.rs`:
- Around line 213-217: Add a new test that replaces the current MockProvider
(which always returns None) with a provider implementing
SemanticCacheLookupProvider whose find_semantic_match returns
Some(SemanticMatchResult { id, tokens, score, ... }) so the semantic-hit branch
(lines handling donor verification, score merging with donor_best >
best_overlap, and incrementing semantic_hits) is exercised; in the test, stub or
mock the inner indexer to validate donor verification via inner indexer methods,
assert that when donor_best > best_overlap the merge logic updates the chosen
match and that the semantic_hits counter is incremented accordingly (refer to
MockProvider, find_semantic_match, SemanticMatchResult, donor_best,
best_overlap, semantic_hits, and the inner indexer used by the indexer under
test).
- Around line 98-102: Replace the hardcoded 100-token cutoff in the tokens.len()
check inside semantic_indexer.rs with a configurable threshold from
SemanticConfig: add a min_tokens_for_semantic (default 100) to the
SemanticConfig struct (or reuse the existing field in semantic_config.rs),
update any constructors/builders to set a default, and then reference
config.min_tokens_for_semantic in the if tokens.len() < ... check (preserving
the self.stats.semantic_misses.fetch_add call and returning exact_scores);
ensure all places that construct SemanticIndexer or read SemanticConfig are
updated to supply or inherit the default.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8bfe83e5-31e9-4025-bd1c-2f7463fd6fdd

📥 Commits

Reviewing files that changed from the base of the PR and between 967ba9a and f4436fb.

📒 Files selected for processing (4)
  • lib/kv-router/src/indexer/mod.rs
  • lib/kv-router/src/indexer/semantic.rs
  • lib/kv-router/src/indexer/semantic_config.rs
  • lib/kv-router/src/indexer/semantic_indexer.rs

Comment thread lib/kv-router/src/indexer/semantic_config.rs Outdated
Comment thread lib/kv-router/src/indexer/semantic_config.rs Outdated
@PeaBrane
Copy link
Copy Markdown
Contributor

PeaBrane commented Mar 20, 2026

Thanks for your contribution!

A quick thought on direction here: before pushing this further into Rust, it may be worth checking the CodeRabbit comments and the CI feedback first, then considering whether the right first step is to prototype this in the with our Python-binded KvRouter instead.

If the semantic-match idea really needs extra provider-side plumbing and heuristics anyway, a Python-side example implementation could be a cleaner way to prove the value and UX end to end before downstreaming a more fixed interface into Rust core.

@zbennett10
Copy link
Copy Markdown
Author

zbennett10 commented Mar 22, 2026

Thanks for your contribution!

A quick thought on direction here: before pushing this further into Rust, it may be worth checking the CodeRabbit comments and the CI feedback first, then considering whether the right first step is to prototype this in the with our Python-binded KvRouter instead.

If the semantic-match idea really needs extra provider-side plumbing and heuristics anyway, a Python-side example implementation could be a cleaner way to prove the value and UX end to end before downstreaming a more fixed interface into Rust core.

@PeaBrane Thank you for the feedback! That makes sense. Yes I believe it needs a provider-like interface for things like an embedding donor store (for semantic similarity search, etc.) Great - I will look at that. The general idea here is that this interface will start to get the ball rolling in terms of future advancements related to fleet-level semantic KV cache reuse at scale.

@zbennett10 zbennett10 changed the title feat(kv-router): Add SemanticCacheLookupProvider trait for approximate KV cache matching feat(kv-router): Python-first semantic KV cache provider interface Mar 27, 2026
@zbennett10 zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch from f4436fb to e96c7c4 Compare March 27, 2026 19:54
@zbennett10 zbennett10 requested a review from a team as a code owner March 27, 2026 19:54
@zbennett10 zbennett10 requested a review from a team March 27, 2026 19:54
@github-actions github-actions Bot added the router Relates to routing, KV-aware routing, etc. label Mar 27, 2026
@zbennett10 zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch from e96c7c4 to 31929cd Compare March 27, 2026 19:56
@zbennett10
Copy link
Copy Markdown
Author

Thanks for your contribution!

A quick thought on direction here: before pushing this further into Rust, it may be worth checking the CodeRabbit comments and the CI feedback first, then considering whether the right first step is to prototype this in the with our Python-binded KvRouter instead.

If the semantic-match idea really needs extra provider-side plumbing and heuristics anyway, a Python-side example implementation could be a cleaner way to prove the value and UX end to end before downstreaming a more fixed interface into Rust core.

I've updated this @PeaBrane - let me know your thoughts. Thanks

@zbennett10 zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch 2 times, most recently from cea1c25 to d6a393c Compare March 27, 2026 20:02
@zbennett10 zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch from d6a393c to 46d1298 Compare March 27, 2026 20:04
@zbennett10
Copy link
Copy Markdown
Author

@PeaBrane anything else you need from me on this? Just checking in. Thanks

Replace the Rust-only semantic KV cache lookup from the previous
iteration with a Python-first approach, as requested by reviewers.

Adds:
- SemanticKvCacheProvider Protocol (provider-agnostic interface)
- SemanticMatch frozen dataclass (lookup result)
- SimpleSemanticProvider reference implementation (Jaccard similarity,
  thread-safe, time-bounded O(N) scan, LRU via OrderedDict)
- 17 unit tests + RadixTree integration test + e2e smoke script
- --semantic-kv-provider CLI arg (Python-only, does not flow to Rust)
- Router integration: semantic pre-routing + donor registration

The interface is intentionally minimal: no embedding model, no vector
store, no similarity thresholds prescribed. Implementations provide
their own matching strategy. The reference impl demonstrates the
contract without external dependencies.

Zero Rust changes. The existing KvIndexerInterface trait and RadixTree
are unchanged — semantic lookup augments routing at the Python level.

Signed-off-by: Zach Bennett <zach@worldflowai.com>
@zbennett10 zbennett10 force-pushed the feat/semantic-kv-cache-lookup branch from 46d1298 to ed16953 Compare April 14, 2026 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contribution Pull request is from an external contributor feat router Relates to routing, KV-aware routing, etc. size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants