feat(kv-router): Python-first semantic KV cache provider interface#7520
feat(kv-router): Python-first semantic KV cache provider interface#7520zbennett10 wants to merge 1 commit intoai-dynamo:mainfrom
Conversation
|
👋 Hi zbennett10! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
|
cc @kkranen / @harryskim related to question at Dynamo talk at GTC regarding semantic kv-cache reuse. |
WalkthroughAdding semantic KV cache lookup capabilities by introducing new public modules, configuration structures, and a generic Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
lib/kv-router/src/indexer/semantic_indexer.rs (2)
213-217: Test gap:MockProvideralways returnsNone, leaving semantic-hit path untested.The current tests verify passthrough, disabled config, short-token skip, and delegation. However, since
MockProvider.find_semantic_matchalways returnsNone, the semantic hit path (lines 112-131) including the merge logic is never exercised.Consider adding a test with a provider that returns
Some(SemanticMatchResult { ... })to verify:
- Donor verification via inner indexer
- Score merging when
donor_best > best_overlapsemantic_hitscounter increment🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/kv-router/src/indexer/semantic_indexer.rs` around lines 213 - 217, Add a new test that replaces the current MockProvider (which always returns None) with a provider implementing SemanticCacheLookupProvider whose find_semantic_match returns Some(SemanticMatchResult { id, tokens, score, ... }) so the semantic-hit branch (lines handling donor verification, score merging with donor_best > best_overlap, and incrementing semantic_hits) is exercised; in the test, stub or mock the inner indexer to validate donor verification via inner indexer methods, assert that when donor_best > best_overlap the merge logic updates the chosen match and that the semantic_hits counter is incremented accordingly (refer to MockProvider, find_semantic_match, SemanticMatchResult, donor_best, best_overlap, semantic_hits, and the inner indexer used by the indexer under test).
98-102: Consider making the minimum token threshold configurable.The 100-token minimum is hardcoded. If this threshold needs tuning for different workloads, consider adding it to
SemanticConfig(similar tomin_tokens_for_semanticin the unusedsemantic_config.rs).This is a minor suggestion; the hardcoded value is reasonable for now.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/kv-router/src/indexer/semantic_indexer.rs` around lines 98 - 102, Replace the hardcoded 100-token cutoff in the tokens.len() check inside semantic_indexer.rs with a configurable threshold from SemanticConfig: add a min_tokens_for_semantic (default 100) to the SemanticConfig struct (or reuse the existing field in semantic_config.rs), update any constructors/builders to set a default, and then reference config.min_tokens_for_semantic in the if tokens.len() < ... check (preserving the self.stats.semantic_misses.fetch_add call and returning exact_scores); ensure all places that construct SemanticIndexer or read SemanticConfig are updated to supply or inherit the default.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/kv-router/src/indexer/semantic_config.rs`:
- Line 1: The file semantic_config.rs is missing the standard SPDX copyright
header used across the crate; add the same top-of-file header present in other
files in this crate (the SPDX identifier and copyright notice) to the very
beginning of lib/kv-router/src/indexer/semantic_config.rs so it matches the
project's header format and lints will pass.
- Around line 5-39: There are two conflicting SemanticConfig definitions (the
8-field struct in semantic_config.rs and the 2-field struct in semantic.rs)
causing ambiguity and an unused public export; remove the duplicate by
consolidating into a single canonical config used by the indexer (or rename
semantic_config.rs's struct to SemanticProviderConfig if it truly represents a
different concept), then update all references/imports (notably the
semantic_indexer.rs import of super::semantic::{SemanticCacheLookupProvider,
SemanticConfig}) to point to the unified/renamed type and ensure the default
values (enabled default and overlap/min_similarity fields) are reconciled so
only one authoritative config and default set exist.
---
Nitpick comments:
In `@lib/kv-router/src/indexer/semantic_indexer.rs`:
- Around line 213-217: Add a new test that replaces the current MockProvider
(which always returns None) with a provider implementing
SemanticCacheLookupProvider whose find_semantic_match returns
Some(SemanticMatchResult { id, tokens, score, ... }) so the semantic-hit branch
(lines handling donor verification, score merging with donor_best >
best_overlap, and incrementing semantic_hits) is exercised; in the test, stub or
mock the inner indexer to validate donor verification via inner indexer methods,
assert that when donor_best > best_overlap the merge logic updates the chosen
match and that the semantic_hits counter is incremented accordingly (refer to
MockProvider, find_semantic_match, SemanticMatchResult, donor_best,
best_overlap, semantic_hits, and the inner indexer used by the indexer under
test).
- Around line 98-102: Replace the hardcoded 100-token cutoff in the tokens.len()
check inside semantic_indexer.rs with a configurable threshold from
SemanticConfig: add a min_tokens_for_semantic (default 100) to the
SemanticConfig struct (or reuse the existing field in semantic_config.rs),
update any constructors/builders to set a default, and then reference
config.min_tokens_for_semantic in the if tokens.len() < ... check (preserving
the self.stats.semantic_misses.fetch_add call and returning exact_scores);
ensure all places that construct SemanticIndexer or read SemanticConfig are
updated to supply or inherit the default.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8bfe83e5-31e9-4025-bd1c-2f7463fd6fdd
📒 Files selected for processing (4)
lib/kv-router/src/indexer/mod.rslib/kv-router/src/indexer/semantic.rslib/kv-router/src/indexer/semantic_config.rslib/kv-router/src/indexer/semantic_indexer.rs
|
Thanks for your contribution! A quick thought on direction here: before pushing this further into Rust, it may be worth checking the CodeRabbit comments and the CI feedback first, then considering whether the right first step is to prototype this in the with our Python-binded If the semantic-match idea really needs extra provider-side plumbing and heuristics anyway, a Python-side example implementation could be a cleaner way to prove the value and UX end to end before downstreaming a more fixed interface into Rust core. |
@PeaBrane Thank you for the feedback! That makes sense. Yes I believe it needs a provider-like interface for things like an embedding donor store (for semantic similarity search, etc.) Great - I will look at that. The general idea here is that this interface will start to get the ball rolling in terms of future advancements related to fleet-level semantic KV cache reuse at scale. |
f4436fb to
e96c7c4
Compare
e96c7c4 to
31929cd
Compare
I've updated this @PeaBrane - let me know your thoughts. Thanks |
cea1c25 to
d6a393c
Compare
d6a393c to
46d1298
Compare
|
@PeaBrane anything else you need from me on this? Just checking in. Thanks |
Replace the Rust-only semantic KV cache lookup from the previous iteration with a Python-first approach, as requested by reviewers. Adds: - SemanticKvCacheProvider Protocol (provider-agnostic interface) - SemanticMatch frozen dataclass (lookup result) - SimpleSemanticProvider reference implementation (Jaccard similarity, thread-safe, time-bounded O(N) scan, LRU via OrderedDict) - 17 unit tests + RadixTree integration test + e2e smoke script - --semantic-kv-provider CLI arg (Python-only, does not flow to Rust) - Router integration: semantic pre-routing + donor registration The interface is intentionally minimal: no embedding model, no vector store, no similarity thresholds prescribed. Implementations provide their own matching strategy. The reference impl demonstrates the contract without external dependencies. Zero Rust changes. The existing KvIndexerInterface trait and RadixTree are unchanged — semantic lookup augments routing at the Python level. Signed-off-by: Zach Bennett <zach@worldflowai.com>
46d1298 to
ed16953
Compare
Summary
Whole point of this is to start to define the interface for semantic KV cache lookup which will help provider further speeds to prefill.
Defines a minimal, provider-agnostic
SemanticKvCacheProviderProtocol that lets any implementation (embeddings, learned routers, etc.) plug into Dynamo's KV routing. When RadixTree exact-prefix matching misses (e.g., same document, different instruction), the semantic provider finds cached prompts with similar content. The router then queries the RadixTree with the donor's tokens to locate the worker holding reusable KV blocks.There is a lot more to do in this space. We have proven this up to the router level (we have our own "semantic router") with our own implementation of a provider that we use.
Files
lib/bindings/python/src/dynamo/llm/semantic_kv.pySemanticKvCacheProviderProtocol +SemanticMatchdataclasslib/bindings/python/src/dynamo/llm/semantic_kv_simple.pylib/bindings/python/tests/test_semantic_kv.pylib/bindings/python/tests/test_semantic_kv_e2e.shcomponents/src/dynamo/router/semantic.pycomponents/src/dynamo/router/__main__.pycomponents/src/dynamo/common/configuration/groups/kv_router_args.py--semantic-kv-providerCLI arglib/bindings/python/src/dynamo/llm/__init__.pyInterface
Validation