[Core] feat(mem_cache): add SemanticPrefixProvider hook to RadixCache #20806
Open
zbennett10 wants to merge 1 commit into sgl-project:main from
Adds a first-class SemanticPrefixProvider interface that allows external systems to provide approximate/semantic KV cache matches when the exact radix-tree lookup returns zero cached tokens.

Changes:

- New file: python/sglang/srt/mem_cache/semantic_prefix.py
  Abstract base class SemanticPrefixProvider with on_prefix_miss, on_request_cached, on_init, and on_shutdown hooks.
  SemanticPrefixResult dataclass carries alternate_token_ids, hit hint, skip_insert flag, opaque metadata, and optional source_id for logging.
- python/sglang/srt/mem_cache/radix_cache.py
  RadixCache.set_semantic_provider(provider): register/clear the provider.
  RadixCache.match_prefix: calls _match_prefix_exact; if result is empty and a provider is registered and params.req is available, calls provider.on_prefix_miss and re-runs exact lookup with alternate tokens.
  RadixCache._match_prefix_exact: extracted inner implementation (no semantic fallback), callable independently.
  RadixCache.cache_finished_req: calls provider.on_request_cached after a successful insert so the provider can register the request as a future donor.
- test/srt/test_semantic_prefix_provider.py
  34 unit tests covering: set_semantic_provider lifecycle, exact-hit passthrough, miss-triggered callback, None return fallback, alternate-token lookup, extra_key preservation, exception propagation, source_id logging, on_request_cached hook, and multi-request independence.
- test/srt/conftest.py
  Sparse-checkout helper: stubs sglang.lang.* so mem_cache tests can run without a full SGLang install during local development.
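For readers who want a concrete picture of the interface, here is a minimal sketch of what the two types described in the commit message might look like. The field types, the default values, the choice of which hook is abstract, and the `LastDonorProvider` toy class are all assumptions made for illustration; the "hit hint" field is named `num_cached_tokens` as in the PR description. This is not the code from the PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SemanticPrefixResult:
    """Result a provider returns on a prefix miss (field types are assumed)."""

    # Token IDs of a donor request to retry the exact lookup with.
    alternate_token_ids: List[int]
    # The "hit hint": how many tokens the provider expects to match.
    num_cached_tokens: int = 0
    # Flag from the commit message; the default value is an assumption.
    skip_insert: bool = False
    # Opaque, provider-defined metadata.
    metadata: Dict[str, Any] = field(default_factory=dict)
    # Optional donor identifier, used for logging.
    source_id: Optional[str] = None


class SemanticPrefixProvider(ABC):
    """Hook interface; which methods are abstract is an assumption."""

    @abstractmethod
    def on_prefix_miss(
        self, rid: str, token_ids: List[int]
    ) -> Optional[SemanticPrefixResult]:
        """Return alternate tokens to look up, or None for a cold prefill."""

    def on_request_cached(self, rid: str, token_ids: List[int]) -> None:
        """Called after a successful insert so the request can become a donor."""

    def on_init(self) -> None:
        """Lifecycle hook for setup."""

    def on_shutdown(self) -> None:
        """Lifecycle hook for teardown (call site not stated in the PR)."""


class LastDonorProvider(SemanticPrefixProvider):
    """Hypothetical toy provider: always suggests the most recently cached request."""

    def __init__(self) -> None:
        self._last: Optional[Tuple[str, List[int]]] = None

    def on_request_cached(self, rid: str, token_ids: List[int]) -> None:
        self._last = (rid, list(token_ids))

    def on_prefix_miss(
        self, rid: str, token_ids: List[int]
    ) -> Optional[SemanticPrefixResult]:
        if self._last is None:
            return None
        donor_rid, donor_tokens = self._last
        return SemanticPrefixResult(
            alternate_token_ids=donor_tokens, source_id=donor_rid
        )
```

A real provider would choose the donor by some similarity measure rather than recency; the toy class only shows how the two hooks cooperate.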
Collaborator
Hi, @zbennett10 can I reach you on the sglang Slack?
This was referenced Mar 19, 2026
Summary
This PR adds a `SemanticPrefixProvider` interface to `RadixCache` that enables approximate/semantic KV cache matching as a fallback when the exact radix-tree lookup returns zero cached tokens.

Motivation: When prompts are semantically similar but lexically different (different instruction wording, sentence ordering, or template fields), the exact radix-tree lookup returns 0 hits. A semantic provider can identify a donor request whose KV is already resident and suggest its token IDs for lookup, eliminating redundant prefill computation without any changes to the core attention or KV storage mechanisms.
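The fallback flow can be illustrated with a toy stand-in for the cache. This is only a sketch under stated assumptions: `ToyPrefixCache` is hypothetical, it stores whole token sequences in a list instead of a radix tree, it returns a hit count rather than the real match result object, and the provider is reduced to a plain callable that returns alternate token IDs or `None`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical provider signature: (rid, token_ids) -> alternate token IDs or None.
Provider = Callable[[str, Sequence[int]], Optional[Sequence[int]]]


class ToyPrefixCache:
    """Toy stand-in for RadixCache; not the real data structure."""

    def __init__(self) -> None:
        self._cached: List[Tuple[int, ...]] = []
        self._semantic_provider: Optional[Provider] = None

    def set_semantic_provider(self, provider: Optional[Provider]) -> None:
        # Passing None clears the provider.
        self._semantic_provider = provider

    def insert(self, token_ids: Sequence[int]) -> None:
        self._cached.append(tuple(token_ids))

    def _match_prefix_exact(self, token_ids: Sequence[int]) -> int:
        """Length of the longest prefix shared with any cached sequence."""
        best = 0
        for seq in self._cached:
            n = 0
            for a, b in zip(seq, token_ids):
                if a != b:
                    break
                n += 1
            best = max(best, n)
        return best

    def match_prefix(self, token_ids: Sequence[int], rid: Optional[str] = None) -> int:
        hit = self._match_prefix_exact(token_ids)
        # Fall back only on a complete miss, with a provider and a request id.
        if hit > 0 or self._semantic_provider is None or rid is None:
            return hit
        alternate = self._semantic_provider(rid, token_ids)
        if alternate is None:
            return 0  # cold prefill
        # Re-run the exact lookup with the donor's tokens.
        return self._match_prefix_exact(alternate)


cache = ToyPrefixCache()
cache.insert([1, 2, 3, 4])                       # donor request already cached
miss = cache.match_prefix([7, 8, 9], rid="r1")   # no provider registered yet
cache.set_semantic_provider(lambda rid, toks: [1, 2, 3, 4])
hit = cache.match_prefix([7, 8, 9], rid="r1")    # provider suggests the donor
```

In this toy run `miss` is 0 and `hit` is 4: the second lookup succeeds only because the provider pointed at a sequence that is already cached.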
Use cases this enables:
Changes
New file: `python/sglang/srt/mem_cache/semantic_prefix.py`

Two public types:
- `SemanticPrefixResult` dataclass: carries `alternate_token_ids`, `num_cached_tokens`, `skip_insert`, `metadata`, and `source_id`
- `SemanticPrefixProvider` ABC: `on_prefix_miss(rid, token_ids)`, `on_request_cached(rid, token_ids)`, `on_init()`, `on_shutdown()`

Modified: `python/sglang/srt/mem_cache/radix_cache.py`

- `RadixCache.__init__`: adds `self._semantic_provider = None`
- `RadixCache.set_semantic_provider(provider)`: registers a provider, calls `on_init()`
- `RadixCache.match_prefix`: refactored to call `_match_prefix_exact`, then apply semantic fallback when result is empty and `params.req` is available
- `RadixCache._match_prefix_exact`: extracted inner implementation (no semantic fallback); existing callers that call `match_prefix` without `params.req` are unaffected
- `RadixCache.cache_finished_req`: calls `provider.on_request_cached` after a successful insert so the provider can register the request as a future donor

Zero behavioral change when no provider is registered: the `_semantic_provider` attribute defaults to `None` and all new code paths are guarded by `self._semantic_provider is not None`.

Tests
New file: `test/srt/test_semantic_prefix_provider.py`, 29 unit tests covering:
- `TestSetSemanticProvider` (6 tests): provider lifecycle, `on_init` called once, clearing with `None`, replacing
- `TestMatchPrefixNoProvider` (3 tests): baseline exact-match behavior unchanged
- `TestMatchPrefixSemanticFallback` (10 tests): provider called only on miss, not called without `params.req`, returns `None` → cold prefill, alternate tokens looked up, `extra_key` preserved, exception propagation, `source_id` logging
- `TestMatchPrefixExact` (3 tests): `_match_prefix_exact` never calls provider
- `TestOnRequestCachedHook` (4 tests): called on insert, not on skip-insert, not without provider, correct token IDs
- `TestMultipleRequests` (2 tests): independence across requests, provider replacement

All 29 tests pass against the fork (validated in cloud on A10G).
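To give a feel for the fallback cases listed above, here is a sketch of how such tests could be written with `unittest.mock`. It does not reproduce the PR's tests: `match_with_fallback` is a hypothetical stand-in for the fallback logic in `match_prefix`, the exact lookup is a dictionary lookup, and the provider is a `MagicMock`.

```python
import unittest
from types import SimpleNamespace
from unittest.mock import MagicMock


def match_with_fallback(exact_lookup, provider, token_ids, rid=None):
    """Hypothetical stand-in for the semantic fallback in match_prefix."""
    hit = exact_lookup(token_ids)
    if hit or provider is None or rid is None:
        return hit
    result = provider.on_prefix_miss(rid, token_ids)
    if result is None:
        return hit  # cold prefill
    return exact_lookup(result.alternate_token_ids)


class TestSemanticFallbackSketch(unittest.TestCase):
    def setUp(self):
        # Toy exact lookup: only the sequence (1, 2, 3) is cached.
        cached = {(1, 2, 3): 3}
        self.lookup = lambda toks: cached.get(tuple(toks), 0)
        self.provider = MagicMock()

    def test_provider_not_called_on_exact_hit(self):
        hit = match_with_fallback(self.lookup, self.provider, [1, 2, 3], rid="r1")
        self.assertEqual(hit, 3)
        self.provider.on_prefix_miss.assert_not_called()

    def test_provider_not_called_without_request(self):
        hit = match_with_fallback(self.lookup, self.provider, [9, 9])
        self.assertEqual(hit, 0)
        self.provider.on_prefix_miss.assert_not_called()

    def test_none_result_means_cold_prefill(self):
        self.provider.on_prefix_miss.return_value = None
        hit = match_with_fallback(self.lookup, self.provider, [9, 9], rid="r1")
        self.assertEqual(hit, 0)
        self.provider.on_prefix_miss.assert_called_once_with("r1", [9, 9])

    def test_alternate_tokens_are_looked_up(self):
        self.provider.on_prefix_miss.return_value = SimpleNamespace(
            alternate_token_ids=[1, 2, 3]
        )
        hit = match_with_fallback(self.lookup, self.provider, [9, 9], rid="r1")
        self.assertEqual(hit, 3)
```

Saved to a file, the sketch can be run with `python -m unittest` or `pytest`.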
New file: `test/srt/conftest.py`, a sparse-checkout helper that stubs `sglang.lang.*` so mem_cache tests can run without a full install during local development. Has no effect in CI where the full package is installed.

Design Notes
- `on_prefix_miss` is called synchronously inside the scheduler step, so it must be fast. Heavy embedding/similarity search should be done asynchronously and results staged before the call.
- The semantic fallback runs only when `params.req is not None`, so internal callers that pass only `key` (e.g. `cache_unfinished_req`) are never affected.
- `_match_prefix_exact` is a public-but-private-style method (`_` prefix) so integrations can call it directly to bypass semantic fallback in specific scenarios.

Related
This interface is analogous to the `SemanticLookupProvider` interface being proposed for LMCache (LMCache/LMCache#2803).

Test Plan
- `RadixCache.create_simulated()` works with provider registered and unregistered
- `TestMatchPrefixNoProvider`)
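As a footnote to the first design note (keeping `on_prefix_miss` fast by staging results ahead of time), one possible approach is sketched below. Everything here is an assumption made for illustration: the `StagedLookup` class, its `prefetch` method, and the `slow_search` callable are hypothetical, and the hook returns bare token IDs rather than a `SemanticPrefixResult` to keep the example self-contained.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence


class StagedLookup:
    """Runs a slow similarity search off-thread and stages its result per request."""

    def __init__(self, slow_search: Callable[[Sequence[int]], Optional[List[int]]]):
        self._slow_search = slow_search          # e.g. an embedding search (assumed)
        self._staged: Dict[str, Optional[List[int]]] = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=1)

    def prefetch(self, rid: str, token_ids: Sequence[int]) -> Future:
        """Start the slow search early, e.g. when the request is first received."""

        def work() -> None:
            alternate = self._slow_search(token_ids)
            with self._lock:
                self._staged[rid] = alternate

        return self._pool.submit(work)

    def on_prefix_miss(self, rid: str, token_ids: Sequence[int]) -> Optional[List[int]]:
        """Fast path: a dictionary pop under a lock, never the search itself."""
        with self._lock:
            return self._staged.pop(rid, None)

    def on_shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

If the search has not finished by the time the scheduler asks, `on_prefix_miss` returns `None` and the request simply takes the cold-prefill path, which trades a possible cache hit for bounded latency in the scheduler step.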