[Core] feat(mem_cache): add SemanticPrefixProvider hook to RadixCache by zbennett10 · Pull Request #20806 · sgl-project/sglang

zbennett10 · 2026-03-18T02:13:43Z

Summary

This PR adds a SemanticPrefixProvider interface to RadixCache that enables approximate/semantic KV cache matching as a fallback when the exact radix-tree lookup returns zero cached tokens.

Motivation: When prompts are semantically similar but lexically different (different instruction wording, sentence ordering, or template fields), the exact radix-tree lookup returns 0 hits. A semantic provider can identify a donor request whose KV is already resident and suggest its token IDs for lookup — eliminating redundant prefill computation without any changes to the core attention or KV storage mechanisms.

Use cases this enables:

Semantic KV sharing (e.g. SemBlend): look up semantically similar documents already in the cache
Fuzzy prefix matching: tolerate small edits at prefix boundaries
RAG-aware caching: reuse cached KV for retrieved contexts with varied instruction phrasing
Topic-based KV sharing: share computation across requests with the same subject matter

Changes

New file: python/sglang/srt/mem_cache/semantic_prefix.py

Two public types:

SemanticPrefixResult dataclass — carries alternate_token_ids, num_cached_tokens, skip_insert, metadata, and source_id
SemanticPrefixProvider ABC — on_prefix_miss(rid, token_ids), on_request_cached(rid, token_ids), on_init(), on_shutdown()
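For orientation, the two types might look roughly like the sketch below. The field and hook names come from the list above. The field types, the defaults, the choice of which hooks are abstract, and the `NullProvider` class are assumptions made for illustration, not the contents of semantic_prefix.py.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SemanticPrefixResult:
    # Token IDs of a donor request to use for the second lookup.
    alternate_token_ids: List[int]
    # Hit hint from the provider (type and default are assumed).
    num_cached_tokens: int = 0
    # Flag named in the PR; the default here is assumed.
    skip_insert: bool = False
    # Opaque provider-specific data.
    metadata: Dict[str, Any] = field(default_factory=dict)
    # Optional donor identifier, used for logging.
    source_id: Optional[str] = None


class SemanticPrefixProvider(ABC):
    @abstractmethod
    def on_prefix_miss(
        self, rid: str, token_ids: List[int]
    ) -> Optional[SemanticPrefixResult]:
        """Return a donor suggestion, or None to fall back to cold prefill."""

    # The remaining hooks are given no-op defaults here (an assumption).
    def on_request_cached(self, rid: str, token_ids: List[int]) -> None:
        pass

    def on_init(self) -> None:
        pass

    def on_shutdown(self) -> None:
        pass


class NullProvider(SemanticPrefixProvider):
    """Hypothetical provider that never suggests a donor."""

    def on_prefix_miss(self, rid, token_ids):
        return None


result = SemanticPrefixResult(alternate_token_ids=[1, 2, 3], source_id="donor-1")
null_answer = NullProvider().on_prefix_miss("req-1", [4, 5, 6])
```

Returning None from on_prefix_miss corresponds to the "returns None → cold prefill" case covered by the tests.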

Modified: python/sglang/srt/mem_cache/radix_cache.py

RadixCache.__init__: adds self._semantic_provider = None
RadixCache.set_semantic_provider(provider): registers a provider, calls on_init()
RadixCache.match_prefix: refactored to call _match_prefix_exact, then apply semantic fallback when result is empty and params.req is available
RadixCache._match_prefix_exact: extracted inner implementation (no semantic fallback) — existing callers that call match_prefix without params.req are unaffected
RadixCache.cache_finished_req: calls provider.on_request_cached after a successful insert so the provider can register the request as a future donor

Zero behavioral change when no provider is registered — the _semantic_provider attribute defaults to None and all new code paths are guarded by self._semantic_provider is not None.
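The control flow described above can be sketched with a toy stand-in for RadixCache. `ToyPrefixCache` and `LastDonorProvider` are hypothetical names; the toy stores whole token sequences instead of a radix tree, and its `rid` argument stands in for `params.req`. It is meant to show the order of calls, not the real implementation.

```python
from types import SimpleNamespace
from typing import List, Optional, Tuple


class ToyPrefixCache:
    """Toy stand-in for RadixCache: longest common prefix over stored sequences."""

    def __init__(self):
        self._cached: List[Tuple[int, ...]] = []
        self._semantic_provider = None  # default: no semantic fallback

    def set_semantic_provider(self, provider):
        self._semantic_provider = provider
        if provider is not None:
            provider.on_init()

    def _match_prefix_exact(self, token_ids: List[int]) -> List[int]:
        best: List[int] = []
        for seq in self._cached:
            n = 0
            for a, b in zip(seq, token_ids):
                if a != b:
                    break
                n += 1
            if n > len(best):
                best = list(seq[:n])
        return best

    def match_prefix(self, token_ids: List[int], rid: Optional[str] = None):
        hit = self._match_prefix_exact(token_ids)
        # Fallback only on an empty result, with a provider and a request.
        if hit or self._semantic_provider is None or rid is None:
            return hit
        result = self._semantic_provider.on_prefix_miss(rid, token_ids)
        if result is None:
            return hit  # cold prefill
        return self._match_prefix_exact(result.alternate_token_ids)

    def cache_finished_req(self, rid: str, token_ids: List[int]):
        self._cached.append(tuple(token_ids))
        if self._semantic_provider is not None:
            self._semantic_provider.on_request_cached(rid, token_ids)


class LastDonorProvider:
    """Hypothetical provider: suggests the most recently cached request."""

    def __init__(self):
        self.last = None
        self.miss_calls = 0

    def on_init(self):
        pass

    def on_prefix_miss(self, rid, token_ids):
        self.miss_calls += 1
        if self.last is None:
            return None
        return SimpleNamespace(alternate_token_ids=self.last)

    def on_request_cached(self, rid, token_ids):
        self.last = list(token_ids)


cache = ToyPrefixCache()
provider = LastDonorProvider()
cache.set_semantic_provider(provider)
cache.cache_finished_req("a", [1, 2, 3, 4])

exact_hit = cache.match_prefix([1, 2, 9], rid="b")     # exact hit, provider untouched
calls_after_exact = provider.miss_calls
semantic_hit = cache.match_prefix([7, 8, 9], rid="c")  # miss, donor tokens looked up
no_request = cache.match_prefix([7, 8, 9])             # no request, no fallback
```

In this toy, the second lookup simply re-runs the exact match on the donor's token IDs, which mirrors the "re-runs exact lookup with alternate tokens" behavior described in the PR.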

Tests

New file: test/srt/test_semantic_prefix_provider.py — 29 unit tests covering:

TestSetSemanticProvider (6 tests): provider lifecycle, on_init called once, clearing with None, replacing
TestMatchPrefixNoProvider (3 tests): baseline exact-match behavior unchanged
TestMatchPrefixSemanticFallback (10 tests): provider called only on miss, not called without params.req, returns None → cold prefill, alternate tokens looked up, extra_key preserved, exception propagation, source_id logging
TestMatchPrefixExact (3 tests): _match_prefix_exact never calls provider
TestOnRequestCachedHook (4 tests): called on insert, not on skip-insert, not without provider, correct token IDs
TestMultipleRequests (2 tests): independence across requests, provider replacement

All 29 tests pass against the fork (validated in cloud on A10G).

New file: test/srt/conftest.py — sparse-checkout helper that stubs sglang.lang.* so mem_cache tests can run without a full install during local development. Has no effect in CI where the full package is installed.

Design Notes

on_prefix_miss is called synchronously inside the scheduler step, so it must be fast. Heavy embedding or similarity search should run asynchronously, with its results staged before the call.
The fallback only activates when params.req is not None, so internal callers that pass only key (e.g. cache_unfinished_req) are never affected.
_match_prefix_exact is private by convention (leading underscore) but is intended to be callable directly, so integrations can bypass the semantic fallback in specific scenarios.
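One way to keep on_prefix_miss fast, as the first note asks, is to have a background worker stage donor suggestions ahead of time so that the hook reduces to a dictionary lookup. The sketch below is an assumption about how a provider could be structured: `StagedProvider`, `stage`, and `slow_search` are hypothetical names, and for brevity the hook returns raw donor token IDs rather than a SemanticPrefixResult.

```python
import threading
from typing import Dict, List, Optional


class StagedProvider:
    """Hypothetical provider: slow search runs elsewhere, the hook only reads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._staged: Dict[str, List[int]] = {}

    def stage(self, rid: str, donor_token_ids: List[int]) -> None:
        # Called from a background worker once a donor has been found.
        with self._lock:
            self._staged[rid] = list(donor_token_ids)

    def on_prefix_miss(self, rid: str, token_ids: List[int]) -> Optional[List[int]]:
        # O(1) lookup; nothing expensive happens on the scheduler path.
        with self._lock:
            return self._staged.pop(rid, None)


def slow_search(provider: StagedProvider, rid: str) -> None:
    # Placeholder for embedding and similarity search over cached requests.
    provider.stage(rid, [1, 2, 3])


provider = StagedProvider()
worker = threading.Thread(target=slow_search, args=(provider, "req-1"))
worker.start()
worker.join()  # in a real system the scheduler would not wait on the worker

first = provider.on_prefix_miss("req-1", [9, 9, 9])   # staged suggestion
second = provider.on_prefix_miss("req-1", [9, 9, 9])  # already consumed
```

If the worker has not finished by the time the hook fires, the lookup returns None and the request falls back to cold prefill, which keeps the scheduler step from blocking.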

Test Plan

29 unit tests pass (CPU-only, no GPU required)
RadixCache.create_simulated() works with provider registered and unregistered
No regressions to exact-match behavior (verified by TestMatchPrefixNoProvider)
Full SGLang test suite (CI)

Adds a first-class SemanticPrefixProvider interface that allows external systems to provide approximate/semantic KV cache matches when the exact radix-tree lookup returns zero cached tokens.

Changes:

- New file: python/sglang/srt/mem_cache/semantic_prefix.py
  Abstract base class SemanticPrefixProvider with on_prefix_miss, on_request_cached, on_init, and on_shutdown hooks. SemanticPrefixResult dataclass carries alternate_token_ids, hit hint, skip_insert flag, opaque metadata, and optional source_id for logging.

- python/sglang/srt/mem_cache/radix_cache.py
  RadixCache.set_semantic_provider(provider): register/clear the provider.
  RadixCache.match_prefix: calls _match_prefix_exact; if result is empty and a provider is registered and params.req is available, calls provider.on_prefix_miss and re-runs exact lookup with alternate tokens.
  RadixCache._match_prefix_exact: extracted inner implementation (no semantic fallback), callable independently.
  RadixCache.cache_finished_req: calls provider.on_request_cached after a successful insert so the provider can register the request as a future donor.

- test/srt/test_semantic_prefix_provider.py
  34 unit tests covering: set_semantic_provider lifecycle, exact-hit passthrough, miss-triggered callback, None return fallback, alternate-token lookup, extra_key preservation, exception propagation, source_id logging, on_request_cached hook, and multi-request independence.

- test/srt/conftest.py
  Sparse-checkout helper: stubs sglang.lang.* so mem_cache tests can run without a full SGLang install during local development.

hzh0425 · 2026-03-18T03:07:37Z

Hi, @zbennett10 can I reach you on the sglang Slack?

zbennett10 requested review from Ying1123, hanming-lu, hnyls2002, hzh0425, ispobock, merrymercy, xiezhq-hermann and yizhang2077 as code owners March 18, 2026 02:13

hzh0425 self-assigned this Mar 18, 2026

This was referenced Mar 19, 2026

feat(kv-router): Python-first semantic KV cache provider interface ai-dynamo/dynamo#7520

Open

[Core] Add register_model() to KVConnectorBase_V1 for CacheBlend vllm-project/vllm#37339

Open

zbennett10 wants to merge 1 commit into sgl-project:main from WorldFlowAI:semblend/semantic-prefix-provider


2 participants
