feat: enable ephemeral kv cache sessions via sglang by ishandhanani · Pull Request #7384 · ai-dynamo/dynamo

ishandhanani · 2026-03-14T20:28:51Z

Summary

- Extract session affinity into a standalone StickySessionRouter with a trait-based AffinityStore (in-memory default, pluggable for Redis/etcd in multi-router deployments)
- Slim AgentController to session lifecycle RPCs only (open/close) -- removed cache_control client, PinAction, and the affinity DashMap
- Replace the fire-and-forget pin_prefix RPC with inline retention_seconds injection -- cache control TTL now flows as a field on the generate request, enabling SGLang's priority-based eviction with time decay
- Remove dead pin_prefix/cache_control Python handler code
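
To make "priority-based eviction with time decay" concrete, here is a minimal Python sketch. It is not SGLang's radix cache code. It assumes a step decay: a block keeps its priority while it has been touched within its retention window and falls back to priority 0 afterwards. The actual decay curve is not specified in this PR, and `Block`, `effective_priority`, and `pick_victim` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    priority: int
    retention_seconds: float
    last_access: float  # timestamp of the last hit, in seconds


def effective_priority(block: Block, now: float) -> int:
    """Priority holds inside the retention window, then drops to 0 (assumed step decay)."""
    idle = now - block.last_access
    return block.priority if idle < block.retention_seconds else 0


def pick_victim(blocks: list[Block], now: float) -> Block:
    """Evict the lowest effective priority first; break ties by least recent access."""
    return min(blocks, key=lambda b: (effective_priority(b, now), b.last_access))
```

Under these assumptions a priority=50 block with a 300 s window outlives priority=0 blocks while it is active, and competes on recency alone once it has been idle for five minutes.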

Data flow

sequenceDiagram
    participant Client
    participant Preprocessor
    participant StickyRouter as StickySessionRouter
    participant KVRouter as KV Router
    participant AgentCtrl as AgentController
    participant Worker as SGLang Worker
    participant Cache as Radix Cache

    Client->>Preprocessor: nvext.cache_control{ttl: "5m"}<br/>nvext.agent_hints{priority: 50}<br/>nvext.session_params{id: "sub-1", rid}

    Preprocessor->>Preprocessor: Extract routing hints<br/>cache_control_ttl=300<br/>priority=50

    Preprocessor->>StickyRouter: resolve(session_params.id="sub-1")
    StickyRouter-->>KVRouter: worker_42 (from affinity table)
    Note over StickyRouter: Refreshes TTL on hit<br/>(sliding window)

    KVRouter->>KVRouter: select_worker() -> worker_42<br/>(pinned by sticky affinity)

    KVRouter->>AgentCtrl: on_routed(request, worker_42)
    Note over AgentCtrl: Open: fire open_session RPC +<br/>sticky.bind("sub-1", 42, ttl)<br/>Close: sticky.unbind + defer close

    KVRouter->>KVRouter: Inject extra_args.retention_seconds=300<br/>from cache_control_ttl

    KVRouter->>Worker: async_generate(<br/>  retention_seconds=300,<br/>  priority=50,<br/>  session_params={id, rid})

    Worker->>Cache: Insert with priority=50,<br/>retention_duration=300s
    Note over Cache: Survives over priority=0 blocks<br/>Decays to 0 after 5min idle

    Worker-->>Client: Stream response tokens

    Note over KVRouter,Worker: On stream end (RequestGuard)
    KVRouter-)Worker: close_session("sub-1")<br/>[fire-and-forget, if deferred]
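
The TTL-to-`retention_seconds` step in the diagram can be sketched in Python as follows. The real logic lives in the Rust preprocessor and router, so this is only an illustration. `parse_ttl_seconds` and `inject_retention` are hypothetical names, the accepted units (`s`, `m`, `h`) are an assumption since the PR only shows `"5m"`, and placing `priority` in `extra_args` alongside `retention_seconds` is likewise assumed.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_ttl_seconds(ttl: str) -> int:
    """Parse strings such as "30s", "5m", or "1h" into whole seconds."""
    match = re.fullmatch(r"(\d+)([smh])", ttl.strip())
    if match is None:
        raise ValueError(f"unsupported ttl: {ttl!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


def inject_retention(request: dict, nvext: dict) -> dict:
    """Return a copy of the request with cache-control hints copied into extra_args."""
    out = dict(request)
    extra_args = dict(out.get("extra_args", {}))
    cache_control = nvext.get("cache_control")
    if cache_control and "ttl" in cache_control:
        extra_args["retention_seconds"] = parse_ttl_seconds(cache_control["ttl"])
    hints = nvext.get("agent_hints") or {}
    if "priority" in hints:
        extra_args["priority"] = hints["priority"]
    out["extra_args"] = extra_args
    return out
```

Because the TTL travels on the generate request itself, there is no separate RPC that can race with, or be lost independently of, the request it describes.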

Sticky session routing

The StickySessionRouter is a pure routing-layer abstraction -- no event plane, no I/O. It maintains a session_id -> worker_id mapping with sliding-window TTL (refreshed on every resolve).

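use std::time::Duration;

pub trait AffinityStore: Send + Sync {
    /// Returns the worker bound to `session_id`, if a binding exists.
    fn get(&self, session_id: &str) -> Option<u64>;
    /// Binds `session_id` to `worker_id` for the given `ttl`.
    fn put(&self, session_id: &str, worker_id: u64, ttl: Duration);
    /// Drops the binding for `session_id`.
    fn remove(&self, session_id: &str);
}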

The default InMemoryAffinityStore uses a DashMap with a background reaper. The trait is designed so that Redis/etcd/NATS KV backends can be swapped in for multi-router deployments where affinity needs to be shared across router instances.
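
The sliding-window behavior can be illustrated with a small Python sketch that mirrors the `get`/`put`/`remove` shape of the trait. It is a simplification under stated assumptions, not a port of the Rust code: expired entries are dropped lazily on read instead of by a background reaper, the TTL is fixed per router rather than passed to `bind`, and the injectable `clock` parameter is an addition for testability.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryAffinityStore:
    """session_id -> (worker_id, expiry). Expired entries are dropped lazily on read."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[int, float]] = {}

    def get(self, session_id: str) -> Optional[int]:
        entry = self._entries.get(session_id)
        if entry is None:
            return None
        worker_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[session_id]
            return None
        return worker_id

    def put(self, session_id: str, worker_id: int, ttl: float) -> None:
        self._entries[session_id] = (worker_id, self._clock() + ttl)

    def remove(self, session_id: str) -> None:
        self._entries.pop(session_id, None)


class StickySessionRouter:
    def __init__(self, store: InMemoryAffinityStore, ttl: float) -> None:
        self._store = store
        self._ttl = ttl

    def bind(self, session_id: str, worker_id: int) -> None:
        self._store.put(session_id, worker_id, self._ttl)

    def resolve(self, session_id: str) -> Optional[int]:
        worker_id = self._store.get(session_id)
        if worker_id is not None:
            # Sliding window: every hit pushes the expiry out by a full TTL.
            self._store.put(session_id, worker_id, self._ttl)
        return worker_id

    def unbind(self, session_id: str) -> None:
        self._store.remove(session_id)
```

A shared backend would only need to provide the same three store operations; the refresh-on-resolve logic stays in the router.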

Key changes

| File | Change |
| --- | --- |
| `lib/llm/src/kv_router/sticky_sessions.rs` | NEW -- `AffinityStore` trait, `InMemoryAffinityStore`, `StickySessionRouter`, 6 unit tests |
| `lib/llm/src/kv_router/agent_controller.rs` | Slimmed to session lifecycle RPCs only. Removed cache_control client, PinAction, affinity DashMap |
| `lib/llm/src/kv_router/push_router.rs` | Wires `StickySessionRouter` + `AgentController` + `retention_seconds` injection |
| `components/.../handler_base.py` | Removed `pin_prefix`, `cache_control` methods and route registration. Added `_retention_kwargs` |
| `components/.../decode_handler.py` | Forwards `retention_seconds` from `extra_args` to `async_generate()` |
| `docs/backends/sglang/agents.md` | Updated cache pinning -> cache retention docs |
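
The handler-side forwarding described in the table might look roughly like the sketch below. The body of the real `_retention_kwargs` helper is not shown in this PR description, so the function here (`retention_kwargs`) and the request shape it reads are assumptions made for illustration.

```python
from typing import Any, Dict


def retention_kwargs(request: Dict[str, Any]) -> Dict[str, Any]:
    """Build optional keyword arguments for the engine call from extra_args."""
    extra_args = request.get("extra_args") or {}
    retention = extra_args.get("retention_seconds")
    if retention is None:
        # No cache-control TTL on this request: pass nothing extra to the engine.
        return {}
    return {"retention_seconds": int(retention)}
```

A decode handler could then splat the result into its generate call, so requests without a TTL reach the engine unchanged.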

Test plan

- `cargo test -p dynamo-llm --lib`: 768 passed (6 new sticky_sessions tests)
- `ruff check components/`: clean
- Manual: launch SGLang agg with `--enable-cache-control --radix-eviction-policy priority`, send multi-turn requests with `session_params`, verify sticky routing in logs
- Manual: send request with `cache_control: {type: "ephemeral", ttl: "5m"}`, verify `retention_seconds=300` in SGLang generate call

coderabbitai · 2026-03-14T20:43:01Z

Walkthrough

The changes introduce streaming KV session lifecycle management across the request pipeline. New protocol types define session control actions, runtime infrastructure provides client connectivity and session state management, and request handlers expose endpoints for opening and closing sessions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Session Control Protocol Types: `lib/llm/src/protocols/openai/nvext.rs`, `lib/llm/src/protocols/common/preprocessor.rs` | Adds SessionControl struct with action and session_id, SessionAction enum (Open/Close), session_control and session_params fields to NvExt, and session_control field to RoutingHints. |
| Session Control Client & Infrastructure: `lib/llm/src/kv_router/session_control.rs`, `lib/llm/src/kv_router.rs` | Defines SessionControlClient type and spawn_open_session/spawn_close_session functions for fire-and-forget session operations; module re-exports new items. |
| KV Router Session Integration: `lib/llm/src/kv_router/push_router.rs` | Introduces SessionCloseState struct for deferred session closing, extends KvPushRouter with lazy-initialized session_control_cell, integrates session open/close actions into RequestGuard lifecycle. |
| Request Processing & Handler Endpoints: `lib/llm/src/preprocessor.rs`, `components/src/dynamo/sglang/request_handlers/handler_base.py`, `components/src/dynamo/sglang/request_handlers/llm/decode_handler.py` | Forwards session params and control through preprocessor into requests, adds open_session/close_session/session_control handler methods and endpoint registration, extracts and propagates session_params to engine. |
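
As a rough Python rendering of the protocol types and forwarding step summarized in the table: the actual types are Rust structs in `nvext.rs`, so the wire values `"open"`/`"close"`, the dictionary shapes, and the helper name `forward_session_fields` are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class SessionAction(Enum):
    OPEN = "open"
    CLOSE = "close"


@dataclass(frozen=True)
class SessionControl:
    action: SessionAction
    session_id: str


def forward_session_fields(
    nvext: Dict[str, Any], extra_args: Dict[str, Any]
) -> Optional[SessionControl]:
    """Copy session_params into extra_args and return the parsed session_control, if any."""
    params = nvext.get("session_params")
    if isinstance(params, dict):
        extra_args["session_params"] = dict(params)
    control = nvext.get("session_control")
    if control is None:
        return None
    return SessionControl(SessionAction(control["action"]), control["session_id"])
```

In this sketch `session_params` rides along to the engine in `extra_args`, while `session_control` is returned to the caller as a routing hint.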

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Poem

🐰 A router hops through sessions new,
Opening, closing, keeping true,
KV streams in flowing dance,
State machines with deferred chance!
Sessions bloom where queries soar,
Ready now to handle more! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description does not match the PR objectives. The description discusses extracting session affinity and removing cache_control, but the objectives describe adding session_control endpoint for KV isolation. | Replace or significantly revise the description to align with the actual changes: adding session_control endpoint support, SessionControl/SessionAction types in nvext, and session lifecycle forwarding through router and worker. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title 'feat: enable ephemeral kv cache sessions via sglang' accurately captures the main feature being added: enabling ephemeral KV cache sessions using SGLang, which aligns with the PR's motivation of supporting session-scoped KV for subagent isolation. |


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/kv_router/push_router.rs`:
- Around line 520-539: The current session-close logic in the async block that
builds session_close_state (symbols: session_close_state, SessionCloseState,
session_control, SessionAction::Close, instance_id) can schedule Close against
whatever instance_id was selected earlier and may hit the wrong worker; change
the code to verify worker affinity before creating sc_client: if the incoming
request already includes backend_instance_id or a phase-specific pinned worker,
allow the close; otherwise perform a session-to-worker lookup (or consult
session metadata) to resolve the correct worker for sc.session_id and compare it
with instance_id, and if they differ return None / early-fail so the close is
not sent to a wrong worker; use the same component/client creation flow
(chooser.client().endpoint.component(),
session_control_cell.get_or_try_init(...), create_session_control_client) only
after confirming affinity matches, and ensure SessionCloseState contains the
resolved instance id and sc_client for the pinned worker.
- Around line 501-517: The branch that handles session_control ==
Some(SessionAction::Open) must fail fast instead of logging and continuing when
the router flag is off (self.session_control_cell is None) or when
create_session_control_client(...) fails; update the session open handling in
push_router.rs so that if sc.action == SessionAction::Open and either
self.session_control_cell.is_none() or cell.get_or_try_init(...).await returns
Err, the function returns an Err (propagate an appropriate error) rather than
falling through; keep use of create_session_control_client,
session_control_cell, SessionAction::Open, and spawn_open_session, but change
the error path to return an error immediately with a clear message indicating
session open failed.
- Around line 168-170: The drop path must mirror finish() by sending the
explicit close before freeing scheduler state: in RequestGuard::drop() check
self.session_close_state and, if present, call
spawn_close_session(&state.sc_client, &state.session_id, state.instance_id,
&self.context_id) (or otherwise invoke the same close routine used by finish()),
but guard against double-closing by atomically taking or marking the
session_close_state as consumed (e.g., swap Option to None or use an AtomicBool)
so finish() and Drop cannot both send the close; ensure existing scheduler
cleanup still runs after the close call.

In `@lib/llm/src/preprocessor.rs`:
- Around line 251-267: preprocess_request() currently forwards
nvext().session_params into preprocessed.extra_args but the
NvCreateCompletionRequest path builds common_request directly and drops
session_params; update the NvCreateCompletionRequest handling to reuse the same
propagation logic: after creating common_request (or the result of
builder.build()/preprocessed), detect request.nvext().and_then(|ext|
ext.session_params.clone()) and insert it into common_request.extra_args as a
JSON object field "session_params" (matching the existing pattern that checks
serde_json::Value::Object and calls map.insert), or alternatively explicitly
reject nvext.session_params for completions; modify the code around where
common_request is constructed in the NvCreateCompletionRequest flow to apply
this same insertion using the same symbols (session_params, extra_args,
common_request/NvCreateCompletionRequest handling).
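
The "close exactly once" guard requested in the `RequestGuard::drop()` comment (push_router.rs, around lines 168-170) can be illustrated in Python. This is a hedged analogue rather than the Rust fix: a context manager's `__exit__` stands in for `Drop`, and the class layout and callback-based close are assumptions of the sketch.

```python
import threading
from typing import Callable, Optional


class RequestGuard:
    """Sends the deferred session close at most once, from finish() or from cleanup."""

    def __init__(self, close_session: Optional[Callable[[], None]]) -> None:
        self._lock = threading.Lock()
        self._close_session = close_session  # None when no close was deferred

    def _send_close_once(self) -> None:
        # Atomically take the callback so any later caller finds None.
        with self._lock:
            close, self._close_session = self._close_session, None
        if close is not None:
            close()

    def finish(self) -> None:
        self._send_close_once()

    def __enter__(self) -> "RequestGuard":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Plays the role of Drop: runs even if finish() was never reached.
        self._send_close_once()
        return False  # do not swallow exceptions from the request body
```

Taking the state out of the guard before acting on it is what prevents `finish()` and the cleanup path from both sending a close.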

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ed29e2d0-290d-473e-a148-ce42108bb358

📥 Commits

Reviewing files that changed from the base of the PR and between ed939f0 and 50e34c0.

📒 Files selected for processing (8)

components/src/dynamo/sglang/request_handlers/handler_base.py
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
lib/llm/src/kv_router.rs
lib/llm/src/kv_router/push_router.rs
lib/llm/src/kv_router/session_control.rs
lib/llm/src/preprocessor.rs
lib/llm/src/protocols/common/preprocessor.rs
lib/llm/src/protocols/openai/nvext.rs

- Add StickySessionRouter with trait-based AffinityStore for session affinity (in-memory default, pluggable for multi-router deployments)
- Add AgentController for session lifecycle RPCs (open/close) with synchronous open to ensure session exists before first request
- Replace fire-and-forget pin_prefix RPC with inline retention_seconds injection, enabling SGLang priority-based eviction with time decay
- Register session_control as a discoverable service endpoint
- Forward session_params through preprocessor to backend
- Add SessionControl, SessionParams, SessionAction types to nvext
- Update SGLang imports to use sglang.srt.utils.network
- Update docs: cache pinning -> cache retention

ishandhanani · 2026-03-27T13:45:21Z

Superseded by the split PRs:

- #7666: fix: improve GLM 4.7 responses handling for Codex
- #7665: feat(sglang): add ephemeral KV session routing (on top of GLM responses fixes)

Closing this combined PR so review stays on the smaller scoped branches.

ishandhanani requested a review from a team as a code owner March 14, 2026 20:28

ishandhanani requested a review from a team March 14, 2026 20:28

ishandhanani requested a review from a team as a code owner March 14, 2026 20:28

pull-request-size Bot added the size/L label Mar 14, 2026

github-actions Bot added the feat, backend::sglang (relates to the sglang backend), frontend (`python -m dynamo.frontend` and `dynamo-run in=http|text|grpc`), and router (relates to routing, KV-aware routing, etc.) labels Mar 14, 2026

coderabbitai Bot reviewed Mar 14, 2026


Comment thread lib/llm/src/kv_router/push_router.rs Outdated

Comment thread lib/llm/src/kv_router/push_router.rs Outdated

Comment thread lib/llm/src/kv_router/push_router.rs Outdated

Comment thread lib/llm/src/preprocessor.rs Outdated

pull-request-size Bot added size/XL and removed size/L labels Mar 14, 2026

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 20:54 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 20:55 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 20:58 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 21:00 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 21:08 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 21:09 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 21:42 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 21:43 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 21:47 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 14, 2026 21:48 Inactive

github-actions Bot added the documentation Improvements or additions to documentation label Mar 14, 2026

ishandhanani changed the title ~~feat: session_control endpoint for subagent KV isolation~~ refactor: extract StickySessionRouter, replace pin_prefix RPC with retention_seconds Mar 20, 2026

github-actions Bot added refactor and removed feat labels Mar 20, 2026

ishandhanani requested a review from a team as a code owner March 20, 2026 21:07

pull-request-size Bot added size/XXL and removed size/XL labels Mar 20, 2026

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 21:07 Inactive

ishandhanani force-pushed the idhanani/session-control-event-plane branch from 0ad0e98 to fd0c25c Compare March 20, 2026 22:22

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 22:22 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 22:26 Inactive

ishandhanani force-pushed the idhanani/session-control-event-plane branch from fd0c25c to 1504f07 Compare March 20, 2026 22:44

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 22:44 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 22:51 Inactive

ishandhanani force-pushed the idhanani/session-control-event-plane branch from 1504f07 to 93bd5cb Compare March 20, 2026 22:52

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 22:52 Inactive

ishandhanani force-pushed the idhanani/session-control-event-plane branch from 93bd5cb to 03b74cc Compare March 20, 2026 22:57

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 22:57 Inactive

ishandhanani force-pushed the idhanani/session-control-event-plane branch from 03b74cc to 80b5d67 Compare March 20, 2026 23:10

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 23:10 Inactive

ishandhanani force-pushed the idhanani/session-control-event-plane branch from 80b5d67 to 0b8d705 Compare March 20, 2026 23:23

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 23:23 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 20, 2026 23:43 Inactive

feat: wire session control into streaming sessions

4bc281c

copy-pr-bot Bot temporarily deployed to GITLAB March 21, 2026 05:18 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 21, 2026 05:24 Inactive

glm47 for codex

33e12b8

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 20:00 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 20:01 Inactive

Fix GLM parser recovery for Codex responses

f4eb891

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 20:10 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 20:11 Inactive

Improve GLM responses handling for Codex

2ea2af8

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 21:32 Inactive

copy-pr-bot Bot temporarily deployed to GITLAB March 25, 2026 21:36 Inactive

ishandhanani closed this Mar 27, 2026

ishandhanani wants to merge 5 commits into main from idhanani/session-control-event-plane
