
feat: enable ephemeral kv cache sessions via sglang #7384

Closed
ishandhanani wants to merge 5 commits into main from idhanani/session-control-event-plane

Conversation

@ishandhanani
Contributor

@ishandhanani ishandhanani commented Mar 14, 2026

Summary

  • Extract session affinity into a standalone StickySessionRouter with a trait-based AffinityStore (in-memory default, pluggable for Redis/etcd in multi-router deployments)
  • Slim AgentController to session lifecycle RPCs only (open/close) -- removed cache_control client, PinAction, and the affinity DashMap
  • Replace the fire-and-forget pin_prefix RPC with inline retention_seconds injection -- cache control TTL now flows as a field on the generate request, enabling SGLang's priority-based eviction with time decay
  • Remove dead pin_prefix/cache_control Python handler code

Data flow

sequenceDiagram
    participant Client
    participant Preprocessor
    participant StickyRouter as StickySessionRouter
    participant KVRouter as KV Router
    participant AgentCtrl as AgentController
    participant Worker as SGLang Worker
    participant Cache as Radix Cache

    Client->>Preprocessor: nvext.cache_control{ttl: "5m"}<br/>nvext.agent_hints{priority: 50}<br/>nvext.session_params{id: "sub-1", rid}

    Preprocessor->>Preprocessor: Extract routing hints<br/>cache_control_ttl=300<br/>priority=50

    Preprocessor->>StickyRouter: resolve(session_params.id="sub-1")
    StickyRouter-->>KVRouter: worker_42 (from affinity table)
    Note over StickyRouter: Refreshes TTL on hit<br/>(sliding window)

    KVRouter->>KVRouter: select_worker() -> worker_42<br/>(pinned by sticky affinity)

    KVRouter->>AgentCtrl: on_routed(request, worker_42)
    Note over AgentCtrl: Open: fire open_session RPC +<br/>sticky.bind("sub-1", 42, ttl)<br/>Close: sticky.unbind + defer close

    KVRouter->>KVRouter: Inject extra_args.retention_seconds=300<br/>from cache_control_ttl

    KVRouter->>Worker: async_generate(<br/>  retention_seconds=300,<br/>  priority=50,<br/>  session_params={id, rid})

    Worker->>Cache: Insert with priority=50,<br/>retention_duration=300s
    Note over Cache: Survives over priority=0 blocks<br/>Decays to 0 after 5min idle

    Worker-->>Client: Stream response tokens

    Note over KVRouter,Worker: On stream end (RequestGuard)
    KVRouter-)Worker: close_session("sub-1")<br/>[fire-and-forget, if deferred]
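
The cache note in the diagram (a priority=50 entry outlives priority=0 blocks, then drops to 0 after 5 minutes idle) can be modelled as a step function. The following is only a sketch under that reading: `Block`, `effective_priority`, and `pick_victim` are hypothetical names, and SGLang's actual decay curve and eviction order may well differ.

```rust
use std::cmp::Reverse;
use std::time::Duration;

/// Hypothetical cache block; field names are illustrative, not SGLang's.
pub struct Block {
    pub id: u32,
    pub priority: u32,
    pub retention: Duration,
    pub idle: Duration,
}

/// Assumed step model: full priority while idle time is inside the
/// retention window, priority 0 once the window has elapsed.
pub fn effective_priority(priority: u32, retention: Duration, idle: Duration) -> u32 {
    if idle < retention {
        priority
    } else {
        0
    }
}

/// Picks the block with the lowest effective priority; ties go to the
/// block that has been idle the longest.
pub fn pick_victim(blocks: &[Block]) -> Option<u32> {
    blocks
        .iter()
        .min_by_key(|b| {
            (
                effective_priority(b.priority, b.retention, b.idle),
                Reverse(b.idle),
            )
        })
        .map(|b| b.id)
}
```

Under this model a block inserted with `retention_seconds=300` and `priority=50` is evicted after any unretained block while it is still warm, and competes on idle time alone once five minutes pass without a hit.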

Sticky session routing

The StickySessionRouter is a pure routing-layer abstraction -- no event plane, no I/O. It maintains a session_id -> worker_id mapping with sliding-window TTL (refreshed on every resolve).

pub trait AffinityStore: Send + Sync {
    fn get(&self, session_id: &str) -> Option<u64>;
    fn put(&self, session_id: &str, worker_id: u64, ttl: Duration);
    fn remove(&self, session_id: &str);
}

The default InMemoryAffinityStore uses a DashMap with a background reaper. The trait is designed so that Redis/etcd/NATS KV backends can be swapped in for multi-router deployments where affinity needs to be shared across router instances.
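
To make the trait contract concrete, here is a minimal std-only sketch of a store with a sliding-window TTL. It is not the PR's `InMemoryAffinityStore` (which is described as a DashMap with a background reaper): `SketchAffinityStore` and `Entry` are hypothetical names, expiry is checked lazily on read instead of by a reaper, and refreshing the deadline inside `get` is an assumption about where the "refresh on resolve" behaviour could live.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Trait as shown in the PR description.
pub trait AffinityStore: Send + Sync {
    fn get(&self, session_id: &str) -> Option<u64>;
    fn put(&self, session_id: &str, worker_id: u64, ttl: Duration);
    fn remove(&self, session_id: &str);
}

/// Hypothetical entry: the bound worker, the TTL it was bound with,
/// and the current expiry deadline.
struct Entry {
    worker_id: u64,
    ttl: Duration,
    expires_at: Instant,
}

/// Illustrative store backed by a Mutex<HashMap>; no background reaper.
#[derive(Default)]
pub struct SketchAffinityStore {
    entries: Mutex<HashMap<String, Entry>>,
}

impl AffinityStore for SketchAffinityStore {
    fn get(&self, session_id: &str) -> Option<u64> {
        let mut entries = self.entries.lock().unwrap();
        let now = Instant::now();
        if let Some(entry) = entries.get_mut(session_id) {
            if entry.expires_at > now {
                // Sliding window: a hit pushes the deadline out by the original TTL.
                entry.expires_at = now + entry.ttl;
                return Some(entry.worker_id);
            }
        }
        // Missing or expired: drop any stale entry and report a miss.
        entries.remove(session_id);
        None
    }

    fn put(&self, session_id: &str, worker_id: u64, ttl: Duration) {
        let entry = Entry {
            worker_id,
            ttl,
            expires_at: Instant::now() + ttl,
        };
        self.entries
            .lock()
            .unwrap()
            .insert(session_id.to_string(), entry);
    }

    fn remove(&self, session_id: &str) {
        self.entries.lock().unwrap().remove(session_id);
    }
}
```

A shared backend (Redis, etcd, NATS KV) would implement the same three methods, presumably delegating expiry to the backend's native TTL support rather than tracking deadlines locally.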

Key changes

| File | Change |
| --- | --- |
| lib/llm/src/kv_router/sticky_sessions.rs | NEW -- AffinityStore trait, InMemoryAffinityStore, StickySessionRouter, 6 unit tests |
| lib/llm/src/kv_router/agent_controller.rs | Slimmed to session lifecycle RPCs only. Removed cache_control client, PinAction, affinity DashMap |
| lib/llm/src/kv_router/push_router.rs | Wires StickySessionRouter + AgentController + retention_seconds injection |
| components/.../handler_base.py | Removed pin_prefix, cache_control methods and route registration. Added _retention_kwargs |
| components/.../decode_handler.py | Forwards retention_seconds from extra_args to async_generate() |
| docs/backends/sglang/agents.md | Updated cache pinning -> cache retention docs |

Test plan

  • cargo test -p dynamo-llm --lib: 768 passed (including 6 new sticky_sessions tests)
  • ruff check components/: clean
  • Manual: launch SGLang agg with --enable-cache-control --radix-eviction-policy priority, send multi-turn requests with session_params, verify sticky routing in logs
  • Manual: send request with cache_control: {type: "ephemeral", ttl: "5m"}, verify retention_seconds=300 in SGLang generate call
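
The last check relies on a TTL string such as "5m" becoming `retention_seconds=300`. A minimal sketch of that conversion follows; `parse_ttl_seconds` is a hypothetical helper, and the set of accepted suffixes (`s`, `m`, `h`) is an assumption, since the PR text only shows "5m".

```rust
/// Hypothetical helper: converts a TTL string such as "5m" into whole seconds.
/// Returns None for empty input, a missing or unknown unit suffix,
/// a non-numeric value, or overflow.
pub fn parse_ttl_seconds(ttl: &str) -> Option<u64> {
    let ttl = ttl.trim();
    // Split off the final character as the unit; safe for non-ASCII input.
    let (idx, unit) = ttl.char_indices().last()?;
    let value: u64 = ttl[..idx].parse().ok()?;
    let multiplier: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3600,
        _ => return None,
    };
    value.checked_mul(multiplier)
}
```

For example, `parse_ttl_seconds("5m")` yields `Some(300)`, which matches the value expected in the SGLang generate call above.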

@ishandhanani ishandhanani requested a review from a team as a code owner March 14, 2026 20:28
@ishandhanani ishandhanani requested a review from a team March 14, 2026 20:28
@ishandhanani ishandhanani requested a review from a team as a code owner March 14, 2026 20:28
@github-actions github-actions Bot added the feat, backend::sglang, frontend, and router labels Mar 14, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Mar 14, 2026

Walkthrough

The changes introduce streaming KV session lifecycle management across the request pipeline. New protocol types define session control actions, runtime infrastructure provides client connectivity and session state management, and request handlers expose endpoints for opening and closing sessions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Session Control Protocol Types (lib/llm/src/protocols/openai/nvext.rs, lib/llm/src/protocols/common/preprocessor.rs) | Adds SessionControl struct with action and session_id, SessionAction enum (Open/Close), session_control and session_params fields to NvExt, and session_control field to RoutingHints. |
| Session Control Client & Infrastructure (lib/llm/src/kv_router/session_control.rs, lib/llm/src/kv_router.rs) | Defines SessionControlClient type and spawn_open_session/spawn_close_session functions for fire-and-forget session operations; module re-exports new items. |
| KV Router Session Integration (lib/llm/src/kv_router/push_router.rs) | Introduces SessionCloseState struct for deferred session closing, extends KvPushRouter with lazy-initialized session_control_cell, integrates session open/close actions into RequestGuard lifecycle. |
| Request Processing & Handler Endpoints (lib/llm/src/preprocessor.rs, components/src/dynamo/sglang/request_handlers/handler_base.py, components/src/dynamo/sglang/request_handlers/llm/decode_handler.py) | Forwards session params and control through preprocessor into requests, adds open_session/close_session/session_control handler methods and endpoint registration, extracts and propagates session_params to engine. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Poem

🐰 A router hops through sessions new,
Opening, closing, keeping true,
KV streams in flowing dance,
State machines with deferred chance!
Sessions bloom where queries soar,
Ready now to handle more!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description does not match the PR objectives. The description discusses extracting session affinity and removing cache_control, but the objectives describe adding session_control endpoint for KV isolation. | Replace or significantly revise the description to align with the actual changes: adding session_control endpoint support, SessionControl/SessionAction types in nvext, and session lifecycle forwarding through router and worker. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title 'feat: enable ephemeral kv cache sessions via sglang' accurately captures the main feature being added: enabling ephemeral KV cache sessions using SGLang, which aligns with the PR's motivation of supporting session-scoped KV for subagent isolation. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/kv_router/push_router.rs`:
- Around line 520-539: The current session-close logic in the async block that
builds session_close_state (symbols: session_close_state, SessionCloseState,
session_control, SessionAction::Close, instance_id) can schedule Close against
whatever instance_id was selected earlier and may hit the wrong worker; change
the code to verify worker affinity before creating sc_client: if the incoming
request already includes backend_instance_id or a phase-specific pinned worker,
allow the close; otherwise perform a session-to-worker lookup (or consult
session metadata) to resolve the correct worker for sc.session_id and compare it
with instance_id, and if they differ return None / early-fail so the close is
not sent to a wrong worker; use the same component/client creation flow
(chooser.client().endpoint.component(),
session_control_cell.get_or_try_init(...), create_session_control_client) only
after confirming affinity matches, and ensure SessionCloseState contains the
resolved instance id and sc_client for the pinned worker.
- Around line 501-517: The branch that handles session_control ==
Some(SessionAction::Open) must fail fast instead of logging and continuing when
the router flag is off (self.session_control_cell is None) or when
create_session_control_client(...) fails; update the session open handling in
push_router.rs so that if sc.action == SessionAction::Open and either
self.session_control_cell.is_none() or cell.get_or_try_init(...).await returns
Err, the function returns an Err (propagate an appropriate error) rather than
falling through; keep use of create_session_control_client,
session_control_cell, SessionAction::Open, and spawn_open_session, but change
the error path to return an error immediately with a clear message indicating
session open failed.
- Around line 168-170: The drop path must mirror finish() by sending the
explicit close before freeing scheduler state: in RequestGuard::drop() check
self.session_close_state and, if present, call
spawn_close_session(&state.sc_client, &state.session_id, state.instance_id,
&self.context_id) (or otherwise invoke the same close routine used by finish()),
but guard against double-closing by atomically taking or marking the
session_close_state as consumed (e.g., swap Option to None or use an AtomicBool)
so finish() and Drop cannot both send the close; ensure existing scheduler
cleanup still runs after the close call.

In `@lib/llm/src/preprocessor.rs`:
- Around line 251-267: preprocess_request() currently forwards
nvext().session_params into preprocessed.extra_args but the
NvCreateCompletionRequest path builds common_request directly and drops
session_params; update the NvCreateCompletionRequest handling to reuse the same
propagation logic: after creating common_request (or the result of
builder.build()/preprocessed), detect request.nvext().and_then(|ext|
ext.session_params.clone()) and insert it into common_request.extra_args as a
JSON object field "session_params" (matching the existing pattern that checks
serde_json::Value::Object and calls map.insert), or alternatively explicitly
reject nvext.session_params for completions; modify the code around where
common_request is constructed in the NvCreateCompletionRequest flow to apply
this same insertion using the same symbols (session_params, extra_args,
common_request/NvCreateCompletionRequest handling).
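
The third push_router.rs finding (close on drop without double-closing) can be illustrated with `Option::take`. This is a sketch of the pattern only, not the PR's code: `GuardSketch` and `CloseStateSketch` are hypothetical stand-ins for `RequestGuard` and `SessionCloseState`, and pushing the session id onto a shared `Vec` stands in for the `spawn_close_session` call.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for SessionCloseState.
struct CloseStateSketch {
    session_id: String,
    /// Records every close that is "sent" so the behaviour is observable.
    sent: Arc<Mutex<Vec<String>>>,
}

/// Hypothetical stand-in for RequestGuard.
struct GuardSketch {
    session_close_state: Option<CloseStateSketch>,
}

impl GuardSketch {
    /// Sends the close at most once: take() leaves None behind, so a later
    /// call from finish() or Drop finds nothing left to send.
    fn close_session_once(&mut self) {
        if let Some(state) = self.session_close_state.take() {
            // Real code would issue the fire-and-forget close RPC here.
            state.sent.lock().unwrap().push(state.session_id);
        }
    }

    /// Normal completion path.
    fn finish(&mut self) {
        self.close_session_once();
        // Scheduler cleanup would follow here.
    }
}

impl Drop for GuardSketch {
    /// Early-exit path (client disconnect, error): mirrors finish().
    fn drop(&mut self) {
        self.close_session_once();
        // Scheduler cleanup would follow here.
    }
}
```

Because `drop` receives `&mut self`, a plain `Option` is enough here; an `AtomicBool` would only be needed if the close could be triggered through a shared reference.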

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ed29e2d0-290d-473e-a148-ce42108bb358

📥 Commits

Reviewing files that changed from the base of the PR and between ed939f0 and 50e34c0.

📒 Files selected for processing (8)
  • components/src/dynamo/sglang/request_handlers/handler_base.py
  • components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
  • lib/llm/src/kv_router.rs
  • lib/llm/src/kv_router/push_router.rs
  • lib/llm/src/kv_router/session_control.rs
  • lib/llm/src/preprocessor.rs
  • lib/llm/src/protocols/common/preprocessor.rs
  • lib/llm/src/protocols/openai/nvext.rs

Comment thread lib/llm/src/kv_router/push_router.rs Outdated
Comment thread lib/llm/src/kv_router/push_router.rs Outdated
Comment thread lib/llm/src/kv_router/push_router.rs Outdated
Comment thread lib/llm/src/preprocessor.rs Outdated
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Mar 14, 2026
@ishandhanani ishandhanani changed the title from "feat: session_control endpoint for subagent KV isolation" to "refactor: extract StickySessionRouter, replace pin_prefix RPC with retention_seconds" Mar 20, 2026
@github-actions github-actions Bot added refactor and removed feat labels Mar 20, 2026
@ishandhanani ishandhanani requested a review from a team as a code owner March 20, 2026 21:07
@ishandhanani ishandhanani force-pushed the idhanani/session-control-event-plane branch from 0ad0e98 to fd0c25c Compare March 20, 2026 22:22
@ishandhanani ishandhanani force-pushed the idhanani/session-control-event-plane branch from fd0c25c to 1504f07 Compare March 20, 2026 22:44
@ishandhanani ishandhanani force-pushed the idhanani/session-control-event-plane branch from 1504f07 to 93bd5cb Compare March 20, 2026 22:52
@ishandhanani ishandhanani force-pushed the idhanani/session-control-event-plane branch from 93bd5cb to 03b74cc Compare March 20, 2026 22:57
@ishandhanani ishandhanani force-pushed the idhanani/session-control-event-plane branch from 03b74cc to 80b5d67 Compare March 20, 2026 23:10
- Add StickySessionRouter with trait-based AffinityStore for session
  affinity (in-memory default, pluggable for multi-router deployments)
- Add AgentController for session lifecycle RPCs (open/close) with
  synchronous open to ensure session exists before first request
- Replace fire-and-forget pin_prefix RPC with inline retention_seconds
  injection, enabling SGLang priority-based eviction with time decay
- Register session_control as a discoverable service endpoint
- Forward session_params through preprocessor to backend
- Add SessionControl, SessionParams, SessionAction types to nvext
- Update SGLang imports to use sglang.srt.utils.network
- Update docs: cache pinning -> cache retention
@ishandhanani ishandhanani force-pushed the idhanani/session-control-event-plane branch from 80b5d67 to 0b8d705 Compare March 20, 2026 23:23
@ishandhanani
Contributor Author

Superseded by the split PRs:

Closing this combined PR so review stays on the smaller scoped branches.


Labels

backend::sglang, documentation, feat, frontend, router, size/XXL
