Skip to content

refactor(protocols): unwind cache_control from nvext#7790

Merged
ishandhanani merged 4 commits intomainfrom
idhanani/dyn-unwire-cache-control-nvext
Apr 2, 2026
Merged

refactor(protocols): unwind cache_control from nvext#7790
ishandhanani merged 4 commits intomainfrom
idhanani/dyn-unwire-cache-control-nvext

Conversation

@ishandhanani
Copy link
Copy Markdown
Contributor

@ishandhanani ishandhanani commented Apr 1, 2026

sgl-project/sglang#21884 - will approach this a different way moving forward

What changed

  • remove cache_control from the OpenAI nvext request schema
  • stop Anthropic Messages conversion from collapsing cache_control into nvext
  • stop the preprocessor from deriving routing pin TTLs from request nvext
  • drop the now-unused effective_cache_control trait hooks

Why

This isolates the cache_control to nvext unwiring from the larger sticky-session/session-controller work. Anthropic compatibility parsing is kept intact, but it no longer feeds the OpenAI nvext path.

Impact

Requests can still deserialize Anthropic cache_control fields, but they no longer produce nvext.cache_control or router pinning TTLs through the preprocessor.

Validation

  • cargo fmt --all
  • cargo test -p dynamo-llm cache_control -- --nocapture

Summary by CodeRabbit

Release Notes

  • Chores
    • Removed experimental cache pinning feature and related infrastructure.
    • Removed --enable-cache-control configuration flag and DYN_ENABLE_CACHE_CONTROL environment variable.
    • Removed nvext.cache_control API field from request payloads.
    • Updated documentation to reflect removal of cache pinning capabilities.

@github-actions github-actions bot added refactor frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Apr 1, 2026
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 1, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

@pull-request-size pull-request-size bot added size/XL and removed size/L labels Apr 1, 2026
@github-actions github-actions bot added backend::sglang Relates to the sglang backend router Relates to routing, KV-aware routing, etc. labels Apr 1, 2026
@ishandhanani ishandhanani marked this pull request as ready for review April 1, 2026 23:50
@ishandhanani ishandhanani requested review from a team as code owners April 1, 2026 23:50
@ishandhanani ishandhanani enabled auto-merge (squash) April 1, 2026 23:51
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 2, 2026

Walkthrough

This pull request removes the cache control and cache pinning feature from the system, including configuration fields, request handlers, client infrastructure, protocol definitions, and associated documentation across configuration, frontend, routing, and protocol layers.

Changes

Cohort / File(s) Summary
Router Configuration
components/src/dynamo/common/configuration/groups/kv_router_args.py, lib/kv-router/src/scheduling/config.rs
Removed router_enable_cache_control field from KV router config struct and removed corresponding CLI argument --enable-cache-control with environment variable DYN_ENABLE_CACHE_CONTROL.
Frontend Validation
components/src/dynamo/frontend/frontend_args.py
Removed validation check enforcing --enable-cache-control to be used only with --router-mode=kv.
Request Handler Methods
components/src/dynamo/sglang/request_handlers/handler_base.py
Removed pin_prefix() and cache_control() handler methods and unregistered pin_prefix engine route.
Cache Control Infrastructure
lib/llm/src/kv_router/cache_control.rs, lib/llm/src/kv_router/push_router.rs
Deleted entire cache_control module including CacheControlClient, spawn_pin_prefix, and PinState. Removed lazy cache-control support and client initialization from KvPushRouter.
Module Exports
lib/llm/src/kv_router.rs
Removed public module export and re-exports for cache_control, CacheControlClient, and spawn_pin_prefix.
Protocol Definitions
lib/llm/src/protocols/openai/nvext.rs, lib/llm/src/protocols/openai/chat_completions.rs, lib/llm/src/protocols/unified.rs, lib/llm/src/protocols/anthropic/types.rs
Removed cache_control field from NvExt struct, removed effective_cache_control() trait method, removed cache-control derivation logic, and removed related re-exports and unit tests.
Request Routing
lib/llm/src/protocols/common/preprocessor.rs, lib/llm/src/preprocessor.rs
Removed cache_control_ttl field from RoutingHints struct and removed cache-control TTL extraction and propagation logic.
Python Bindings
lib/bindings/python/rust/llm/entrypoint.rs, lib/bindings/python/src/dynamo/_core.pyi
Removed router_enable_cache_control parameter from KvRouterConfig Python constructor signature and type stub.
Documentation
docs/backends/sglang/agents.md, docs/components/frontend/configuration.md, docs/components/frontend/nvext.md, docs/features/agentic_workloads.md
Removed cache pinning documentation sections, removed nvext.cache_control field reference, removed cache pinning from feature matrix, and updated agentic workloads description to remove cache manager references.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main change: removing cache_control unwiring from nvext across the protocol layer.
Description check ✅ Passed The description provides clear sections covering what changed, why, impact, and validation steps, addressing key requirements from the template structure.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/bindings/python/rust/llm/entrypoint.rs (1)

61-81: Consider making the constructor keyword-only to prevent accidental positional argument misalignment.

Removing router_enable_cache_control from this public PyO3 constructor shifts all subsequent positional parameters. While verification confirms there are currently no positional callers to KvRouterConfig in the codebase, removing a middle parameter from the signature is fragile API design. To protect against future positional calls binding arguments to unintended parameters, consider either preserving a deprecated placeholder in the original slot or marking parameters as keyword-only using * in the signature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/bindings/python/rust/llm/entrypoint.rs` around lines 61 - 81, The PyO3
constructor signature for fn new currently exposes many positional args and
removing router_enable_cache_control shifted later parameters; to avoid
accidental positional misalignment, update the #[pyo3(signature = (...))] on new
to make parameters keyword-only by inserting a leading "*" (e.g. signature = (*,
overlap_score_weight=1.0, router_temperature=0.0, ...)), or if you prefer to
preserve exact arg order add a deprecated placeholder parameter named
router_enable_cache_control into the signature and function signature to keep
compatibility; apply the change to the new function declaration so callers must
use keywords and positional binding cannot shift parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/llm/src/protocols/unified.rs`:
- Around line 80-84: extract_cache_breakpoints currently only scans system and
message blocks, so top-level Anthropic cache_control is dropped during
UnifiedRequest::try_from; update extract_cache_breakpoints (and/or the
UnifiedRequest::try_from path that builds cache_breakpoints and
nvext/api_context) to also read the top-level cache_control field, convert it
into the same CacheBreakpoint entries (honoring override semantics vs per-block
cache_control), and ensure those entries are pushed into cache_breakpoints so
api_context preserves them; add a regression test (based on
test_top_level_cache_control_overrides_per_block) that constructs a top-level
cache_control-only request and asserts UnifiedRequest::try_from retains the
expected cache_breakpoints/api_context.

---

Nitpick comments:
In `@lib/bindings/python/rust/llm/entrypoint.rs`:
- Around line 61-81: The PyO3 constructor signature for fn new currently exposes
many positional args and removing router_enable_cache_control shifted later
parameters; to avoid accidental positional misalignment, update the
#[pyo3(signature = (...))] on new to make parameters keyword-only by inserting a
leading "*" (e.g. signature = (*, overlap_score_weight=1.0,
router_temperature=0.0, ...)), or if you prefer to preserve exact arg order add
a deprecated placeholder parameter named router_enable_cache_control into the
signature and function signature to keep compatibility; apply the change to the
new function declaration so callers must use keywords and positional binding
cannot shift parameters.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 27f4552a-91a7-4d19-bec6-479ee4611347

📥 Commits

Reviewing files that changed from the base of the PR and between 546d6a1 and 60e3dfc.

📒 Files selected for processing (19)
  • components/src/dynamo/common/configuration/groups/kv_router_args.py
  • components/src/dynamo/frontend/frontend_args.py
  • components/src/dynamo/sglang/request_handlers/handler_base.py
  • docs/backends/sglang/agents.md
  • docs/components/frontend/configuration.md
  • docs/components/frontend/nvext.md
  • docs/features/agentic_workloads.md
  • lib/bindings/python/rust/llm/entrypoint.rs
  • lib/bindings/python/src/dynamo/_core.pyi
  • lib/kv-router/src/scheduling/config.rs
  • lib/llm/src/kv_router.rs
  • lib/llm/src/kv_router/cache_control.rs
  • lib/llm/src/kv_router/push_router.rs
  • lib/llm/src/preprocessor.rs
  • lib/llm/src/protocols/anthropic/types.rs
  • lib/llm/src/protocols/common/preprocessor.rs
  • lib/llm/src/protocols/openai/chat_completions.rs
  • lib/llm/src/protocols/openai/nvext.rs
  • lib/llm/src/protocols/unified.rs
💤 Files with no reviewable changes (10)
  • components/src/dynamo/frontend/frontend_args.py
  • docs/components/frontend/configuration.md
  • lib/kv-router/src/scheduling/config.rs
  • lib/llm/src/protocols/common/preprocessor.rs
  • lib/llm/src/protocols/openai/chat_completions.rs
  • components/src/dynamo/common/configuration/groups/kv_router_args.py
  • lib/bindings/python/src/dynamo/_core.pyi
  • lib/llm/src/kv_router.rs
  • components/src/dynamo/sglang/request_handlers/handler_base.py
  • lib/llm/src/kv_router/cache_control.rs

@ishandhanani ishandhanani merged commit c09ac69 into main Apr 2, 2026
96 checks passed
@ishandhanani ishandhanani deleted the idhanani/dyn-unwire-cache-control-nvext branch April 2, 2026 00:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend::sglang Relates to the sglang backend documentation Improvements or additions to documentation frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` refactor router Relates to routing, KV-aware routing, etc. size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants