refactor(protocols): unwind cache_control from nvext#7790
Conversation
WalkthroughThis pull request removes the cache control and cache pinning feature from the system, including configuration fields, request handlers, client infrastructure, protocol definitions, and associated documentation across configuration, frontend, routing, and protocol layers. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/bindings/python/rust/llm/entrypoint.rs (1)
61-81: Consider making the constructor keyword-only to prevent accidental positional argument misalignment.Removing
router_enable_cache_controlfrom this public PyO3 constructor shifts all subsequent positional parameters. While verification confirms there are currently no positional callers toKvRouterConfigin the codebase, removing a middle parameter from the signature is fragile API design. To protect against future positional calls binding arguments to unintended parameters, consider either preserving a deprecated placeholder in the original slot or marking parameters as keyword-only using*in the signature.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/bindings/python/rust/llm/entrypoint.rs` around lines 61 - 81, The PyO3 constructor signature for fn new currently exposes many positional args and removing router_enable_cache_control shifted later parameters; to avoid accidental positional misalignment, update the #[pyo3(signature = (...))] on new to make parameters keyword-only by inserting a leading "*" (e.g. signature = (*, overlap_score_weight=1.0, router_temperature=0.0, ...)), or if you prefer to preserve exact arg order add a deprecated placeholder parameter named router_enable_cache_control into the signature and function signature to keep compatibility; apply the change to the new function declaration so callers must use keywords and positional binding cannot shift parameters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/llm/src/protocols/unified.rs`:
- Around line 80-84: extract_cache_breakpoints currently only scans system and
message blocks, so top-level Anthropic cache_control is dropped during
UnifiedRequest::try_from; update extract_cache_breakpoints (and/or the
UnifiedRequest::try_from path that builds cache_breakpoints and
nvext/api_context) to also read the top-level cache_control field, convert it
into the same CacheBreakpoint entries (honoring override semantics vs per-block
cache_control), and ensure those entries are pushed into cache_breakpoints so
api_context preserves them; add a regression test (based on
test_top_level_cache_control_overrides_per_block) that constructs a top-level
cache_control-only request and asserts UnifiedRequest::try_from retains the
expected cache_breakpoints/api_context.
---
Nitpick comments:
In `@lib/bindings/python/rust/llm/entrypoint.rs`:
- Around line 61-81: The PyO3 constructor signature for fn new currently exposes
many positional args and removing router_enable_cache_control shifted later
parameters; to avoid accidental positional misalignment, update the
#[pyo3(signature = (...))] on new to make parameters keyword-only by inserting a
leading "*" (e.g. signature = (*, overlap_score_weight=1.0,
router_temperature=0.0, ...)), or if you prefer to preserve exact arg order add
a deprecated placeholder parameter named router_enable_cache_control into the
signature and function signature to keep compatibility; apply the change to the
new function declaration so callers must use keywords and positional binding
cannot shift parameters.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 27f4552a-91a7-4d19-bec6-479ee4611347
📒 Files selected for processing (19)
components/src/dynamo/common/configuration/groups/kv_router_args.pycomponents/src/dynamo/frontend/frontend_args.pycomponents/src/dynamo/sglang/request_handlers/handler_base.pydocs/backends/sglang/agents.mddocs/components/frontend/configuration.mddocs/components/frontend/nvext.mddocs/features/agentic_workloads.mdlib/bindings/python/rust/llm/entrypoint.rslib/bindings/python/src/dynamo/_core.pyilib/kv-router/src/scheduling/config.rslib/llm/src/kv_router.rslib/llm/src/kv_router/cache_control.rslib/llm/src/kv_router/push_router.rslib/llm/src/preprocessor.rslib/llm/src/protocols/anthropic/types.rslib/llm/src/protocols/common/preprocessor.rslib/llm/src/protocols/openai/chat_completions.rslib/llm/src/protocols/openai/nvext.rslib/llm/src/protocols/unified.rs
💤 Files with no reviewable changes (10)
- components/src/dynamo/frontend/frontend_args.py
- docs/components/frontend/configuration.md
- lib/kv-router/src/scheduling/config.rs
- lib/llm/src/protocols/common/preprocessor.rs
- lib/llm/src/protocols/openai/chat_completions.rs
- components/src/dynamo/common/configuration/groups/kv_router_args.py
- lib/bindings/python/src/dynamo/_core.pyi
- lib/llm/src/kv_router.rs
- components/src/dynamo/sglang/request_handlers/handler_base.py
- lib/llm/src/kv_router/cache_control.rs
sgl-project/sglang#21884 - will approach this a different way moving forward
What changed
cache_controlfrom the OpenAInvextrequest schemacache_controlintonvextnvexteffective_cache_controltrait hooksWhy
This isolates the
cache_controltonvextunwiring from the larger sticky-session/session-controller work. Anthropic compatibility parsing is kept intact, but it no longer feeds the OpenAInvextpath.Impact
Requests can still deserialize Anthropic
cache_controlfields, but they no longer producenvext.cache_controlor router pinning TTLs through the preprocessor.Validation
cargo fmt --allcargo test -p dynamo-llm cache_control -- --nocaptureSummary by CodeRabbit
Release Notes
--enable-cache-controlconfiguration flag andDYN_ENABLE_CACHE_CONTROLenvironment variable.nvext.cache_controlAPI field from request payloads.