-
Notifications
You must be signed in to change notification settings - Fork 710
feat: add Prometheus metrics integration for KvStats #2704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add Prometheus metrics integration for KvStats #2704
Conversation
WalkthroughAdds Prometheus KvStats gauges and registration to WorkerMetricsPublisher, updates its API and publish flow, introduces metric name constants, wires Python binding to register metrics before endpoint creation, and adds an integration-test helper to build a DistributedRuntime from the current Tokio runtime. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Py as Python
participant RS as Rust WorkerMetricsPublisher
participant Reg as Prometheus Registry
participant EP as Endpoint
Py->>RS: register_prometheus_metrics(component)
RS->>Reg: Create KvStats gauges (active, total, gpu_usage, hit_rate)
Reg-->>RS: Gauges registered
Py->>RS: create_endpoint(component, metrics_labels)
RS->>EP: Initialize endpoint
EP-->>RS: Ready
note over RS: During inference
participant Model as Worker
Model->>RS: publish(ForwardPassMetrics{ kv_stats, ... })
RS->>Reg: Update KvStats gauges from kv_stats
RS-->>Model: Broadcast metrics
participant Scrape as Prometheus
Scrape->>Reg: /metrics scrape
Reg-->>Scrape: KvStats metrics exposed
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/bindings/python/rust/llm/kv.rs (1)
61-67: Update Python stub for the newmetrics_labelsparameterThe
.pyistub forcreate_endpointstill only defines a singlecomponentargument. Please update it to match the Rust binding’s signature and default:• File:
lib/bindings/python/src/dynamo/_core.pyi
• Replace the existing stub at line 441 with:- def create_endpoint(self, component: Component) -> None: ... + def create_endpoint( + self, + component: Component, + metrics_labels: Optional[List[Tuple[str, str]]] = None, + ) -> None: ...• Be sure to import the necessary types at the top of the file:
from typing import Optional, List, TupleNo call sites were found still passing a legacy
dp_ranktocreate_endpoint, so no downstream changes are required there.
🧹 Nitpick comments (8)
lib/llm/src/common.rs (1)
19-32: Handy test DRT helper; consider graceful shutdown in testsThe helper is useful and scoped behind cfg flags. To avoid leaked tasks/resources across integration tests, consider returning a small guard or documenting calling
drt.shutdown()at the end of tests. Alternatively, provide a companion helper that shuts the DRT down.If you want, I can add a simple Drop guard type that calls
shutdown()automatically.lib/bindings/python/rust/llm/kv.rs (1)
71-86: Minor: avoid allocating when labels are emptyThe normalization to
Option<Vec<(&str, &str)>>is fine. You can shave a tiny alloc by early-returningNonewhenmetrics_labels.as_deref().map(|v| v.is_empty()).unwrap_or(true)is true, instead of building an empty Vec and then turning it intoNone.lib/runtime/src/metrics/prometheus_names.rs (1)
145-168: Metric naming: consider “_ratio” suffix for 0..1 valuesBoth “gpu_cache_usage_percent” and “gpu_prefix_cache_hit_rate” are documented as 0.0–1.0 fractions, not 0–100 percents. Prometheus best practice typically uses “_ratio”. If you want to align now (before users depend on names), consider renaming to
gpu_cache_usage_ratioandgpu_prefix_cache_hit_ratio(and update call sites/tests).If you keep “percent”, update the docs to “0–100” or “fraction 0.0–1.0” consistently across code and dashboards.
lib/llm/src/kv_router/publisher.rs (5)
486-491: Doc mismatch: it’s an RwLock, not a MutexThe comment says “wrapped in Mutex,” but the field uses
RwLock. Please fix the comment to avoid confusion.- // Prometheus metrics for KvStats (wrapped in Mutex for thread-safe access) + // Prometheus metrics for KvStats (guarded by RwLock for thread-safe access)
493-500: Optional: remove Option<> around gauges to simplify hot-path updatesYou always register all four gauges together; keeping each as
Option<Gauge>adds branches inupdate_from_kvstats. Consider making them plainprometheus::Gaugeand only keep the outerOption<Arc<_>>for “registered or not”.
501-536: Gauge registration: good; consider HELP text unitsHELP strings are clear. If you keep the 0.0–1.0 range, consider saying “fraction (0.0–1.0)” to avoid “percent” confusion, or switch names to
_ratioas noted in prometheus_names.rs.
539-552: Clamp fractional gauges (defensive)If upstream ever exceeds expected bounds, Prometheus will happily ingest it. Clamping
gpu_cache_usage_percandgpu_prefix_cache_hit_rateto [0.0, 1.0] keeps data sane.- if let Some(gauge) = &self.gpu_cache_usage_gauge { - gauge.set(kv_stats.gpu_cache_usage_perc as f64); - } + if let Some(gauge) = &self.gpu_cache_usage_gauge { + let v = kv_stats.gpu_cache_usage_perc.clamp(0.0, 1.0) as f64; + gauge.set(v); + } @@ - if let Some(gauge) = &self.gpu_prefix_cache_hit_rate_gauge { - gauge.set(kv_stats.gpu_prefix_cache_hit_rate as f64); - } + if let Some(gauge) = &self.gpu_prefix_cache_hit_rate_gauge { + let v = kv_stats.gpu_prefix_cache_hit_rate.clamp(0.0, 1.0) as f64; + gauge.set(v); + }
1195-1290: Integration test: clean up DRT and drop unnecessary ArcGreat test. Two tweaks:
- No need to put
drtin anArchere.- Consider shutting the DRT down at the end to avoid background tasks outliving the test.
- let drt = std::sync::Arc::new(create_test_drt_async().await); + let drt = create_test_drt_async().await; @@ - println!( + println!( "✅ KvStatsPrometheusGauges constructor and publish() work correctly with real Component" ); + // Graceful shutdown + drt.shutdown();If
DistributedRuntime::shutdown()is not available here, I can wire a helper intotest_utils.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
lib/bindings/python/rust/llm/kv.rs(1 hunks)lib/llm/src/common.rs(1 hunks)lib/llm/src/kv_router/publisher.rs(3 hunks)lib/runtime/src/metrics/prometheus_names.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-05T01:04:24.775Z
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The `create_endpoint` method in `WorkerMetricsPublisher` has backward compatibility maintained through pyo3 signature annotation `#[pyo3(signature = (component, dp_rank = None))]`, making the `dp_rank` parameter optional with a default value of `None`.
Applied to files:
lib/bindings/python/rust/llm/kv.rslib/llm/src/kv_router/publisher.rs
📚 Learning: 2025-08-25T23:24:42.050Z
Learnt from: tzulingk
PR: ai-dynamo/dynamo#2666
File: components/backends/trtllm/src/dynamo/trtllm/publisher.py:0-0
Timestamp: 2025-08-25T23:24:42.050Z
Learning: WorkerMetricsPublisher.create_endpoint method signature has been updated in _core.pyi to include metrics_labels parameter: `def create_endpoint(self, component: str, metrics_labels: Optional[List[Tuple[str, str]]] = None)`, making the metrics_labels parameter optional with default value of None.
Applied to files:
lib/bindings/python/rust/llm/kv.rs
🧬 Code graph analysis (3)
lib/bindings/python/rust/llm/kv.rs (2)
lib/bindings/python/rust/llm/entrypoint.rs (1)
to_pyerr(277-282)lib/bindings/python/rust/lib.rs (1)
to_pyerr(126-131)
lib/llm/src/common.rs (2)
lib/bindings/python/src/dynamo/_core.pyi (1)
DistributedRuntime(30-53)lib/runtime/src/distributed.rs (1)
from_settings_without_discovery(178-181)
lib/llm/src/kv_router/publisher.rs (2)
lib/bindings/python/rust/llm/kv.rs (16)
new(53-59)new(132-144)new(155-167)new(185-208)new(253-271)new(346-350)new(408-443)new(496-504)new(584-596)new(643-703)new(773-783)new(790-802)new(809-821)new(828-842)new(853-896)publish(102-107)lib/llm/src/common.rs (1)
create_test_drt_async(26-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (3)
lib/runtime/src/metrics/prometheus_names.rs (1)
170-176: LGTM: central enumeration for KvStats metricsThe
KVSTATS_METRICSarray is useful for registry validation and iteration. No issues.lib/llm/src/kv_router/publisher.rs (2)
571-579: Hot-path read is cheap and correctUsing a read lock to grab the Arc and update gauges is the right trade-off here. Nice.
599-637: API extension looks good; confirm all Rust call sites updatedThe extra
metrics_labels: Option<&[(&str, &str)]>parameter is propagated, and labels are cloned into owned Strings before the await point. Please confirm all internal callers (if any) passNoneor labels, otherwise this is a breaking change.You can grep for
create_endpoint(in Rust to validate call sites compile against the new signature.
16f76f6 to
edda095
Compare
nnshah1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Signed-off-by: Keiven Chang <[email protected]>
|
@kthui @richardhuo-nv for some eyes on rust review @PeaBrane on kv metrics specific review + rust |
RwLock is not necessary since gauges are initialized once and then only read. OnceLock simplifies the code and improves performance on the hot path. Signed-off-by: Keiven Chang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this feature adds the KvStatsPrometheusGauges struct that tracks 4 metrics supplied by the KvStats that is already a part of the ForwardPassMetrics.
One question: I wonder if the Arc around the KvStatsPrometheusGauges is needed, unless it lives longer than the WorkerMetricsPublisher?
…theus-format-metrics Signed-off-by: Keiven C <[email protected]>
keivenchang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kthui I looked at this more closely and yes, it's only accessed here so it does not need an Arc. I was confused because the Sender and Receiver were using the Arc.
In addition, I made KvStatsPrometheusGauges not to use Option (made it mandatory). That also simplified the code quite a bit.
Removed unnecessary Arc and Option wrappers since gauges are never shared independently and are always initialized. Signed-off-by: Keiven Chang <[email protected]>
Signed-off-by: Keiven C <[email protected]> Signed-off-by: Jason Zhou <[email protected]>
Signed-off-by: Keiven C <[email protected]> Signed-off-by: Michael Shin <[email protected]>
Signed-off-by: Keiven C <[email protected]> Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: Keiven C <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Overview:
Expose KvStats metrics to Prometheus format for monitoring KV cache utilization and performance.
Details:
KvStatsPrometheusGaugesstruct to manage four key metrics: active blocks, total blocks, GPU cache usage percentage, and prefix cache hit rateWorkerMetricsPublisher::publish()using RwLock for optimal read performance in the hot pathWhere should the reviewer start?
lib/llm/src/kv_router/publisher.rs- Core implementation of KvStatsPrometheusGauges and integration with WorkerMetricsPublisherlib/runtime/src/metrics/prometheus_names.rs- Standardized metric name definitionsRelated Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
DIS-506
Summary by CodeRabbit
New Features
Tests
Chores