@keivenchang keivenchang commented Aug 26, 2025

Overview:

Expose KvStats metrics to Prometheus format for monitoring KV cache utilization and performance.

Details:

  • Added KvStatsPrometheusGauges struct to manage four key metrics: active blocks, total blocks, GPU cache usage percentage, and prefix cache hit rate
  • Integrated Prometheus gauge updates into WorkerMetricsPublisher::publish() using RwLock for optimal read performance in the hot path
  • Registered metrics with Component's MetricsRegistry with standardized "kvstats_" prefix naming convention
  • Added comprehensive integration test to verify metrics registration and value updates
  • Moved test utilities to common.rs for better code organization
  • Example of the new Prometheus metrics:
ubuntu@keivenc-linux:~/dynamo$ curl -s localhost:8081/metrics | grep -v '^#' | grep -v bucket | sort | grep kvstats
dynamo_component_kvstats_active_blocks{dynamo_component="backend",dynamo_namespace="dynamo"} 0
dynamo_component_kvstats_gpu_cache_usage_percent{dynamo_component="backend",dynamo_namespace="dynamo"} 0.00007755545084364712
dynamo_component_kvstats_gpu_prefix_cache_hit_rate{dynamo_component="backend",dynamo_namespace="dynamo"} 0
dynamo_component_kvstats_total_blocks{dynamo_component="backend",dynamo_namespace="dynamo"} 12894
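To make the shape of the change concrete, here is a minimal std-only sketch of a four-gauge container that is updated from a stats struct and rendered as Prometheus text lines like the ones above. It is not the PR's code: the `Gauge` type is a stand-in for the `prometheus` crate's gauge, the `KvStats` mirror and its two block-count field names are assumptions, and `render` is a hypothetical helper that only imitates the exposition format.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for a Prometheus gauge (assumption: the PR uses the `prometheus`
/// crate). Stores an f64 as raw bits so `set` works through `&self`.
pub struct Gauge {
    name: &'static str,
    bits: AtomicU64,
}

impl Gauge {
    pub fn new(name: &'static str) -> Self {
        Gauge { name, bits: AtomicU64::new(0f64.to_bits()) }
    }
    pub fn set(&self, value: f64) {
        self.bits.store(value.to_bits(), Ordering::Relaxed);
    }
    pub fn get(&self) -> f64 {
        f64::from_bits(self.bits.load(Ordering::Relaxed))
    }
}

/// Hypothetical mirror of KvStats; the two block-count field names are assumed.
pub struct KvStats {
    pub kv_active_blocks: u64,
    pub kv_total_blocks: u64,
    pub gpu_cache_usage_perc: f32,
    pub gpu_prefix_cache_hit_rate: f32,
}

/// Container for the four gauges named in the PR description.
pub struct KvStatsGauges {
    active_blocks: Gauge,
    gpu_cache_usage: Gauge,
    gpu_prefix_cache_hit_rate: Gauge,
    total_blocks: Gauge,
}

impl KvStatsGauges {
    pub fn new() -> Self {
        KvStatsGauges {
            active_blocks: Gauge::new("dynamo_component_kvstats_active_blocks"),
            gpu_cache_usage: Gauge::new("dynamo_component_kvstats_gpu_cache_usage_percent"),
            gpu_prefix_cache_hit_rate: Gauge::new(
                "dynamo_component_kvstats_gpu_prefix_cache_hit_rate",
            ),
            total_blocks: Gauge::new("dynamo_component_kvstats_total_blocks"),
        }
    }

    /// Copy the latest stats into the gauges; no locking is needed because
    /// each gauge is a single atomic.
    pub fn update_from_kvstats(&self, stats: &KvStats) {
        self.active_blocks.set(stats.kv_active_blocks as f64);
        self.gpu_cache_usage.set(stats.gpu_cache_usage_perc as f64);
        self.gpu_prefix_cache_hit_rate.set(stats.gpu_prefix_cache_hit_rate as f64);
        self.total_blocks.set(stats.kv_total_blocks as f64);
    }

    /// Render one `name{labels} value` line per gauge.
    pub fn render(&self, labels: &str) -> String {
        [
            &self.active_blocks,
            &self.gpu_cache_usage,
            &self.gpu_prefix_cache_hit_rate,
            &self.total_blocks,
        ]
        .iter()
        .map(|g| format!("{}{{{}}} {}\n", g.name, labels, g.get()))
        .collect()
    }
}
```

Calling `update_from_kvstats` and then `render` with the label string `dynamo_component="backend",dynamo_namespace="dynamo"` yields four lines in the same `name{labels} value` layout as the curl output.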

Where should the reviewer start?

  • lib/llm/src/kv_router/publisher.rs - Core implementation of KvStatsPrometheusGauges and integration with WorkerMetricsPublisher
  • lib/runtime/src/metrics/prometheus_names.rs - Standardized metric name definitions

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-506

Summary by CodeRabbit

  • New Features

    • Exposes Prometheus metrics for KV stats, including active blocks, total blocks, GPU cache usage %, and GPU prefix cache hit rate.
    • Endpoint creation now supports optional metric labels for improved observability.
    • Metrics are registered before endpoint startup to ensure immediate availability.
  • Tests

    • Added integration tests validating KV stats gauge updates and Prometheus output formatting.
  • Chores

    • Introduced standardized metric names and grouping for KV stats to streamline monitoring setup.


coderabbitai bot commented Aug 26, 2025

Walkthrough

Adds Prometheus KvStats gauges and registration to WorkerMetricsPublisher, updates its API and publish flow, introduces metric name constants, wires Python binding to register metrics before endpoint creation, and adds an integration-test helper to build a DistributedRuntime from the current Tokio runtime.

Changes

Cohort / File(s) and Summary

  • Prometheus KV Stats integration (lib/llm/src/kv_router/publisher.rs, lib/runtime/src/metrics/prometheus_names.rs): Adds KvStats Prometheus gauges, gauge container, and registration on WorkerMetricsPublisher; extends publish() to update gauges; updates create_endpoint signature to accept metric labels; introduces kvstats metric name constants and KVSTATS_METRICS list.
  • Python binding registration (lib/bindings/python/rust/llm/kv.rs): Calls register_prometheus_metrics(&component) before create_endpoint, propagating errors to Python.
  • Test utilities for integration (lib/llm/src/common.rs): Adds #[cfg(all(test, feature = "integration"))] pub mod test_utils with async create_test_drt_async() helper using current Tokio runtime.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Py as Python
  participant RS as Rust WorkerMetricsPublisher
  participant Reg as Prometheus Registry
  participant EP as Endpoint

  Py->>RS: register_prometheus_metrics(component)
  RS->>Reg: Create KvStats gauges (active, total, gpu_usage, hit_rate)
  Reg-->>RS: Gauges registered

  Py->>RS: create_endpoint(component, metrics_labels)
  RS->>EP: Initialize endpoint
  EP-->>RS: Ready

  note over RS: During inference
  participant Model as Worker
  Model->>RS: publish(ForwardPassMetrics{ kv_stats, ... })
  RS->>Reg: Update KvStats gauges from kv_stats
  RS-->>Model: Broadcast metrics

  participant Scrape as Prometheus
  Scrape->>Reg: /metrics scrape
  Reg-->>Scrape: KvStats metrics exposed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I hop through stats with whiskers keen,
Four little gauges, tidy and clean—
Blocks that bustle, caches that gleam,
Hit rates sparkle in a metric stream.
Prometheus hums; endpoints awake—
Another carrot-crisp release to bake! 🥕✨

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/bindings/python/rust/llm/kv.rs (1)

61-67: Update Python stub for the new metrics_labels parameter

The .pyi stub for create_endpoint still only defines a single component argument. Please update it to match the Rust binding’s signature and default:

• File: lib/bindings/python/src/dynamo/_core.pyi
• Replace the existing stub at line 441 with:

-    def create_endpoint(self, component: Component) -> None: ...
+    def create_endpoint(
+        self,
+        component: Component,
+        metrics_labels: Optional[List[Tuple[str, str]]] = None,
+    ) -> None: ...

• Be sure to import the necessary types at the top of the file:

from typing import Optional, List, Tuple

No call sites were found still passing a legacy dp_rank to create_endpoint, so no downstream changes are required there.

🧹 Nitpick comments (8)
lib/llm/src/common.rs (1)

19-32: Handy test DRT helper; consider graceful shutdown in tests

The helper is useful and scoped behind cfg flags. To avoid leaked tasks/resources across integration tests, consider returning a small guard or documenting calling drt.shutdown() at the end of tests. Alternatively, provide a companion helper that shuts the DRT down.

If you want, I can add a simple Drop guard type that calls shutdown() automatically.
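For reference, the Drop-guard idea can be sketched with the standard library alone. Everything here is hypothetical: `FakeRuntime` stands in for the DRT, and the sketch assumes only that the real type offers a synchronous `shutdown()` callable through a shared reference.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the DRT: all the guard needs is `shutdown()`.
pub struct FakeRuntime {
    stopped: Arc<AtomicBool>,
}

impl FakeRuntime {
    pub fn new(stopped: Arc<AtomicBool>) -> Self {
        FakeRuntime { stopped }
    }

    /// Records that shutdown was requested.
    pub fn shutdown(&self) {
        self.stopped.store(true, Ordering::SeqCst);
    }
}

/// Owns the runtime for the duration of a test and shuts it down when the
/// guard leaves scope, including when the test unwinds from a failed assert.
pub struct ShutdownGuard {
    runtime: FakeRuntime,
}

impl ShutdownGuard {
    pub fn new(runtime: FakeRuntime) -> Self {
        ShutdownGuard { runtime }
    }

    /// Borrow the wrapped runtime for use inside the test body.
    pub fn runtime(&self) -> &FakeRuntime {
        &self.runtime
    }
}

impl Drop for ShutdownGuard {
    fn drop(&mut self) {
        self.runtime.shutdown();
    }
}
```

A test would bind the guard to a named variable such as `_guard`, since a bare `_` pattern drops the value immediately and would shut the runtime down before the test body runs.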

lib/bindings/python/rust/llm/kv.rs (1)

71-86: Minor: avoid allocating when labels are empty

The normalization to Option<Vec<(&str, &str)>> is fine. You can shave a tiny alloc by early-returning None when metrics_labels.as_deref().map(|v| v.is_empty()).unwrap_or(true) is true, instead of building an empty Vec and then turning it into None.
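A sketch of the early-return shape, assuming the binding holds the labels as owned `(String, String)` pairs; the function name `normalize_labels` is made up for illustration.

```rust
/// Borrow owned label pairs as `(&str, &str)`. Returns `None` up front when
/// the input is absent or empty, so no intermediate Vec is built for that case.
pub fn normalize_labels(labels: Option<&[(String, String)]>) -> Option<Vec<(&str, &str)>> {
    let labels = labels?;
    if labels.is_empty() {
        return None;
    }
    Some(
        labels
            .iter()
            .map(|(key, value)| (key.as_str(), value.as_str()))
            .collect(),
    )
}
```

Note that an empty `Vec` does not heap-allocate in Rust, so the practical gain is a simpler control flow rather than a saved allocation.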

lib/runtime/src/metrics/prometheus_names.rs (1)

145-168: Metric naming: consider “_ratio” suffix for 0..1 values

Both “gpu_cache_usage_percent” and “gpu_prefix_cache_hit_rate” are documented as 0.0–1.0 fractions, not 0–100 percents. Prometheus best practice typically uses “_ratio”. If you want to align now (before users depend on names), consider renaming to gpu_cache_usage_ratio and gpu_prefix_cache_hit_ratio (and update call sites/tests).

If you keep “percent”, update the docs to “0–100” or “fraction 0.0–1.0” consistently across code and dashboards.
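The unit difference is easy to see with the sample value from the PR description. The helpers below are illustrative only and are not part of the codebase.

```rust
/// Convert a 0.0..=1.0 fraction into a 0..=100 percentage.
pub fn fraction_to_percent(fraction: f64) -> f64 {
    fraction * 100.0
}

/// True when a value is a valid fraction, which is what a `_ratio` name implies.
pub fn is_fraction(value: f64) -> bool {
    (0.0..=1.0).contains(&value)
}
```

The scraped value 0.00007755545084364712 is a valid fraction and corresponds to roughly 0.0078 percent, which a dashboard reading the `_percent` name literally would misreport by a factor of 100.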

lib/llm/src/kv_router/publisher.rs (5)

486-491: Doc mismatch: it’s an RwLock, not a Mutex

The comment says “wrapped in Mutex,” but the field uses RwLock. Please fix the comment to avoid confusion.

-    // Prometheus metrics for KvStats (wrapped in Mutex for thread-safe access)
+    // Prometheus metrics for KvStats (guarded by RwLock for thread-safe access)

493-500: Optional: remove Option<> around gauges to simplify hot-path updates

You always register all four gauges together; keeping each as Option<Gauge> adds branches in update_from_kvstats. Consider making them plain prometheus::Gauge and only keep the outer Option<Arc<_>> for “registered or not”.


501-536: Gauge registration: good; consider HELP text units

HELP strings are clear. If you keep the 0.0–1.0 range, consider saying “fraction (0.0–1.0)” to avoid “percent” confusion, or switch names to _ratio as noted in prometheus_names.rs.


539-552: Clamp fractional gauges (defensive)

If upstream ever exceeds expected bounds, Prometheus will happily ingest it. Clamping gpu_cache_usage_perc and gpu_prefix_cache_hit_rate to [0.0, 1.0] keeps data sane.

-        if let Some(gauge) = &self.gpu_cache_usage_gauge {
-            gauge.set(kv_stats.gpu_cache_usage_perc as f64);
-        }
+        if let Some(gauge) = &self.gpu_cache_usage_gauge {
+            let v = kv_stats.gpu_cache_usage_perc.clamp(0.0, 1.0) as f64;
+            gauge.set(v);
+        }
@@
-        if let Some(gauge) = &self.gpu_prefix_cache_hit_rate_gauge {
-            gauge.set(kv_stats.gpu_prefix_cache_hit_rate as f64);
-        }
+        if let Some(gauge) = &self.gpu_prefix_cache_hit_rate_gauge {
+            let v = kv_stats.gpu_prefix_cache_hit_rate.clamp(0.0, 1.0) as f64;
+            gauge.set(v);
+        }
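One caveat with the clamp above: `f32::clamp` returns NaN when the input is NaN, so a NaN from upstream would still reach the gauge. A possible variant is sketched below; mapping NaN to 0.0 is an assumption made for illustration, and the function name is hypothetical.

```rust
/// Clamp a fractional metric to [0.0, 1.0] before exporting it.
/// Assumption: NaN is reported as 0.0 rather than propagated to the gauge.
pub fn sanitize_fraction(value: f32) -> f64 {
    if value.is_nan() {
        return 0.0;
    }
    // Infinite inputs are handled by clamp: +inf becomes 1.0, -inf becomes 0.0.
    value.clamp(0.0, 1.0) as f64
}
```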

1195-1290: Integration test: clean up DRT and drop unnecessary Arc

Great test. Two tweaks:

  • No need to put drt in an Arc here.
  • Consider shutting the DRT down at the end to avoid background tasks outliving the test.
-        let drt = std::sync::Arc::new(create_test_drt_async().await);
+        let drt = create_test_drt_async().await;
@@
         println!(
             "✅ KvStatsPrometheusGauges constructor and publish() work correctly with real Component"
         );
+        // Graceful shutdown
+        drt.shutdown();

If DistributedRuntime::shutdown() is not available here, I can wire a helper into test_utils.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 766d3f2 and 16f76f6.

📒 Files selected for processing (4)
  • lib/bindings/python/rust/llm/kv.rs (1 hunks)
  • lib/llm/src/common.rs (1 hunks)
  • lib/llm/src/kv_router/publisher.rs (3 hunks)
  • lib/runtime/src/metrics/prometheus_names.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-05T01:04:24.775Z
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The `create_endpoint` method in `WorkerMetricsPublisher` has backward compatibility maintained through pyo3 signature annotation `#[pyo3(signature = (component, dp_rank = None))]`, making the `dp_rank` parameter optional with a default value of `None`.

Applied to files:

  • lib/bindings/python/rust/llm/kv.rs
  • lib/llm/src/kv_router/publisher.rs
📚 Learning: 2025-08-25T23:24:42.050Z
Learnt from: tzulingk
PR: ai-dynamo/dynamo#2666
File: components/backends/trtllm/src/dynamo/trtllm/publisher.py:0-0
Timestamp: 2025-08-25T23:24:42.050Z
Learning: WorkerMetricsPublisher.create_endpoint method signature has been updated in _core.pyi to include metrics_labels parameter: `def create_endpoint(self, component: str, metrics_labels: Optional[List[Tuple[str, str]]] = None)`, making the metrics_labels parameter optional with default value of None.

Applied to files:

  • lib/bindings/python/rust/llm/kv.rs
🧬 Code graph analysis (3)
lib/bindings/python/rust/llm/kv.rs (2)
lib/bindings/python/rust/llm/entrypoint.rs (1)
  • to_pyerr (277-282)
lib/bindings/python/rust/lib.rs (1)
  • to_pyerr (126-131)
lib/llm/src/common.rs (2)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (30-53)
lib/runtime/src/distributed.rs (1)
  • from_settings_without_discovery (178-181)
lib/llm/src/kv_router/publisher.rs (2)
lib/bindings/python/rust/llm/kv.rs (16)
  • new (53-59)
  • new (132-144)
  • new (155-167)
  • new (185-208)
  • new (253-271)
  • new (346-350)
  • new (408-443)
  • new (496-504)
  • new (584-596)
  • new (643-703)
  • new (773-783)
  • new (790-802)
  • new (809-821)
  • new (828-842)
  • new (853-896)
  • publish (102-107)
lib/llm/src/common.rs (1)
  • create_test_drt_async (26-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (3)
lib/runtime/src/metrics/prometheus_names.rs (1)

170-176: LGTM: central enumeration for KvStats metrics

The KVSTATS_METRICS array is useful for registry validation and iteration. No issues.
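As an illustration of the validation use case, the sketch below iterates a name list and reports which metrics are absent from a scrape. The individual constant names, the slice type, and the `missing_metrics` helper are assumptions; only `KVSTATS_METRICS` and the `dynamo_component_kvstats_*` output names come from this PR.

```rust
// Assumed constant names; the string values match the scraped metric suffixes.
pub const KVSTATS_ACTIVE_BLOCKS: &str = "kvstats_active_blocks";
pub const KVSTATS_TOTAL_BLOCKS: &str = "kvstats_total_blocks";
pub const KVSTATS_GPU_CACHE_USAGE_PERCENT: &str = "kvstats_gpu_cache_usage_percent";
pub const KVSTATS_GPU_PREFIX_CACHE_HIT_RATE: &str = "kvstats_gpu_prefix_cache_hit_rate";

/// Central list used to iterate over every KvStats metric name.
pub const KVSTATS_METRICS: &[&str] = &[
    KVSTATS_ACTIVE_BLOCKS,
    KVSTATS_TOTAL_BLOCKS,
    KVSTATS_GPU_CACHE_USAGE_PERCENT,
    KVSTATS_GPU_PREFIX_CACHE_HIT_RATE,
];

/// Return the expected names that have no sample line in a Prometheus text
/// scrape. A sample line starts with the full name followed by `{` or a space.
pub fn missing_metrics<'a>(scrape: &str, prefix: &str, expected: &[&'a str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|name| {
            let full = format!("{}{}", prefix, name);
            !scrape.lines().any(|line| {
                line.strip_prefix(full.as_str())
                    .map_or(false, |rest| rest.starts_with('{') || rest.starts_with(' '))
            })
        })
        .collect()
}
```

An integration test could assert that `missing_metrics(body, "dynamo_component_", KVSTATS_METRICS)` is empty after the first publish.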

lib/llm/src/kv_router/publisher.rs (2)

571-579: Hot-path read is cheap and correct

Using a read lock to grab the Arc and update gauges is the right trade-off here. Nice.


599-637: API extension looks good; confirm all Rust call sites updated

The extra metrics_labels: Option<&[(&str, &str)]> parameter is propagated, and labels are cloned into owned Strings before the await point. Please confirm all internal callers (if any) pass None or labels, otherwise this is a breaking change.

You can grep for create_endpoint( in Rust to validate call sites compile against the new signature.

@keivenchang keivenchang force-pushed the keivenchang/DIS-506__expose-kvstats-to-prometheus-format-metrics branch from 16f76f6 to edda095 Compare August 26, 2025 19:28
@keivenchang keivenchang self-assigned this Aug 26, 2025
@nnshah1 nnshah1 left a comment

LGTM!

rmccorm4 commented Aug 28, 2025

@kthui @richardhuo-nv for some eyes on rust review

@PeaBrane on kv metrics specific review + rust

@rmccorm4 rmccorm4 requested a review from PeaBrane August 28, 2025 20:54
RwLock is not necessary since gauges are initialized once and then only read.
OnceLock simplifies the code and improves performance on the hot path.

Signed-off-by: Keiven Chang <[email protected]>
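The write-once, read-many pattern described in this commit message can be sketched with `std::sync::OnceLock`. The `Publisher` and `Gauges` types below are simplified, hypothetical stand-ins for WorkerMetricsPublisher and the gauge container, and a single atomic counter replaces the real gauges.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical gauge container holding one value for brevity.
pub struct Gauges {
    total_blocks: AtomicU64,
}

/// Simplified publisher: gauges are installed once, then only read.
pub struct Publisher {
    gauges: OnceLock<Gauges>,
}

impl Publisher {
    pub fn new() -> Self {
        Publisher { gauges: OnceLock::new() }
    }

    /// One-time registration; a second call is rejected instead of replacing
    /// the gauges that are already installed.
    pub fn register(&self) -> Result<(), &'static str> {
        self.gauges
            .set(Gauges { total_blocks: AtomicU64::new(0) })
            .map_err(|_| "metrics already registered")
    }

    /// Hot path: `get()` takes no read lock. Returns false when the gauges
    /// have not been registered yet.
    pub fn publish(&self, total_blocks: u64) -> bool {
        match self.gauges.get() {
            Some(gauges) => {
                gauges.total_blocks.store(total_blocks, Ordering::Relaxed);
                true
            }
            None => false,
        }
    }

    /// Read back the last published value, if the gauges exist.
    pub fn total_blocks(&self) -> Option<u64> {
        self.gauges
            .get()
            .map(|gauges| gauges.total_blocks.load(Ordering::Relaxed))
    }
}
```

Compared with `RwLock<Option<...>>`, the uninitialized state is still representable, but readers never contend with each other or with a writer after initialization.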

@kthui kthui left a comment

If I understand correctly, this feature adds the KvStatsPrometheusGauges struct that tracks 4 metrics supplied by the KvStats that is already a part of the ForwardPassMetrics.

One question: I wonder if the Arc around the KvStatsPrometheusGauges is needed, unless it lives longer than the WorkerMetricsPublisher?

@keivenchang keivenchang left a comment

Hi @kthui I looked at this more closely and yes, it's only accessed here so it does not need an Arc. I was confused because the Sender and Receiver were using the Arc.

In addition, I changed KvStatsPrometheusGauges so that it no longer uses Option; the gauges are now mandatory. That also simplified the code quite a bit.

Removed unnecessary Arc and Option wrappers since gauges are never shared
independently and are always initialized.

Signed-off-by: Keiven Chang <[email protected]>
@keivenchang keivenchang merged commit 15539fd into main Aug 29, 2025
15 checks passed
@keivenchang keivenchang deleted the keivenchang/DIS-506__expose-kvstats-to-prometheus-format-metrics branch August 29, 2025 05:12
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
michaelshin pushed a commit that referenced this pull request Sep 2, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
6 participants