deploy/metrics/README.md
Lines changed: 110 additions & 2 deletions
@@ -36,7 +36,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
### Available Metrics
-#### Component Metrics
+#### Backend Component Metrics
The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework:
@@ -47,6 +47,19 @@ The core Dynamo backend system automatically exposes metrics with the `dynamo_co
- `dynamo_component_response_bytes_total`: Total bytes sent in responses (counter)

KV router statistics are automatically exposed by LLM workers and KV router components with the `dynamo_component_kvstats_*` prefix. These metrics provide insights into GPU memory usage and cache efficiency:

- `dynamo_component_kvstats_active_blocks`: Number of active KV cache blocks currently in use (gauge)
- `dynamo_component_kvstats_total_blocks`: Total number of KV cache blocks available (gauge)
- `dynamo_component_kvstats_gpu_cache_usage_percent`: GPU cache usage, reported as a fraction from 0.0 to 1.0 despite the `percent` suffix (gauge)
- `dynamo_component_kvstats_gpu_prefix_cache_hit_rate`: GPU prefix cache hit rate, reported as a fraction from 0.0 to 1.0 (gauge)

These metrics are published by:

- **LLM Workers**: vLLM and TRT-LLM backends publish these metrics through their respective publishers
- **KV Router**: The KV router component aggregates and exposes these metrics for load balancing decisions
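
To make the 0.0-1.0 scale of the usage gauge concrete, the sketch below derives a usage fraction from the two block gauges. It rests on an assumption the README does not state: that cache usage is simply active blocks divided by total blocks. `KvStatsSnapshot` and its methods are hypothetical names for illustration, not Dynamo types.

```rust
/// Hypothetical snapshot of the two block gauges listed above.
/// This is an illustration, not a type from the Dynamo codebase.
#[derive(Debug, Clone, Copy)]
pub struct KvStatsSnapshot {
    pub active_blocks: u64,
    pub total_blocks: u64,
}

impl KvStatsSnapshot {
    /// Usage as a fraction in 0.0..=1.0.
    /// ASSUMPTION: usage = active_blocks / total_blocks.
    pub fn usage_fraction(&self) -> f64 {
        if self.total_blocks == 0 {
            // Avoid dividing by zero when no blocks have been reported yet.
            return 0.0;
        }
        (self.active_blocks as f64 / self.total_blocks as f64).clamp(0.0, 1.0)
    }

    /// The same value scaled to 0..=100, for display purposes only.
    pub fn usage_percent(&self) -> f64 {
        self.usage_fraction() * 100.0
    }
}
```

A dashboard that wants a true percentage would multiply the gauge value by 100, as `usage_percent` does here.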
#### Specialized Component Metrics
Some components expose additional metrics specific to their functionality:
@@ -57,14 +70,80 @@ Some components expose additional metrics specific to their functionality:
When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name:

- `dynamo_frontend_requests_total`: Total LLM requests (counter)
- `dynamo_frontend_time_to_first_token_seconds`: Time to first token (histogram)
**Note**: The `dynamo_frontend_inflight_requests_total` metric tracks requests from HTTP handler start until the complete response is finished, while `dynamo_frontend_queued_requests_total` tracks requests from HTTP handler start until first token generation begins (including prefill time). HTTP queue time is a subset of inflight time.
#### Request Processing Flow
This section explains the distinction between two key metrics used to track request processing:

1. **Inflight**: Tracks requests from HTTP handler start until the complete response is finished
2. **HTTP Queue**: Tracks requests from HTTP handler start until first token generation begins (including prefill time)

- **Inflight**: Measures total request lifetime including processing time
- **HTTP Queue**: Measures the time from HTTP handler start until first token generation begins, which covers queuing and prefill time
- **HTTP Queue ≤ Inflight** (HTTP queue is a subset of inflight time)
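
The relationship between the two metrics can be sketched with a pair of counters that are updated at the three lifecycle points described above. This is a minimal illustration, not Dynamo's implementation: `RequestGauges` and its method names are hypothetical, and the sketch assumes every request reaches its first token before its response completes.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical pair of gauges mirroring the inflight and HTTP queue metrics.
#[derive(Default)]
pub struct RequestGauges {
    inflight: AtomicI64,
    queued: AtomicI64,
}

impl RequestGauges {
    /// HTTP handler start: the request is both inflight and queued.
    pub fn on_handler_start(&self) {
        self.inflight.fetch_add(1, Ordering::Relaxed);
        self.queued.fetch_add(1, Ordering::Relaxed);
    }

    /// First token: the request leaves the queue but stays inflight.
    pub fn on_first_token(&self) {
        self.queued.fetch_sub(1, Ordering::Relaxed);
    }

    /// Complete response finished: the request is no longer inflight.
    pub fn on_response_complete(&self) {
        self.inflight.fetch_sub(1, Ordering::Relaxed);
    }

    pub fn inflight(&self) -> i64 {
        self.inflight.load(Ordering::Relaxed)
    }

    pub fn queued(&self) -> i64 {
        self.queued.load(Ordering::Relaxed)
    }
}
```

Because a request enters both gauges at the same moment and leaves the queue gauge first, the queued value never exceeds the inflight value when events arrive in this order.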
¹ **TODO**: Implement the "actual" HTTP queue metric, which tracks from request start until first token generation begins, rather than the current implementation, which tracks until the first token is received by the frontend.
### Required Files
The following configuration files should be present in this directory:
@@ -76,6 +155,35 @@ The following configuration files should be present in this directory:
- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
- [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.
### Metric Name Constants
The [prometheus_names.rs](../../lib/runtime/src/metrics/prometheus_names.rs) module provides centralized Prometheus metric name constants and sanitization utilities for the Dynamo metrics system. This module ensures consistency across all components and prevents metric name duplication.
#### Key Features
- **Centralized Constants**: All Prometheus metric names are defined as constants to avoid duplication and typos
- **Automatic Sanitization**: Functions to sanitize metric and label names according to Prometheus naming rules
- **Component Organization**: Metric names are organized by component (frontend, work_handler, nats_client, etc.)
- **Validation Arrays**: Arrays of metric names for iteration and validation purposes
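
As a rough illustration of the features above, the sketch below shows one way that name constants, a validation array, and a sanitizer could fit together. It is not the actual contents of `prometheus_names.rs`: every constant, array, and function name here is hypothetical. The only rule it relies on is the classic Prometheus metric name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*`.

```rust
/// Hypothetical constants; the string values are metric names from this README.
pub const FRONTEND_REQUESTS_TOTAL: &str = "dynamo_frontend_requests_total";
pub const FRONTEND_TTFT_SECONDS: &str = "dynamo_frontend_time_to_first_token_seconds";

/// Hypothetical validation array for iterating over the names above.
pub const FRONTEND_METRICS: [&str; 2] = [FRONTEND_REQUESTS_TOTAL, FRONTEND_TTFT_SECONDS];

/// Returns true if `name` matches `[a-zA-Z_:][a-zA-Z0-9_:]*`.
pub fn is_valid_metric_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == ':' => {}
        _ => return false, // empty name or invalid first character
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == ':')
}

/// Replaces invalid characters with `_` and prefixes `_` if the name
/// starts with a digit. An empty input becomes `_`.
pub fn sanitize_metric_name(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len() + 1);
    for (i, c) in raw.chars().enumerate() {
        if c.is_ascii_alphabetic() || c == '_' || c == ':' {
            out.push(c);
        } else if c.is_ascii_digit() {
            if i == 0 {
                out.push('_'); // a digit may not lead the name
            }
            out.push(c);
        } else {
            out.push('_');
        }
    }
    if out.is_empty() {
        out.push('_');
    }
    out
}
```

With this sketch, `sanitize_metric_name("http.requests-total")` yields `http_requests_total`, and a loop over `FRONTEND_METRICS` can assert that every registered name passes `is_valid_metric_name`. Label names follow a stricter pattern that excludes the colon, so a real module would need a separate function for them.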
/// Get the request ID from a primary source, or lastly create a new one if not present
// TODO: Similar function exists in lib/llm/src/http/service/openai.rs but with different signature and more complex logic (distributed tracing, headers)