Merged
29 changes: 26 additions & 3 deletions content/en/about/faq/metrics-and-logs/metric-expiry.md
@@ -4,15 +4,38 @@ weight: 20
---

Short-lived metrics can hamper the performance of Prometheus, as they often are a large source of label cardinality. Cardinality is a measure of the number of unique values for a label. To manage the impact of your short-lived metrics on Prometheus, you must first identify the high cardinality metrics and labels. Prometheus provides cardinality information at its `/status` page. Additional information can be retrieved [via PromQL](https://www.robustperception.io/which-are-my-biggest-metrics).

There are several ways to reduce the cardinality of Istio metrics:

* On Istio 1.28.0 and above, add the
[`sidecar.istio.io/statsEvictionInterval`](/docs/reference/config/annotations/)
annotation to workload pods to expire metrics for inactive peers. This will
help prevent endless growth of the metric scrape responses from the Istio
proxy and the resulting large `scrape_samples_scraped` and
`scrape_response_size_bytes` values for job instances. It will not prevent
Prometheus TSDB index bloat or label churn, because Prometheus must still
record every unique value, but it will help with excessive scrape-time
memory use.
* Disable host header fallback.
The `destination_service` label is one potential source of high cardinality.
The values for `destination_service` default to the host header if the Istio proxy is not able to determine the destination service from other request metadata.
If clients are using a variety of host headers, this could result in a large number of values for the `destination_service` label.
In this case, follow the [metric customization](/docs/tasks/observability/metrics/customize-metrics/) guide to disable host header fallback mesh-wide.
To disable host header fallback for a particular workload or namespace, you need to copy the stats `EnvoyFilter` configuration, update it to have host header fallback disabled, and apply it with a more specific selector.
[This issue](https://github.com/istio/istio/issues/25963#issuecomment-666037411) has more detail on how to achieve this.
* Drop unnecessary labels from collection. If the label with high cardinality is not needed, you can drop it from metric collection via [metric customization](/docs/tasks/observability/metrics/customize-metrics/) using `tags_to_remove`.
* Normalize label values, either through federation or classification.
If the information provided by the label is desired, you can use [Prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring) or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.
* Disable unnecessary labels or entire metric series. If the label or metric
with high cardinality is not needed, you can drop it from metric generation
via
[metric customization](/docs/tasks/observability/metrics/customize-metrics/)
using a `Telemetry` resource's
[`metricsOverrides`](/docs/reference/config/telemetry/#MetricsOverrides).
See [Telemetry API](/docs/tasks/observability/telemetry/) for examples.
* Normalize label values through federation or classification.
If the information provided by the label is desired, you can use [Prometheus federation](/docs/ops/best-practices/observability/#using-prometheus-for-production-scale-monitoring), Istio workload Kubernetes labels like [`service.istio.io/workload-name`](/docs/reference/config/labels/index.html), or [request classification](/docs/tasks/observability/metrics/classify-metrics/) to normalize the label.

It is not recommended to use Prometheus scrape-time label rewriting to reduce
cardinality by dropping unwanted labels. Prometheus does not perform
aggregation during label rewriting, so dropping labels may create conflicting
series where two or more series have the same labels but different values. Use
Istio's `Telemetry` configuration to suppress the unwanted dimension(s)
instead.
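
As a minimal sketch of suppressing an unwanted dimension at the source, the following `Telemetry` resource removes the `destination_service` label from every metric reported to the `prometheus` provider. The resource name, the use of the `istio-system` root namespace for mesh-wide scope, and the choice of label are assumptions for illustration. On Istio versions older than 1.22 the `apiVersion` may need to be `telemetry.istio.io/v1alpha1`.

```yaml
# Hypothetical example: name, namespace, and the chosen label are illustrative.
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: drop-destination-service
  namespace: istio-system   # assumed root namespace, so the override applies mesh-wide
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: ALL_METRICS        # apply to every standard Istio metric
            mode: CLIENT_AND_SERVER    # both source and destination reporters
          tagOverrides:
            destination_service:
              operation: REMOVE        # the proxy stops emitting this label
```

Because the label is removed in the proxy before the metric is generated, series that differ only in that label are merged there, so Prometheus never sees two conflicting series with identical label sets.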
12 changes: 8 additions & 4 deletions content/en/about/faq/metrics-and-logs/telemetry-v1-vs-v2.md
@@ -41,11 +41,15 @@ v2 which are listed below:
in Mixer-based telemetry. However, more buckets are available by default
in in-proxy telemetry for latency metrics at the lower latency levels.

* **No metric expiration for short-lived metrics**
* **No metric expiration for short-lived metrics by default**
Mixer-based telemetry supported metric expiration whereby metrics which were
not generated for a configurable amount of time were de-registered for
collection by Prometheus. This is useful in scenarios, such as one-off jobs, that generate short-lived metrics. De-registering
collection by Prometheus. This is useful in scenarios, such as one-off jobs,
that generate short-lived metrics. De-registering
the metrics prevents reporting of metrics which would no longer change in the
future, thereby reducing network traffic and storage in Prometheus.
This expiration mechanism is not available in in-proxy telemetry.
The workaround for this can be found [here](/about/faq/#metric-expiry).

The [`sidecar.istio.io/statsEvictionInterval` annotation](/docs/reference/config/annotations/)
provides equivalent functionality in newer Istio versions, but metric expiry
is not enabled by default. See also
[FAQ: metric expiry](/about/faq/#metric-expiry).
65 changes: 44 additions & 21 deletions content/en/docs/reference/config/metrics/index.md
@@ -45,65 +45,81 @@ For TCP traffic, Istio generates the following metrics:

* **Tcp Connections Closed** (`istio_tcp_connections_closed_total`): This is a `COUNTER` incremented for every closed connection.

The metrics Istio emits can be overridden with [the `Telemetry` resource's `metricsOverrides` field](/docs/reference/config/telemetry/#MetricsOverrides); see [Telemetry API](/docs/tasks/observability/telemetry/).
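
For instance, an override can turn off a metric entirely. The sketch below disables the request size metric for the `prometheus` provider. The resource name, the namespace, and the choice of `REQUEST_SIZE` are assumptions for illustration; in the `Telemetry` spec the overrides are listed under `spec.metrics[].overrides`.

```yaml
# Hypothetical example: name, namespace, and the chosen metric are illustrative.
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: disable-request-size
  namespace: istio-system   # assumed root namespace, so the override applies mesh-wide
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: REQUEST_SIZE       # the istio_request_bytes histogram
            mode: CLIENT_AND_SERVER
          disabled: true               # stop generating this metric
```

Placing the same resource in a workload namespace, optionally with a `selector`, would narrow the override to that namespace or workload.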

## Labels

* **Reporter**: This identifies the reporter of the request. It is set to `destination`
Labels are added to metrics to identify unique series or provide auxiliary
information.

The label name exposed in Prometheus scrapes and used when referring to the
label in configuration is shown in parentheses below.

* **Reporter** (`reporter`): This identifies the reporter of the request. It is set to `destination`
if report is from a server Istio proxy and `source` if report is from a client
Istio proxy or a gateway.

* **Source Workload**: This identifies the name of source workload which
* **Source Workload** (`source_workload`): This identifies the name of source workload which
controls the source, or `unknown` if the source information is missing.

* **Source Workload Namespace**: This identifies the namespace of the source
See also workload label
[`service.istio.io/workload-name`](/docs/reference/config/labels/index.html)
and proxy env-var `ISTIO_META_WORKLOAD_NAME`.

* **Source Workload Namespace** (`source_workload_namespace`): This identifies the namespace of the source
workload, or `unknown` if the source information is missing.

* **Source Principal**: This identifies the peer principal of the traffic source.
* **Source Principal** (`source_principal`): This identifies the peer principal of the traffic source.
It is set when peer authentication is used.

* **Source App**: This identifies the source application based on `app` label
* **Source App** (`source_app`): This identifies the source application based on `app` label
of the source workload, or `unknown` if the source information is missing.

* **Source Version**: This identifies the version of the source workload, or
* **Source Version** (`source_version`): This identifies the version of the source workload, or
`unknown` if the source information is missing.

* **Destination Workload**: This identifies the name of destination workload,
* **Destination Workload** (`destination_workload`): This identifies the name of destination workload,
or `unknown` if the destination information is missing.

* **Destination Workload Namespace**: This identifies the namespace of the
See also workload label
[`service.istio.io/workload-name`](/docs/reference/config/labels/index.html)
and proxy env-var `ISTIO_META_WORKLOAD_NAME`.

* **Destination Workload Namespace** (`destination_workload_namespace`): This identifies the namespace of the
destination workload, or `unknown` if the destination information is
missing.

* **Destination Principal**: This identifies the peer principal of the traffic destination.
* **Destination Principal** (`destination_principal`): This identifies the peer principal of the traffic destination.
It is set when peer authentication is used.

* **Destination App**: This identifies the destination application based on
* **Destination App** (`destination_app`): This identifies the destination application based on
`app` label of the destination workload, or `unknown` if the destination
information is missing.

* **Destination Version**: This identifies the version of the destination workload,
* **Destination Version** (`destination_version`): This identifies the version of the destination workload,
or `unknown` if the destination information is missing.

* **Destination Service**: This identifies destination service host responsible
* **Destination Service** (`destination_service`): This identifies destination service host responsible
for an incoming request. Ex: `details.default.svc.cluster.local`.

* **Destination Service Name**: This identifies the destination service name.
* **Destination Service Name** (`destination_service_name`): This identifies the destination service name.
Ex: `details`.

* **Destination Service Namespace**: This identifies the namespace of
* **Destination Service Namespace** (`destination_service_namespace`): This identifies the namespace of
destination service.

* **Request Protocol**: This identifies the protocol of the request. It is set
* **Request Protocol** (`request_protocol`): This identifies the protocol of the request. It is set
to request or connection protocol.

* **Response Code**: This identifies the response code of the request. This
* **Response Code** (`response_code`): This identifies the response code of the request. This
label is present only on HTTP metrics.

* **Connection Security Policy**: This identifies the service authentication policy of
* **Connection Security Policy** (`connection_security_policy`): This identifies the service authentication policy of
the request. It is set to `mutual_tls` when Istio is used to make communication
secure and report is from destination. It is set to `unknown` when report is from
source since security policy cannot be properly populated.

* **Response Flags**: Additional details about the response or connection from proxy.
* **Response Flags** (`response_flags`): Additional details about the response or connection from proxy.
In case of Envoy, see `%RESPONSE_FLAGS%` in [Envoy Access Log](https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#config-access-log-format-response-flags)
for more detail.

@@ -117,11 +133,18 @@ For TCP traffic, Istio generates the following metrics:
destination_canonical_revision
{{< /text >}}

* **Destination Cluster**: This identifies the cluster of the destination workload.
See also labels
[`service.istio.io/canonical-name`](/docs/reference/config/labels/#ServiceCanonicalName)
and
[`service.istio.io/canonical-revision`](/docs/reference/config/labels/#ServiceCanonicalRevision).

* **Destination Cluster** (`destination_cluster`): This identifies the cluster of the destination workload.
This is set by: `global.multiCluster.clusterName` at cluster install time.

* **Source Cluster**: This identifies the cluster of the source workload.
* **Source Cluster** (`source_cluster`): This identifies the cluster of the source workload.
This is set by: `global.multiCluster.clusterName` at cluster install time.

* **gRPC Response Status**: This identifies the response status of the gRPC. This
* **gRPC Response Status** (`grpc_response_status`): This identifies the response status of the gRPC. This
label is present only on gRPC metrics.

Metric dimensions can be suppressed with [the `Telemetry` resource's `metricsOverrides.tagOverrides` field](/docs/reference/config/telemetry/#MetricsOverrides); see [Telemetry API](/docs/tasks/observability/telemetry/). Labels may also be added or modified using [metric classification](/docs/tasks/observability/metrics/classify-metrics/) filters.