Metrics Transform Processor

Status
Stability: beta (metrics)
Distributions: contrib, k8s
Code Owners: @dmitryax

Description

The metrics transform processor can be used to rename metrics and to add, rename, or delete label keys and values. It can also scale metric values and aggregate metrics across labels or label values. The table below lists the supported operations, each of which can be applied to one or more metrics.

ℹ️ This processor only supports renames/aggregations within a batch of metrics. It does not do any aggregation across batches, so it is not suitable for aggregating metrics from multiple sources (e.g. multiple nodes or clients).

| Operation | Example (based on metric system.cpu.usage) |
| --- | --- |
| Rename metrics | Rename to system.cpu.usage_time |
| Add labels | Add new label identifier with value 1 to all points |
| Rename label keys | Rename label state to cpu_state |
| Rename label values | For label state, rename value idle to - |
| Delete data points | Delete all points where label state has value idle |
| Toggle data type | Change from int data points to double data points |
| Scale value | Multiply values by 1000 to convert from seconds to milliseconds |
| Aggregate across label sets | Retain only the label state, average all points with the same value for this label |
| Aggregate across label values | For label state, sum points where the value is user or system into used = user + system |

In addition to the above:

  • Operations can be applied to one or more metrics using a strict or regexp filter
  • The action property allows metrics to be:
    • Updated in-place (update)
    • Copied and updates applied to the copy (insert)
    • Combined into a single newly inserted metric that contains all data points from the set of matching metrics (combine); the original matching metrics are removed
  • When renaming metrics, capturing groups from the regexp filter will be expanded
  • When adding or updating a label value, {{version}} will be replaced with this collector's version number

Configuration

Configuration is specified through a list of transformations and operations. Transformations and operations will be applied to all metrics in order so that later transformations or operations may reference the result of previous transformations or operations.
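
The ordering rule can be illustrated with a minimal sketch, built only from options documented in this README. It assumes that the rename in the first transform has already taken effect when the second transform runs, so the second transform selects the metric by its new name.

```yaml
processors:
  metricstransform:
    transforms:
      # 1. rename the metric in place
      - include: system.cpu.usage
        action: update
        new_name: system.cpu.usage_time
      # 2. runs after the rename, so it matches the new name
      - include: system.cpu.usage_time
        action: update
        operations:
          - action: update_label
            label: state
            new_label: cpu_state
```

If the two transforms were listed in the opposite order, the second one would have nothing named system.cpu.usage_time to match at the time it runs.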

processors:
  metricstransform:
    # transforms is a list of transformations, each of which transforms the metric(s) selected by metric name
    transforms:
    
        # SPECIFY WHICH METRIC(S) TO MATCH
        
        # include specifies the metric name used to determine which metric(s) to operate on
      - include: <metric_name>
        # match_type specifies whether the include name should be used as a strict match or regexp match, default = strict
        match_type: {strict, regexp}
    
        # experimental_match_labels specifies the label set against which the metric filter will work. If experimental_match_labels is specified, transforms will only be applied to those metrics which 
        # have the provided metric label values. This works for both strict and regexp match_type. This is an experimental feature.
        experimental_match_labels: {<label1>: <label_value1>, <label2>: <label_value2>}
        
        # SPECIFY THE ACTION TO TAKE ON THE MATCHED METRIC(S)
        
        # action specifies if the operations (specified below) are performed on metrics in place (update), on an inserted clone (insert), or on a new combined metric (combine)
        action: {update, insert, combine}
        
        # SPECIFY HOW TO TRANSFORM THE METRIC GENERATED AS A RESULT OF APPLYING THE ABOVE ACTION
        
        # new_name specifies the updated name of the metric; if action is insert or combine, new_name is required
        new_name: <new_metric_name_inserted>
        # aggregation_type defines how combined data points will be aggregated; if action is combine, aggregation_type is required
        aggregation_type: {sum, mean, min, max, count, median}
        # submatch_case specifies the case that should be used when adding label values based on regexp submatches when performing a combine action; leave blank to use the submatch value as is
        submatch_case: {lower, upper}
        # operations contain a list of operations that will be performed on the resulting metric(s)
        operations:
            # action defines the type of operation that will be performed, see examples below for more details
          - action: {add_label, update_label, delete_label_value, toggle_scalar_data_type, experimental_scale_value, aggregate_labels, aggregate_label_values}
            # label specifies the label to operate on
            label: <label>
            # new_label specifies the updated name of the label; if action is add_label, new_label is required
            new_label: <new_label>
            # aggregated_values contains a list of label values that will be aggregated; if action is aggregate_label_values, aggregated_values is required
            aggregated_values: [values...]
            # new_value specifies the updated name of the label value; if action is add_label or aggregate_label_values, new_value is required
            new_value: <new_value>
            # label_value specifies the label value for which points should be deleted; if action is delete_label_value, label_value is required
            label_value: <label_value>
            # label_set contains a list of labels that will remain after aggregation; if action is aggregate_labels, label_set is required
            label_set: [labels...]
            # aggregation_type defines how data points will be aggregated; if action is aggregate_labels or aggregate_label_values, aggregation_type is required
            aggregation_type: {sum, mean, min, max, count, median}
            # experimental_scale specifies the scalar to apply to values. Scaling exponential histograms inherently involves some loss of accuracy. 
            experimental_scale: <scalar>
            # value_actions contain a list of operations that will be performed on the selected label
            value_actions:
                # value specifies the value to operate on
              - value: <current_label_value>
                # new_value specifies the updated value
                new_value: <new_label_value>
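
For orientation, here is a sketch of how the processor might be wired into a metrics pipeline. The otlp receiver and debug exporter are placeholders chosen for illustration, not something this README prescribes; it assumes a collector build that includes both, and any receiver or exporter can be substituted.

```yaml
receivers:
  otlp:            # placeholder receiver (assumption)
    protocols:
      grpc:

processors:
  metricstransform:
    transforms:
      - include: system.cpu.usage
        match_type: strict
        action: update
        new_name: system.cpu.usage_time

exporters:
  debug:           # placeholder exporter (assumption)

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [metricstransform]
      exporters: [debug]
```

The processor only takes effect once it is listed under the processors of a metrics pipeline; defining it under the top-level processors key alone is not enough.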

Examples

Create a new metric from an existing metric

# create host.cpu.utilization from host.cpu.usage
include: host.cpu.usage
action: insert
new_name: host.cpu.utilization
operations:
  ...

Create a new metric from an existing metric with matching label values

# create host.cpu.utilization from host.cpu.usage where we have metric label "container=my_container"
include: host.cpu.usage
action: insert
new_name: host.cpu.utilization
match_type: strict
experimental_match_labels: {"container": "my_container"}
operations:
  ...

Create a new metric from an existing metric with matching label values with regexp

# create host.cpu.utilization from host.cpu.usage where we have metric label pod with non-empty values
include: host.cpu.usage
action: insert
new_name: host.cpu.utilization
match_type: regexp
experimental_match_labels: {"pod": "(.|\\s)*\\S(.|\\s)*"}
operations:
  ...

Rename metric

# rename system.cpu.usage to system.cpu.usage_time
include: system.cpu.usage
action: update
new_name: system.cpu.usage_time

Rename multiple metrics using Substitution

# rename all system.cpu metrics to system.processor.*.stat
# use a double dollar ($$) instead of a single $, because $ is treated as a special character
# wrap the group name/number in braces
include: ^system\.cpu\.(.*)$$
match_type: regexp
action: update
new_name: system.processor.$${1}.stat

Add a label

# for system.cpu.usage, add label `version` with value `opentelemetry collector vX.Y.Z` to all points
include: system.cpu.usage
action: update
operations:
  - action: add_label
    new_label: version
    new_value: opentelemetry collector {{version}}
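
The "Add labels" row of the operations table (add label identifier with value 1 to all points) could be written as the following sketch. Quoting the value is a choice made here so that YAML reads it as a string rather than an integer; whether an unquoted number is also accepted is not stated in this README.

```yaml
# for system.cpu.usage, add label `identifier` with value `1` to all points
include: system.cpu.usage
action: update
operations:
  - action: add_label
    new_label: identifier
    new_value: "1"   # quoted so YAML treats it as a string
```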

Add a label to multiple metrics

# for all system metrics, add label `version` with value `opentelemetry collector vX.Y.Z` to all points
include: ^system\.
match_type: regexp
action: update
operations:
  - action: add_label
    new_label: version
    new_value: opentelemetry collector {{version}}

Rename labels

# for system.cpu.usage, rename the label state to cpu_state
include: system.cpu.usage
action: update
operations:
  - action: update_label
    label: state
    new_label: cpu_state

Rename labels for multiple metrics

# for all system.cpu metrics, rename the label state to cpu_state
include: ^system\.cpu\.
match_type: regexp
action: update
operations:
  - action: update_label
    label: state
    new_label: cpu_state

Rename label values

# rename the label value slab_reclaimable to sreclaimable, slab_unreclaimable to sunreclaimable
include: system.memory.usage
action: update
operations:
  - action: update_label
    label: state
    value_actions:
      - value: slab_reclaimable
        new_value: sreclaimable
      - value: slab_unreclaimable
        new_value: sunreclaimable
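
The "Rename label values" row of the operations table (for label state, rename value idle to -) follows the same pattern. In this sketch the new value is quoted, because a bare - is YAML's sequence indicator and would not be read as a string.

```yaml
# for system.cpu.usage, rename the value idle of label state to -
include: system.cpu.usage
action: update
operations:
  - action: update_label
    label: state
    value_actions:
      - value: idle
        new_value: "-"   # quoted: a bare - is YAML list syntax
```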

Delete by label value

# delete all data points where the label 'state' has the value 'idle'
include: system.cpu.usage
action: update
operations:
  - action: delete_label_value
    label: state
    label_value: idle

Toggle datatype

# toggle the datatype of cpu usage from int (the default) to double
include: system.cpu.usage
action: update
operations:
  - action: toggle_scalar_data_type

Scale value

# scale CPU usage from seconds to milliseconds
include: system.cpu.usage
action: update
operations:
  - action: experimental_scale_value
    experimental_scale: 1000
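
Scaling can also be combined with the insert action, so that the original metric keeps its unit and only the copy is scaled. This is a sketch under that assumption; the metric name system.cpu.usage_ms is hypothetical and chosen only for illustration.

```yaml
# keep system.cpu.usage in seconds and insert a copy scaled to milliseconds
include: system.cpu.usage
action: insert
new_name: system.cpu.usage_ms   # hypothetical name for the scaled copy
operations:
  - action: experimental_scale_value
    experimental_scale: 1000
```

With update instead of insert, the same operation would overwrite the values of system.cpu.usage itself.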

Aggregate labels

# aggregate away all labels except `state` using summation
include: system.cpu.usage
action: update
operations:
  - action: aggregate_labels
    label_set: [ state ]
    aggregation_type: sum

NOTE: Only the sum aggregation function is supported for histogram and exponential histogram datatypes.
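
The "Aggregate across label sets" row of the operations table uses an average rather than a sum. A sketch of that variant, which per the note above would not apply to histogram or exponential histogram metrics:

```yaml
# retain only the label `state` and average all points that share the same value for it
include: system.cpu.usage
action: update
operations:
  - action: aggregate_labels
    label_set: [ state ]
    aggregation_type: mean
```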

Aggregate label values

# aggregate data points with state label value slab_reclaimable & slab_unreclaimable using summation into slab
include: system.memory.usage
action: update
operations:
  - action: aggregate_label_values
    label: state
    aggregated_values: [ slab_reclaimable, slab_unreclaimable ]
    new_value: slab
    aggregation_type: sum

NOTE: Only the sum aggregation function is supported for histogram and exponential histogram datatypes.
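
The "Aggregate across label values" row of the operations table (used = user + system) can be sketched the same way:

```yaml
# for label state, sum the points with value user or system into a single value used
include: system.cpu.usage
action: update
operations:
  - action: aggregate_label_values
    label: state
    aggregated_values: [ user, system ]
    new_value: used
    aggregation_type: sum
```

Points whose state value is not listed in aggregated_values (for example idle) are not named in this operation and are expected to pass through unchanged.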

Combine metrics

# convert a set of metrics for each http_method into a single metric with an http_method label, i.e.
#
# Web Service (*)/Total Delete Requests     iis.requests{http_method=delete}
# Web Service (*)/Total Get Requests     >  iis.requests{http_method=get}
# Web Service (*)/Total Post Requests       iis.requests{http_method=post}
include: ^Web Service \(\*\)/Total (?P<http_method>.*) Requests$
match_type: regexp
action: combine
new_name: iis.requests
submatch_case: lower
operations:
  ...
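
The configuration reference states that aggregation_type is required when action is combine, while the example above leaves it out. The following sketch sets it explicitly; choosing sum is an assumption made for illustration, not a documented default.

```yaml
include: ^Web Service \(\*\)/Total (?P<http_method>.*) Requests$
match_type: regexp
action: combine
new_name: iis.requests
# assumption: sum is used here only as an example aggregation
aggregation_type: sum
submatch_case: lower
```

Here the named capturing group http_method supplies the label key, and submatch_case: lower lowercases the captured value (Delete becomes delete), matching the mapping shown in the comment of the example above.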

Group Metrics

# Group metrics from a single ResourceMetrics and report them as multiple ResourceMetrics.
#
# e.g. consider pod and container metrics collected from Kubernetes. Both sets of metrics are recorded under one ResourceMetrics;
# applying this transformation results in two separate ResourceMetrics, each with the corresponding resource labels in its resource header.
#
# use a double dollar ($$) instead of a single $, because $ is treated as a special character

- include: ^k8s\.pod\.(.*)$$
  match_type: regexp
  action: group
  group_resource_labels: {"resource.type": "k8s.pod", "source": "kubelet"}
- include: ^container\.(.*)$$
  match_type: regexp
  action: group
  group_resource_labels: {"resource.type": "container", "source": "kubelet"}

Metrics Transform Processor vs. Attributes Processor for Metrics

For metrics, these two processors overlap: both can make simple modifications to metric attribute key-value pairs. As a general rule, the attributes processor has more attribute-related functionality, while the metrics transform processor can do much more data manipulation. When only the overlapping functionality is needed, the attributes processor is preferred because it natively uses the official OpenTelemetry data model. However, if the metrics transform processor is already in use or its extra functionality is necessary, there is no need to migrate away from it.

Shared functionality

  • Add attributes
  • Update values of attributes
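
As a rough illustration of the overlap, the sketch below adds the same attribute with each processor. The attributes processor block is an assumption based on that processor's own configuration (actions with key, value, and action), which this README does not describe, so check it against the attributes processor documentation before use.

```yaml
processors:
  # attributes processor (assumed syntax): insert the attribute where the key is not already present
  attributes:
    actions:
      - key: identifier
        value: "1"
        action: insert

  # metrics transform processor: add the label to all points of one metric
  metricstransform:
    transforms:
      - include: system.cpu.usage
        action: update
        operations:
          - action: add_label
            new_label: identifier
            new_value: "1"
```

The two blocks are alternatives; only one of them would normally be listed in a given metrics pipeline. As written, the attributes processor block has no metric name filter, whereas the metrics transform block targets system.cpu.usage only.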

Attributes processor-specific functionality

  • delete
  • hash
  • extract

Metrics transform processor-specific functionality

  • Rename metrics
  • Delete data points
  • Toggle data type
  • Scale value
  • Aggregate across label sets
  • Aggregate across label values