Add Trace Span Pruning Processor by portertech · Pull Request #45617 · open-telemetry/opentelemetry-collector-contrib

portertech · 2026-01-23T17:22:17Z

Summary

This PR introduces the spanpruningprocessor, a new trace processor that reduces trace storage costs while preserving observability value. It identifies and aggregates repetitive leaf spans within traces, replacing each group of similar operations with a single summary span that carries aggregate statistics (count, duration min/max/avg/total, and an optional latency histogram).

Component donation issue: #45654

The Problem

Modern distributed systems generate enormous volumes of trace data. A significant portion consists of repetitive, similar spans—think N+1 database queries, batch HTTP calls, or fan-out operations. Storing every individual span is expensive and often provides diminishing analytical value beyond the first few instances.

Current solutions are inadequate:

Head sampling loses entire traces, breaking root cause analysis
Tail sampling helps but still keeps every span in sampled traces
Manual instrumentation changes require code modifications across services

The Solution

The Span Pruning Processor identifies duplicate or similar leaf spans within a single trace, groups them, and replaces each group with a single aggregated summary span. When leaf spans are aggregated, the processor also recursively aggregates their parent spans if all children of those parents are being aggregated.

Leaf spans are spans that are not referenced as a parent by any other span in the trace. They typically represent the last actions in an execution call stack (e.g., individual database queries, HTTP calls to external services).

Spans are grouped by:

Span name - spans must have the same name
Span kind - spans must have the same kind (Internal, Server, Client, Producer, Consumer)
Status code - spans must have the same status (OK, Error, or Unset)
TraceState - spans must have identical TraceState values (for Consistent Probability Sampling compatibility)
Configured attributes - spans must have matching values for attributes specified in group_by_attributes
Parent span name - leaf spans must share the same parent span name to be grouped together

Parent spans are eligible for aggregation when all of their children are aggregated, they share the same name, kind, and status code, and they are not root spans.
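
The leaf detection and grouping rules above can be sketched in a few lines. This is a minimal illustration, not the processor's Go implementation: the `Span` dataclass, `find_leaf_spans`, and `group_leaf_spans` are hypothetical names, and attribute matching here uses exact keys only (glob matching is left out).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical, simplified span model used only for this sketch.
    span_id: str
    parent_id: Optional[str]
    name: str
    kind: str = "Client"
    status: str = "Unset"
    trace_state: str = ""
    attributes: dict = field(default_factory=dict)


def find_leaf_spans(spans):
    """Return spans whose ID is never referenced as another span's parent."""
    parent_ids = {s.parent_id for s in spans if s.parent_id}
    return [s for s in spans if s.span_id not in parent_ids]


def group_leaf_spans(spans, group_by_keys, min_spans_to_aggregate=5):
    """Group leaf spans by the six criteria; keep groups that are large enough."""
    by_id = {s.span_id: s for s in spans}
    groups = defaultdict(list)
    for leaf in find_leaf_spans(spans):
        parent = by_id.get(leaf.parent_id)
        key = (
            leaf.name,
            leaf.kind,
            leaf.status,
            leaf.trace_state,
            tuple((k, leaf.attributes.get(k)) for k in sorted(group_by_keys)),
            parent.name if parent else None,
        )
        groups[key].append(leaf)
    return {k: v for k, v in groups.items() if len(v) >= min_spans_to_aggregate}
```

Because the status code is part of the key, OK and Error spans with the same name never end up in the same group.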

Optionally, the processor can detect duration outliers using statistical methods (IQR or MAD) and either annotate summary spans with outlier correlations or preserve outlier spans as individual spans for debugging while still aggregating normal spans.

This processor is useful for reducing trace data volume while preserving meaningful information about repeated operations.

Use Cases

Database query optimization: When an application makes many similar database queries (e.g., N+1 queries), aggregate them into a single summary span
Batch operations: Consolidate many similar leaf operations into a single representative span
Cost reduction: Reduce trace storage costs by eliminating redundant span data

Configuration

processors:
  spanpruning:
    # Attributes to use for grouping similar leaf spans (supports glob patterns)
    # Spans with the same name AND same values for matching attributes will be grouped
    # Examples:
    #   - "db.*" matches db.operation, db.name, db.statement, etc.
    #   - "http.request.*" matches http.request.method, http.request.header, etc.
    #   - "db.operation" matches only the exact key "db.operation"
    group_by_attributes:
      - "db.*"
      - "http.method"

    # Minimum number of similar leaf spans required before aggregation
    # Default: 5
    min_spans_to_aggregate: 3

    # Maximum depth of parent span aggregation above leaf spans
    # 0 = only aggregate leaf spans (no parent aggregation)
    # -1 = unlimited depth
    # Default: 1
    max_parent_depth: 1

    # Prefix for aggregation statistics attributes
    # Default: "aggregation."
    aggregation_attribute_prefix: "batch."

    # Upper bounds for histogram buckets (latency distribution)
    # Default: [5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s]
    # Set to empty list to disable histogram
    aggregation_histogram_buckets: [10ms, 50ms, 100ms, 500ms, 1s]

    # Enable attribute loss analysis during aggregation
    # Default: false (reduces telemetry overhead)
    # When enabled, analyzes attribute differences, records metrics, and adds summary attributes
    enable_attribute_loss_analysis: false

    # Attribute loss exemplar sampling rate
    # Fraction of attribute-loss metric recordings that include trace exemplars.
    # Range: 0.0 (disabled) to 1.0 (always)
    # Default: 0.01 (1%)
    attribute_loss_exemplar_sample_rate: 0.01

    # Enable measurement of serialized trace sizes before and after pruning
    # When enabled, records bytes_received and bytes_emitted metrics
    # This requires serializing the trace data which can be expensive for large batches
    # Default: false
    enable_bytes_metrics: false

    # Enable IQR or MAD outlier detection and attribute correlation
    # When enabled, adds duration_median_ns and outlier_correlated_attributes
    # to summary spans
    # Default: false
    enable_outlier_analysis: false

    # Outlier analysis configuration (optional)
    outlier_analysis:
      # Statistical method for outlier detection
      # "iqr" (default): Interquartile Range method
      # "mad": Median Absolute Deviation method (more robust to extreme outliers)
      method: iqr

      # IQR multiplier for outlier detection threshold (when method=iqr)
      # Outliers are spans with duration > Q3 + (iqr_multiplier * IQR)
      # Common values: 1.5 (standard), 3.0 (extreme only)
      # Default: 1.5
      iqr_multiplier: 1.5

      # MAD multiplier for outlier detection threshold (when method=mad)
      # Outliers are spans with duration > median + (mad_multiplier * MAD * 1.4826)
      # Common values: 2.5-3.0 (standard), 3.5+ (extreme only)
      # Default: 3.0
      mad_multiplier: 3.0

      # Minimum group size for reliable IQR calculation
      # Groups smaller than this skip outlier analysis
      # Must be at least 4 (need quartiles)
      # Default: 7
      min_group_size: 7

      # Minimum fraction of outliers that must share an attribute value
      # for it to be reported as correlated
      # Range: (0.0, 1.0]
      # Default: 0.75 (75% of outliers must share the value)
      correlation_min_occurrence: 0.75

      # Maximum fraction of normal spans that can have the correlated value
      # Lower values mean stronger signal
      # Range: [0.0, 1.0)
      # Default: 0.25 (at most 25% of normal spans can have the value)
      correlation_max_normal_occurrence: 0.25

      # Maximum correlated attributes to report in summary span attribute
      # Default: 5
      max_correlated_attributes: 5

      # Preserve outlier spans as individual spans instead of aggregating
      # When true, only normal spans are aggregated; outliers remain in the trace
      # Default: false
      preserve_outliers: false

      # Maximum number of outlier spans to preserve per aggregation group
      # Spans are selected by most extreme duration first
      # 0 = preserve all detected outliers
      # Default: 2
      max_preserved_outliers: 2

      # Only preserve outliers when a strong attribute correlation is found
      # This avoids preserving outliers that are just random variance
      # Default: false
      preserve_only_with_correlation: false

Configuration Options

Field	Type	Default	Description
`group_by_attributes`	[]string	[]	Attribute patterns for grouping (supports glob patterns like `db.*`)
`min_spans_to_aggregate`	int	5	Minimum group size before aggregation occurs
`max_parent_depth`	int	1	Max depth of parent aggregation (0=none, -1=unlimited)
`aggregation_attribute_prefix`	string	"aggregation."	Prefix for aggregation statistics attributes
`aggregation_histogram_buckets`	[]time.Duration	`[5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s]`	Upper bounds for histogram buckets
`enable_attribute_loss_analysis`	bool	false	Enable attribute loss analysis (adds metrics and span attributes showing attribute differences)
`attribute_loss_exemplar_sample_rate`	float64	0.01	Fraction of attribute-loss metric recordings that include trace exemplars (0.0–1.0). Only applies when `enable_attribute_loss_analysis` is true.
`enable_bytes_metrics`	bool	false	Enable measurement of serialized trace sizes (bytes_received/bytes_emitted metrics)
`enable_outlier_analysis`	bool	false	Enable outlier detection and correlation analysis
`outlier_analysis.method`	string	"iqr"	Statistical method: "iqr" or "mad"
`outlier_analysis.iqr_multiplier`	float64	1.5	IQR threshold multiplier (when method=iqr)
`outlier_analysis.mad_multiplier`	float64	3.0	MAD threshold multiplier (when method=mad)
`outlier_analysis.min_group_size`	int	7	Minimum group size for outlier analysis
`outlier_analysis.correlation_min_occurrence`	float64	0.75	Minimum outlier occurrence fraction for correlation
`outlier_analysis.correlation_max_normal_occurrence`	float64	0.25	Maximum normal occurrence fraction for correlation
`outlier_analysis.max_correlated_attributes`	int	5	Maximum correlated attributes to report
`outlier_analysis.preserve_outliers`	bool	false	Keep outliers as individual spans instead of aggregating
`outlier_analysis.max_preserved_outliers`	int	2	Max outliers to preserve per group (0=preserve all)
`outlier_analysis.preserve_only_with_correlation`	bool	false	Only preserve outliers if a strong correlation is found

Glob Pattern Support

The group_by_attributes field supports glob patterns for matching attribute keys:

Pattern	Matches
`db.*`	`db.operation`, `db.name`, `db.statement`, etc.
`http.request.*`	`http.request.method`, `http.request.header.content-type`, etc.
`rpc.*`	`rpc.method`, `rpc.service`, `rpc.system`, etc.
`db.operation`	Only the exact key `db.operation`

When multiple attributes match a pattern, they are all included in the grouping key (sorted alphabetically for consistency).
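
A rough sketch of how glob patterns could select attributes for the grouping key. It assumes shell-style matching where `*` also matches dots (consistent with `http.request.*` matching `http.request.header.content-type` above) and uses Python's `fnmatch`; the function name `grouping_attributes` is hypothetical and the processor's actual matcher may differ.

```python
from fnmatch import fnmatchcase


def grouping_attributes(attributes, patterns):
    """Return (key, value) pairs for keys matching any pattern, sorted by key."""
    matched = {
        key: value
        for key, value in attributes.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    }
    return tuple(sorted(matched.items()))


attrs = {
    "db.operation": "select",
    "db.name": "orders",
    "http.method": "GET",
    "net.peer.name": "db-1",
}
print(grouping_attributes(attrs, ["db.*", "http.method"]))
# (('db.name', 'orders'), ('db.operation', 'select'), ('http.method', 'GET'))
```

Sorting by key keeps the grouping key stable regardless of the order in which attributes were recorded on the span.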

Summary Span

When spans are aggregated, the summary span includes:

Properties

Name: Original span name (e.g., SELECT)
TraceID: Same as original spans
SpanID: Newly generated unique ID
ParentSpanID: Same as original spans (common parent)
Kind: Same as template span (inherited from slowest span)
StartTimestamp: Earliest start time of all spans in the group
EndTimestamp: Latest end time of all spans in the group
Status: Same as original spans (spans are grouped by status code)
TraceState: Inherited from the template span (preserved for Consistent Probability Sampling compatibility)
Attributes: Inherited from the slowest span in the group

Note: The summary span's duration (EndTimestamp - StartTimestamp) represents the total time window covered by all aggregated spans, which may exceed duration_max_ns. For example, if spans overlap or are staggered, the time range can be larger than any individual span's duration. Use duration_max_ns to find the slowest individual operation.

What Gets Aggregated Away

When spans are aggregated into a summary span, the following data from non-template spans is lost:

Data	Behavior
Span Events	Only the template (slowest) span's events are preserved
Span Links	Only the template span's links are preserved
Attributes	Non-matching attribute values are lost (see attribute loss analysis)
Individual Timestamps	Original start/end times replaced by the group's time range
SpanIDs	Original SpanIDs are replaced by a single summary SpanID

To understand attribute loss, set enable_attribute_loss_analysis: true, which adds <prefix>diverse_attributes and <prefix>missing_attributes to summary spans.

Aggregation Attributes

The following attributes are added to the summary span (shown with default aggregation_attribute_prefix: "aggregation."):

Attribute	Type	Description
`<prefix>is_summary`	bool	Always `true` to identify summary spans
`<prefix>span_count`	int64	Number of spans that were aggregated
`<prefix>duration_min_ns`	int64	Minimum duration in nanoseconds
`<prefix>duration_max_ns`	int64	Maximum duration in nanoseconds
`<prefix>duration_avg_ns`	int64	Average duration in nanoseconds
`<prefix>duration_total_ns`	int64	Total duration in nanoseconds
`<prefix>histogram_bucket_bounds_s`	[]float64	Bucket upper bounds in seconds (excludes +Inf)
`<prefix>histogram_bucket_counts`	[]int64	Cumulative count per bucket (includes +Inf bucket)
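
As an illustration of how the duration statistics relate to the summary span's time window, here is a small sketch. The helper names are hypothetical, the histogram attributes are omitted, and the use of integer division for the average is an assumption.

```python
def aggregation_attributes(spans_ns, prefix="aggregation."):
    """Compute summary statistics for one group of (start_ns, end_ns) pairs."""
    durations = [end - start for start, end in spans_ns]
    total = sum(durations)
    return {
        prefix + "is_summary": True,
        prefix + "span_count": len(durations),
        prefix + "duration_min_ns": min(durations),
        prefix + "duration_max_ns": max(durations),
        prefix + "duration_avg_ns": total // len(durations),  # assumed truncation
        prefix + "duration_total_ns": total,
    }


def summary_window(spans_ns):
    """Earliest start and latest end across the group."""
    starts, ends = zip(*spans_ns)
    return min(starts), max(ends)


MS = 1_000_000
group = [(0, 10 * MS), (10 * MS, 25 * MS), (25 * MS, 37 * MS)]  # 10ms, 15ms, 12ms
stats = aggregation_attributes(group)
start, end = summary_window(group)
print(stats["aggregation.duration_avg_ns"])  # 12333333
print(end - start)                           # 37000000
```

With three back-to-back spans the summary window is 37ms, while duration_max_ns is only 15ms, which is why the window should not be read as the duration of any single operation.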

Optional Outlier Analysis Attributes

When enable_outlier_analysis: true, the following additional attributes are added:

Attribute	Type	Description
`<prefix>duration_median_ns`	int64	Median duration (more robust than average for skewed distributions)
`<prefix>outlier_correlated_attributes`	string	Attributes that distinguish outliers from normal spans (format: `key=value(outlier%/normal%), ...`)

Histogram Buckets

The histogram provides a latency distribution of the aggregated spans. The buckets are cumulative, meaning each bucket count includes all spans with duration less than or equal to the bucket boundary.

Example with buckets [10ms, 50ms, 100ms] and 5 spans with durations [5ms, 15ms, 25ms, 75ms, 150ms]:

histogram_bucket_bounds_s: [0.01, 0.05, 0.1]
histogram_bucket_counts: [1, 3, 4, 5]
- Bucket 0 (≤10ms): 1 span (5ms)
- Bucket 1 (≤50ms): 3 spans (5ms, 15ms, 25ms)
- Bucket 2 (≤100ms): 4 spans (5ms, 15ms, 25ms, 75ms)
- Bucket 3 (+Inf): 5 spans (all)
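
The cumulative counts in this example can be reproduced with a short sketch. The function name is hypothetical and the approach (sorting plus binary search) is only one possible way to compute the counts, not necessarily what the processor does.

```python
from bisect import bisect_right


def cumulative_histogram(durations_s, bounds_s):
    """Return cumulative counts per upper bound, plus a final +Inf bucket."""
    ordered = sorted(durations_s)
    # bisect_right gives the number of values <= bound
    counts = [bisect_right(ordered, bound) for bound in bounds_s]
    counts.append(len(ordered))  # +Inf bucket always holds every span
    return counts


durations = [0.005, 0.015, 0.025, 0.075, 0.150]
print(cumulative_histogram(durations, [0.01, 0.05, 0.1]))
# [1, 3, 4, 5]
```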

Outlier Analysis (Optional)

When enable_outlier_analysis: true, the processor detects duration outliers and identifies attributes that correlate with slow spans.

Detection Methods

The processor supports two statistical methods for outlier detection:

Method	Formula	Characteristics
IQR (default)	`threshold = Q3 + (multiplier × IQR)`	Standard method; sensitive to moderate outliers; uses quartiles
MAD	`threshold = median + (multiplier × MAD × 1.4826)`	More robust to extreme outliers; uses median

When to use each:

IQR: Best for typical distributions with moderate outliers. Standard choice for most use cases.
MAD: Better when you have extreme outliers that would skew IQR calculations, or when you need more stable detection thresholds.

How It Works

IQR (Interquartile Range) Method:

Sort spans by duration
Calculate Q1 (25th percentile) and Q3 (75th percentile)
Calculate IQR = Q3 - Q1
Flag spans with duration > Q3 + (iqr_multiplier × IQR) as outliers

MAD (Median Absolute Deviation) Method:

Sort spans by duration and find the median
Calculate |duration - median| for each span
MAD = median of those deviations
Flag spans with duration > median + (mad_multiplier × MAD × 1.4826) as outliers

Note: The 1.4826 scale factor makes MAD comparable to standard deviation for normal distributions.
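
Both thresholds can be sketched with the Python standard library. The quartile interpolation method is not specified above, so this sketch assumes linear interpolation over the sorted data (`method="inclusive"`); the processor may compute quartiles differently and therefore produce slightly different thresholds.

```python
from statistics import median, quantiles


def iqr_threshold(durations, multiplier=1.5):
    """Q3 + multiplier * IQR, with quartiles by linear interpolation (assumed)."""
    q1, _, q3 = quantiles(durations, n=4, method="inclusive")
    return q3 + multiplier * (q3 - q1)


def mad_threshold(durations, multiplier=3.0):
    """median + multiplier * MAD * 1.4826."""
    med = median(durations)
    mad = median(abs(d - med) for d in durations)
    return med + multiplier * mad * 1.4826


durations_ms = [5, 6, 7, 8, 9, 10, 11, 12, 500, 600]
for threshold in (iqr_threshold(durations_ms), mad_threshold(durations_ms)):
    print(threshold, [d for d in durations_ms if d > threshold])
```

For this data both methods flag only the 500ms and 600ms spans, even though the two slow spans pull the mean far above the typical duration.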

Attribute Correlation (same for both methods):

Compare attribute values between outliers and normal spans
Find attribute values that appear frequently in outliers but rarely in normal spans
Report the strongest correlations based on the configured thresholds
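
One way the correlation step could work is sketched below. The ranking rule (largest gap between outlier and normal occurrence) and the function name are assumptions; only the two thresholds, the cap on reported attributes, and the output format come from the description above.

```python
from collections import Counter


def correlated_attributes(outliers, normals, min_occurrence=0.75,
                          max_normal_occurrence=0.25, max_reported=5):
    """outliers/normals: lists of attribute dicts, one per span."""
    if not outliers or not normals:
        return ""
    outlier_counts = Counter(kv for attrs in outliers for kv in attrs.items())
    normal_counts = Counter(kv for attrs in normals for kv in attrs.items())
    found = []
    for (key, value), count in outlier_counts.items():
        o_frac = count / len(outliers)
        n_frac = normal_counts[(key, value)] / len(normals)
        if o_frac >= min_occurrence and n_frac <= max_normal_occurrence:
            found.append((o_frac - n_frac, key, value, o_frac, n_frac))
    # Strongest separation first (assumed ranking), ties broken by key
    found.sort(key=lambda c: (-c[0], c[1]))
    return ", ".join(
        f"{key}={value}({o:.0%}/{n:.0%})"
        for _, key, value, o, n in found[:max_reported]
    )


outliers = [{"db.cache_hit": "false", "db.shard": s} for s in ["7", "7", "7", "7", "3"]]
normals = [{"db.cache_hit": "true", "db.shard": s} for s in ["7"] + ["1"] * 9]
print(correlated_attributes(outliers, normals))
# db.cache_hit=false(100%/0%), db.shard=7(80%/10%)
```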

Configuration Example

processors:
  spanpruning:
    enable_outlier_analysis: true
    outlier_analysis:
      method: iqr                # or "mad" for more robustness
      iqr_multiplier: 1.5        # Standard outlier threshold (IQR method)
      mad_multiplier: 3.0        # Standard outlier threshold (MAD method)
      min_group_size: 7          # Skip groups with <7 spans
      correlation_min_occurrence: 0.75   # 75% of outliers must share value
      correlation_max_normal_occurrence: 0.25  # <25% of normal spans can have it
      max_correlated_attributes: 5       # Report top 5 correlations

Example Output

SELECT (summary, span_count: 20)
  aggregation.duration_avg_ns: 45000000
  aggregation.duration_median_ns: 8000000
  aggregation.outlier_correlated_attributes: "db.cache_hit=false(100%/0%), db.shard=7(80%/10%)"

Interpretation:

Median vs Avg: Large difference (8ms vs 45ms) indicates outliers are skewing the average
Primary correlation: All outliers (100%) had cache_hit=false, while 0% of normal spans did
Secondary correlation: 80% of outliers hit shard 7, but only 10% of normal spans did

This helps identify root causes of latency issues:

Cache misses
Specific database shards
Failed retries
Timeout scenarios

When to Use

Enable when you need to understand why some operations are slow
Disable (default) to minimize overhead when outlier analysis isn't needed
Works best with groups of 10+ spans for statistical reliability

Performance Impact

Computational overhead: Sorts durations, calculates quartiles, counts attribute occurrences
No overhead when disabled: sorting and statistical calculations are skipped entirely
Recommended: Use min_group_size: 7 or higher to skip analysis on small groups

Preserving Outlier Spans (Optional)

When outlier_analysis.preserve_outliers: true, detected outlier spans are kept as individual spans instead of being aggregated. This provides:

Full visibility into slow operations for debugging
Preserved context: Original attributes, events, and links remain intact
Selective aggregation: Only prune repetitive normal spans

Configuration

processors:
  spanpruning:
    enable_outlier_analysis: true
    outlier_analysis:
      preserve_outliers: true         # Keep outliers as individual spans
      max_preserved_outliers: 2       # Keep top 2 slowest outliers per group
      preserve_only_with_correlation: false  # Preserve even without correlation

Configuration Options

Field	Type	Default	Description
`preserve_outliers`	bool	false	Keep outliers as individual spans instead of aggregating
`max_preserved_outliers`	int	2	Max outliers to preserve per group (0=preserve all detected)
`preserve_only_with_correlation`	bool	false	Only preserve outliers if a strong attribute correlation is found

Example Output

Before (10 similar SELECT spans, 2 are outliers):

handler
├── SELECT - 5ms (normal)
├── SELECT - 6ms (normal)
├── SELECT - 7ms (normal)
├── SELECT - 8ms (normal)
├── SELECT - 9ms (normal)
├── SELECT - 10ms (normal)
├── SELECT - 11ms (normal)
├── SELECT - 12ms (normal)
├── SELECT - 500ms (outlier, cache_hit=false)
└── SELECT - 600ms (outlier, cache_hit=false)

After (with preserve_outliers: true, max_preserved_outliers: 2):

handler
├── SELECT (summary, span_count=8)      ← Normal spans aggregated
│   - aggregation.preserved_outlier_count: 2
│   - aggregation.outlier_correlated_attributes: "cache_hit=false(100%/0%)"
├── SELECT - 500ms                       ← Outlier preserved
│   - aggregation.is_preserved_outlier: true
│   - aggregation.summary_span_id: "abc123"
│   - cache_hit: false
└── SELECT - 600ms                       ← Outlier preserved
    - aggregation.is_preserved_outlier: true
    - aggregation.summary_span_id: "abc123"
    - cache_hit: false

Summary Span Attributes (When Preserving Outliers)

Attribute	Type	Description
`<prefix>preserved_outlier_count`	int64	Number of outlier spans preserved
`<prefix>preserved_outlier_span_ids`	[]string	SpanIDs of preserved outliers

Preserved Outlier Span Attributes

Attribute	Type	Description
`<prefix>is_preserved_outlier`	bool	Identifies span as a preserved outlier
`<prefix>summary_span_id`	string	SpanID of the associated summary span

Behavior Notes

Parent aggregation: Parents can still be aggregated if all their children are either aggregated or preserved as outliers
Skip aggregation: If preserving outliers leaves too few normal spans (below min_spans_to_aggregate), the entire group is left unchanged
Selection order: Outliers are preserved starting with the most extreme (longest duration) first
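
The selection and skip rules can be sketched as follows. This is a simplified model that works on durations only; the function name is hypothetical, and it assumes that outliers beyond max_preserved_outliers are aggregated together with the normal spans, which the notes above do not state explicitly.

```python
def split_group(durations, threshold, min_spans_to_aggregate=5,
                max_preserved_outliers=2):
    """Return (to_aggregate, preserved); ([], []) means the group is left unchanged."""
    outliers = sorted((d for d in durations if d > threshold), reverse=True)
    if max_preserved_outliers > 0:
        preserved = outliers[:max_preserved_outliers]  # most extreme first
    else:
        preserved = outliers  # 0 = preserve all detected outliers
    remaining = list(durations)
    for d in preserved:
        remaining.remove(d)
    if len(remaining) < min_spans_to_aggregate:
        return [], []  # too few spans left to aggregate
    return remaining, preserved


durations_ms = [5, 6, 7, 8, 9, 10, 11, 12, 500, 600]
print(split_group(durations_ms, threshold=18.5))
# ([5, 6, 7, 8, 9, 10, 11, 12], [600, 500])
```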

Pipeline Placement

This processor is designed to work best when placed after processors that ensure complete traces are available:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [groupbytrace, spanpruning, batch]
      exporters: [otlp]

Or with tail sampling:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, spanpruning, batch]
      exporters: [otlp]

Example

Basic Example

A trace with repeated database queries (some failing):

Before Processing:

root-span (parent)
├── SELECT (leaf) - duration: 10ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 15ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 12ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 50ms, db.operation: select, status: Error
├── SELECT (leaf) - duration: 45ms, db.operation: select, status: Error
└── INSERT (leaf) - duration: 20ms, db.operation: insert, status: OK

After Processing (with min_spans_to_aggregate: 2):

root-span (parent)
├── SELECT (summary, status: OK)
│   - aggregation.is_summary: true
│   - aggregation.span_count: 3
│   - aggregation.duration_min_ns: 10000000
│   - aggregation.duration_max_ns: 15000000
│   - aggregation.duration_avg_ns: 12333333
├── SELECT (summary, status: Error)
│   - aggregation.is_summary: true
│   - aggregation.span_count: 2
│   - aggregation.duration_min_ns: 45000000
│   - aggregation.duration_max_ns: 50000000
│   - aggregation.duration_avg_ns: 47500000
└── INSERT (unchanged - only 1 span, below threshold)

Note: Spans with different status codes are grouped separately, preserving error information.

Recursive Parent Aggregation Example

When spans are aggregated, the processor also checks if their parent spans can be aggregated. Parent spans are eligible for aggregation when:

All of their children are being aggregated
They share the same name, kind, and status code with other eligible parents
They are not root spans (must have a parent)
At least 2 parents meet the criteria

Before Processing (with min_spans_to_aggregate: 2, group_by_attributes: ["db.op"]):

root
├── handler (status: OK)
│   └── SELECT (db.op=select, status: OK) ───┐
├── handler (status: OK)                      │ leaf group A: 3 OK SELECTs
│   └── SELECT (db.op=select, status: OK) ───┤
├── handler (status: OK)                      │
│   └── SELECT (db.op=select, status: OK) ───┘
├── handler (status: Error)
│   └── SELECT (db.op=select, status: Error) ┐ leaf group B: 2 Error SELECTs
├── handler (status: Error)                   │
│   └── SELECT (db.op=select, status: Error) ┘
├── handler (status: OK)
│   └── INSERT (db.op=insert, status: OK) ──── only 1, below threshold
└── worker (status: OK)
    └── SELECT (db.op=select, status: OK) ──── different parent name

After Processing:

root
├── handler (summary, status: OK, span_count: 3)
│   └── SELECT (summary, status: OK, span_count: 3)
├── handler (summary, status: Error, span_count: 2)
│   └── SELECT (summary, status: Error, span_count: 2)
├── handler (status: OK)
│   └── INSERT (status: OK) ─────────────────────────── unchanged
└── worker (status: OK)
    └── SELECT (status: OK) ─────────────────────────── unchanged

Why each span was handled this way:

Span	Result	Reason
3x handler (OK) with SELECT children	Aggregated	All children aggregated, same name+kind+status
3x SELECT (OK) under handler	Aggregated	Same name + kind + status + attributes + parent name
2x handler (Error) with SELECT children	Aggregated	All children aggregated, same name+kind+status
2x SELECT (Error) under handler	Aggregated	Same name + kind + status + attributes + parent name
handler (OK) with INSERT child	Unchanged	Child not aggregated (only 1 INSERT)
INSERT (OK)	Unchanged	Below threshold (only 1 span)
worker (OK)	Unchanged	Child not aggregated
SELECT (OK) under worker	Unchanged	Different parent name than other SELECTs
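
The parent eligibility rules can be expressed as a short sketch. The data shapes and the function name are hypothetical, and only one level of parents is handled here (equivalent to max_parent_depth: 1).

```python
from collections import defaultdict


def eligible_parent_groups(parents, children_of, aggregated_ids, min_group=2):
    """
    parents: list of dicts with id, parent_id, name, kind, status
    children_of: parent span ID -> list of child span IDs
    aggregated_ids: IDs of spans already marked for aggregation
    """
    groups = defaultdict(list)
    for p in parents:
        children = children_of.get(p["id"], [])
        if p["parent_id"] is None or not children:
            continue  # root spans and leaf spans are not candidates
        if all(c in aggregated_ids for c in children):
            groups[(p["name"], p["kind"], p["status"])].append(p["id"])
    return {k: v for k, v in groups.items() if len(v) >= min_group}


def span(span_id, name, status):
    return {"id": span_id, "parent_id": "root", "name": name,
            "kind": "Server", "status": status}


parents = [span("h1", "handler", "OK"), span("h2", "handler", "OK"),
           span("h3", "handler", "OK"), span("h4", "handler", "Error"),
           span("h5", "handler", "Error"), span("h6", "handler", "OK"),
           span("w1", "worker", "OK")]
children_of = {"h1": ["s1"], "h2": ["s2"], "h3": ["s3"], "h4": ["s4"],
               "h5": ["s5"], "h6": ["i1"], "w1": ["s6"]}
aggregated = {"s1", "s2", "s3", "s4", "s5"}  # leaf groups A and B
print(eligible_parent_groups(parents, children_of, aggregated))
```

Applied to the tree above, this yields one group of three OK handlers and one group of two Error handlers; the handler with the INSERT child and the worker are excluded because their children were not aggregated.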

Limitations

Requires complete traces for accurate leaf detection
Summary span inherits attributes from the slowest span in the group
Parent spans are only aggregated when ALL their children are aggregated

Consistent Probability Sampling (CPS) Compatibility

The processor is designed to be compatible with Consistent Probability Sampling (CPS). CPS uses TraceState to carry sampling metadata (ot=th:...;rv:...) where:

th (threshold) indicates the sampling probability threshold
rv (randomness value) provides consistent randomness for sampling decisions

Why TraceState matters for aggregation:

Spans with different TraceState values represent different sampling populations with different "adjusted counts" (weights). Aggregating them together would produce statistically incorrect summaries and break downstream sampling decisions.

Example:

Before (if TraceState was ignored - WRONG):
  Span A (th:fd70a4, 1% sampling)  ─┐
  Span B (th:fd70a4, 1% sampling)  ─┼─► Summary (mixed weights = incorrect statistics)
  Span C (th:fa00, ~2.3% sampling) ─┘

After (with TraceState grouping - CORRECT):
  Span A (th:fd70a4) ─┬─► Summary (1% weight, correct)
  Span B (th:fd70a4) ─┘
  Span C (th:fa00)   ─── unchanged (below min_spans_to_aggregate threshold)

The processor uses exact TraceState matching (not just the th value) because:

The rv value affects sampling decisions
Vendor-specific keys may have semantic meaning
Key ordering may be significant
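
To make the "adjusted count" idea concrete, the sketch below converts a th value into a sampling probability. It assumes the th encoding from the OpenTelemetry probability sampling specification (a rejection threshold of up to 14 hexadecimal digits, with trailing zeros omitted); the function names are hypothetical and none of this is part of the processor.

```python
def sampling_probability(th_hex):
    """Convert a `th` rejection threshold into a sampling probability."""
    # Pad to 14 hex digits (56 bits); omitted trailing digits are zeros
    rejection_threshold = int(th_hex.ljust(14, "0"), 16)
    return 1 - rejection_threshold / 2**56


def adjusted_count(th_hex):
    """Number of original spans each sampled span represents."""
    return 1 / sampling_probability(th_hex)


for th in ("fd70a4", "fa00"):
    print(th, sampling_probability(th), adjusted_count(th))
```

Under this encoding th:fd70a4 corresponds to roughly 1% sampling (each span stands for about 100) and th:fa00 to about 2.3% (each span stands for about 43), so a summary span built from both populations would have no single valid weight.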

Telemetry

The processor emits the following metrics to help monitor its operation:

Counters

Metric	Description
`otelcol_processor_spanpruning_spans_received`	Total number of spans received by the processor
`otelcol_processor_spanpruning_spans_pruned`	Total number of spans removed by aggregation
`otelcol_processor_spanpruning_aggregations_created`	Total number of aggregation summary spans created
`otelcol_processor_spanpruning_traces_processed`	Total number of traces processed
`otelcol_processor_spanpruning_outliers_detected`	Total spans identified as outliers by analysis (when `enable_outlier_analysis: true`)
`otelcol_processor_spanpruning_outliers_preserved`	Total outlier spans kept as individual spans (when `preserve_outliers: true`)
`otelcol_processor_spanpruning_outliers_correlations_detected`	Total aggregation groups where outliers had correlated attributes
`otelcol_processor_spanpruning_bytes_received`	Total bytes of serialized traces received (when `enable_bytes_metrics: true`)
`otelcol_processor_spanpruning_bytes_emitted`	Total bytes of serialized traces emitted after pruning (when `enable_bytes_metrics: true`)

Histograms

Metric	Description
`otelcol_processor_spanpruning_aggregation_group_size`	Distribution of the number of spans per aggregation group
`otelcol_processor_spanpruning_processing_duration`	Time taken to process each batch of traces (in seconds)

Optional Attribute Loss Metrics

When enable_attribute_loss_analysis: true, the processor also emits metrics about attribute loss during aggregation. These metrics help you understand how much information is being lost when spans are grouped together.

To correlate these metrics back to traces, a configurable fraction of these metric recordings can include trace exemplars via attribute_loss_exemplar_sample_rate. Sampling is applied per aggregation group, and the exemplar context is taken from the slowest span in the group.

Histograms (Optional)

Metric	Description
`otelcol_processor_spanpruning_leaf_attribute_diversity_loss`	Attribute values lost due to diversity per leaf aggregation (when leaf spans have different attribute values)
`otelcol_processor_spanpruning_leaf_attribute_loss`	Attribute keys lost due to absence per leaf aggregation (when some spans don't have an attribute that others do)
`otelcol_processor_spanpruning_parent_attribute_diversity_loss`	Attribute values lost due to diversity per parent aggregation
`otelcol_processor_spanpruning_parent_attribute_loss`	Attribute keys lost due to absence per parent aggregation

Attribute loss analysis is disabled by default (enable_attribute_loss_analysis: false) to reduce overhead. When enabled, the processor:

Analyzes attribute differences across spans being aggregated
Records histogram metrics for loss tracking
Adds <prefix>diverse_attributes and <prefix>missing_attributes summary attributes to aggregated spans

These metrics can be used to:

Monitor the effectiveness of span pruning (compare spans_received vs spans_pruned)
Track the compression ratio achieved by aggregation
Identify processing bottlenecks via processing_duration
Understand aggregation patterns via aggregation_group_size

Signed-off-by: Sean Porter <portertech@gmail.com>

portertech · 2026-01-26T22:33:02Z

@csmarchbanks has agreed to be a code owner 🎉 I am now seeking another code owner external to our org (Grafana).

andrzej-stencel · 2026-01-27T09:32:40Z

Thanks Sean! Converting this PR to draft until the proposal is accepted.

portertech · 2026-01-28T17:54:03Z

@andrzej-stencel the wonderful @jmacd is keen to sponsor, I've updated the proposal 👍

PeterF778 · 2026-01-29T15:45:23Z

To be compatible with Consistent Probability Sampling, only the spans with identical TraceState should be aggregated. The description of the solution does not mention TraceState at all.

portertech · 2026-01-29T19:24:08Z

@PeterF778 excellent point, going to do some testing 👍

portertech · 2026-01-29T22:55:26Z

@PeterF778 implementation now accounts for tracestate. I would love your thoughts on it.

PeterF778 · 2026-01-30T17:49:07Z

> @PeterF778 implementation now accounts for tracestate. I would love your thoughts on it.

Looks good! Thanks!

jmacd · 2026-02-03T15:54:21Z

@andrzej-stencel do you think we can accept this as one PR, or should it be broken into skeleton, config, docs, impl, etc.?

MikeGoldsmith · 2026-02-04T20:33:02Z

At over 9k new lines of code, this is nearly impossible to review. I'd 100% be in favour of breaking this up into manageable chunks.

portertech · 2026-02-06T17:00:56Z

@jmacd how do you propose we decompose it?

portertech · 2026-02-06T17:02:45Z

Perhaps one PR for the skeleton with the MVP pruner? No outlier detection or loss analysis.

portertech · 2026-02-06T17:15:22Z

I am going to try breaking it down this afternoon 👍

portertech · 2026-02-11T14:57:41Z

I did manage to reduce the component to its core, stripping out additional capabilities, diff main...portertech:opentelemetry-collector-contrib:trace-span-pruning-mvp

github-actions · 2026-02-26T05:38:18Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

jmacd · 2026-02-27T22:51:35Z

@portertech #45617 (comment) looks good to me. Much better!

github-actions · 2026-03-14T05:33:36Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions · 2026-03-28T05:37:51Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

## Summary

This PR introduces the `spanpruningprocessor`, a new trace processor that reduces trace storage costs while preserving observability value. It intelligently identifies and aggregates repetitive leaf spans within traces, replacing groups of similar operations with single summary spans that capture the full statistical picture.

This is a reduced-scope MVP of #45617 (now closed), focusing on the core aggregation algorithm. Advanced features like outlier detection, outlier preservation, histogram buckets, attribute loss analysis, and byte-size metrics will follow in subsequent PRs once the foundation is merged.

Component donation issue: #45654

## The Problem

Modern distributed systems generate enormous volumes of trace data. A significant portion consists of repetitive, similar spans -- think N+1 database queries, batch HTTP calls, or fan-out operations. Storing every individual span is expensive and often provides diminishing analytical value beyond the first few instances.

Current solutions are inadequate:

- **Head sampling** loses entire traces, breaking root cause analysis
- **Tail sampling** helps but still keeps every span in sampled traces
- **Manual instrumentation changes** require code modifications across services

## The Solution

The Span Pruning Processor identifies duplicate or similar leaf spans within a single trace, groups them, and replaces each group with a single aggregated summary span. When leaf spans are aggregated, the processor also recursively aggregates their parent spans if all children of those parents are being aggregated.

**Leaf spans** are spans that are not referenced as a parent by any other span in the trace. They typically represent the last actions in an execution call stack (e.g., individual database queries, HTTP calls to external services).

Spans are grouped by:

1. **Span name** - spans must have the same name
2. **Span kind** - spans must have the same kind (Internal, Server, Client, Producer, Consumer)
3. **Status code** - spans must have the same status (OK, Error, or Unset)
4. **TraceState** - spans must have identical TraceState values (for Consistent Probability Sampling compatibility)
5. **Configured attributes** - spans must have matching values for attributes specified in `group_by_attributes`
6. **Parent span name** - leaf spans must share the same parent span name to be grouped together

Parent spans are eligible for aggregation when all of their children are aggregated, they share the same name, kind, and status code, and they are not root spans.

## Use Cases

- **Database query optimization**: When an application makes many similar database queries (e.g., N+1 queries), aggregate them into a single summary span
- **Batch operations**: Consolidate many similar leaf operations into a single representative span
- **Cost reduction**: Reduce trace storage costs by eliminating redundant span data

## Configuration

```yaml
processors:
  spanpruning:
    # Attributes to use for grouping similar leaf spans (supports glob patterns)
    # Spans with the same name AND same values for matching attributes will be grouped
    # Examples:
    #   - "db.*" matches db.operation, db.name, db.statement, etc.
    #   - "http.request.*" matches http.request.method, http.request.header, etc.
    #   - "db.operation" matches only the exact key "db.operation"
    group_by_attributes:
      - "db.*"
      - "http.method"

    # Minimum number of similar leaf spans required before aggregation
    # Default: 5
    min_spans_to_aggregate: 3

    # Maximum depth of parent span aggregation above leaf spans
    # 0 = only aggregate leaf spans (no parent aggregation)
    # -1 = unlimited depth
    # Default: 1
    max_parent_depth: 1

    # Prefix for aggregation statistics attributes
    # Default: "aggregation."
    aggregation_attribute_prefix: "batch."
```

## Configuration Options

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `group_by_attributes` | []string | [] | Attribute patterns for grouping (supports glob patterns like `db.*`) |
| `min_spans_to_aggregate` | int | 5 | Minimum group size before aggregation occurs |
| `max_parent_depth` | int | 1 | Max depth of parent aggregation (0=none, -1=unlimited) |
| `aggregation_attribute_prefix` | string | "aggregation." | Prefix for aggregation statistics attributes |

### Glob Pattern Support

The `group_by_attributes` field supports glob patterns for matching attribute keys:

| Pattern | Matches |
|---------|---------|
| `db.*` | `db.operation`, `db.name`, `db.statement`, etc. |
| `http.request.*` | `http.request.method`, `http.request.header.content-type`, etc. |
| `rpc.*` | `rpc.method`, `rpc.service`, `rpc.system`, etc. |
| `db.operation` | Only the exact key `db.operation` |

When multiple attributes match a pattern, they are all included in the grouping key (sorted alphabetically for consistency).
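The glob matching and alphabetical ordering described above can be illustrated with a minimal Go sketch. This is not the processor's code: the function names `matchingKeys` and `attributeKey`, the use of plain string attribute values, and the `key=value;` encoding are all assumptions made for illustration, and the standard library's `path.Match` stands in for whatever glob matcher the processor actually uses (its `*` does not cross `/`, which does not matter for dotted attribute keys).

```go
package main

import (
	"path"
	"sort"
	"strings"
)

// matchingKeys returns the attribute keys that match at least one pattern,
// sorted alphabetically so the resulting grouping key is deterministic.
// Hypothetical helper: path.Match is a stand-in glob matcher.
func matchingKeys(attrs map[string]string, patterns []string) []string {
	var keys []string
	for k := range attrs {
		for _, p := range patterns {
			// A malformed pattern is treated as non-matching in this sketch.
			if ok, err := path.Match(p, k); err == nil && ok {
				keys = append(keys, k)
				break
			}
		}
	}
	sort.Strings(keys)
	return keys
}

// attributeKey joins the matching key=value pairs into one fragment of a
// grouping key. The "key=value;" encoding is an assumption for illustration.
func attributeKey(attrs map[string]string, patterns []string) string {
	var b strings.Builder
	for _, k := range matchingKeys(attrs, patterns) {
		b.WriteString(k)
		b.WriteByte('=')
		b.WriteString(attrs[k])
		b.WriteByte(';')
	}
	return b.String()
}
```

With `group_by_attributes: ["db.*", "http.method"]`, two spans would only land in the same group if they produce the same fragment, so spans that differ in `db.name` stay separate while attributes outside the patterns are ignored.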
## Summary Span

When spans are aggregated, the summary span includes:

### Properties

- **Name**: Original span name (e.g., `SELECT`)
- **TraceID**: Same as original spans
- **SpanID**: Newly generated unique ID
- **ParentSpanID**: Same as original spans (common parent)
- **Kind**: Same as template span (inherited from slowest span)
- **StartTimestamp**: Earliest start time of all spans in the group
- **EndTimestamp**: Latest end time of all spans in the group
- **Status**: Same as original spans (spans are grouped by status code)
- **TraceState**: Inherited from the template span (preserved for Consistent Probability Sampling compatibility)
- **Attributes**: Inherited from the slowest span in the group
- **Events**: Inherited from the template (slowest) span
- **Links**: Inherited from the template span

> **Note**: The summary span's duration (`EndTimestamp - StartTimestamp`) represents the total time window covered by all aggregated spans, which may exceed `duration_max_ns`. For example, if spans overlap or are staggered, the time range can be larger than any individual span's duration. Use `duration_max_ns` to find the slowest individual operation.
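To make the distinction in the note concrete, here is a small Go sketch of how a summary's time window and per-span duration statistics could be computed in one pass. The `span` and `summary` types and the `summarize` function are hypothetical names for illustration only, and the integer-division average is an assumption rather than a statement about the processor's rounding.

```go
package main

// span is a hypothetical stand-in holding only start and end times in nanoseconds.
type span struct{ startNs, endNs int64 }

// summary holds the statistics a summary span would carry in this sketch.
type summary struct {
	count                        int64
	minNs, maxNs, avgNs, totalNs int64
	startNs, endNs               int64 // earliest start and latest end of the group
}

// summarize computes duration statistics and the covered time window in a
// single pass over the group.
func summarize(spans []span) summary {
	if len(spans) == 0 {
		return summary{}
	}
	first := spans[0].endNs - spans[0].startNs
	s := summary{minNs: first, maxNs: first, startNs: spans[0].startNs, endNs: spans[0].endNs}
	for _, sp := range spans {
		d := sp.endNs - sp.startNs
		s.count++
		s.totalNs += d
		if d < s.minNs {
			s.minNs = d
		}
		if d > s.maxNs {
			s.maxNs = d
		}
		if sp.startNs < s.startNs {
			s.startNs = sp.startNs
		}
		if sp.endNs > s.endNs {
			s.endNs = sp.endNs
		}
	}
	s.avgNs = s.totalNs / s.count // integer division; an assumption of this sketch
	return s
}
```

For three staggered spans of 10 ms, 15 ms, and 12 ms that start at 0 ms, 5 ms, and 30 ms, the window `endNs - startNs` is 42 ms while `maxNs` is only 15 ms, which is why the summary span's own duration should not be read as the slowest operation.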
### What Gets Aggregated Away

When spans are aggregated into a summary span, the following data from non-template spans is **lost**:

| Data | Behavior |
|------|----------|
| **Span Events** | Only the template (slowest) span's events are preserved |
| **Span Links** | Only the template span's links are preserved |
| **Attributes** | Non-matching attribute values are lost |
| **Individual Timestamps** | Original start/end times replaced by the group's time range |
| **SpanIDs** | Original SpanIDs are replaced by a single summary SpanID |

### Aggregation Attributes

The following attributes are added to the summary span (shown with default `aggregation_attribute_prefix: "aggregation."`):

| Attribute | Type | Description |
|-----------|------|-------------|
| `<prefix>is_summary` | bool | Always `true` to identify summary spans |
| `<prefix>span_count` | int64 | Number of spans that were aggregated |
| `<prefix>duration_min_ns` | int64 | Minimum duration in nanoseconds |
| `<prefix>duration_max_ns` | int64 | Maximum duration in nanoseconds |
| `<prefix>duration_avg_ns` | int64 | Average duration in nanoseconds |
| `<prefix>duration_total_ns` | int64 | Total duration in nanoseconds |

## Pipeline Placement

This processor is designed to work best when placed after processors that ensure complete traces are available:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [groupbytrace, spanpruning, batch]
      exporters: [otlp]
```

Or with tail sampling:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, spanpruning, batch]
      exporters: [otlp]
```

## Examples

### Basic Example

A trace with repeated database queries (some failing):

**Before Processing:**

```
root-span (parent)
├── SELECT (leaf) - duration: 10ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 15ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 12ms, db.operation: select, status: OK
├── SELECT (leaf) - duration: 50ms, db.operation: select, status: Error
├── SELECT (leaf) - duration: 45ms, db.operation: select, status: Error
└── INSERT (leaf) - duration: 20ms, db.operation: insert, status: OK
```

**After Processing (with `min_spans_to_aggregate: 2`):**

```
root-span (parent)
├── SELECT (summary, status: OK)
│     - aggregation.is_summary: true
│     - aggregation.span_count: 3
│     - aggregation.duration_min_ns: 10000000
│     - aggregation.duration_max_ns: 15000000
│     - aggregation.duration_avg_ns: 12333333
├── SELECT (summary, status: Error)
│     - aggregation.is_summary: true
│     - aggregation.span_count: 2
│     - aggregation.duration_min_ns: 45000000
│     - aggregation.duration_max_ns: 50000000
│     - aggregation.duration_avg_ns: 47500000
└── INSERT (unchanged - only 1 span, below threshold)
```

Note: Spans with different status codes are grouped separately, preserving error information.

### Recursive Parent Aggregation Example

When spans are aggregated, the processor also checks if their parent spans can be aggregated. Parent spans are eligible for aggregation when:

1. All of their children are being aggregated
2. They share the same name, kind, and status code with other eligible parents
3. They are not root spans (must have a parent)
4. At least 2 parents meet the criteria

**Before Processing (with `min_spans_to_aggregate: 2`, `group_by_attributes: ["db.op"]`):**

```
root
├── handler (status: OK)
│   └── SELECT (db.op=select, status: OK) ───┐
├── handler (status: OK)                     │ leaf group A: 3 OK SELECTs
│   └── SELECT (db.op=select, status: OK) ───┤
├── handler (status: OK)                     │
│   └── SELECT (db.op=select, status: OK) ───┘
├── handler (status: Error)
│   └── SELECT (db.op=select, status: Error) ┐ leaf group B: 2 Error SELECTs
├── handler (status: Error)                  │
│   └── SELECT (db.op=select, status: Error) ┘
├── handler (status: OK)
│   └── INSERT (db.op=insert, status: OK) ──── only 1, below threshold
└── worker (status: OK)
    └── SELECT (db.op=select, status: OK) ──── different parent name
```

**After Processing:**

```
root
├── handler (summary, status: OK, span_count: 3)
│   └── SELECT (summary, status: OK, span_count: 3)
├── handler (summary, status: Error, span_count: 2)
│   └── SELECT (summary, status: Error, span_count: 2)
├── handler (status: OK)
│   └── INSERT (status: OK) ─────────────────────────── unchanged
└── worker (status: OK)
    └── SELECT (status: OK) ─────────────────────────── unchanged
```

**Why each span was handled this way:**

| Span | Result | Reason |
|------|--------|--------|
| 3x handler (OK) with SELECT children | Aggregated | All children aggregated, same name+kind+status |
| 3x SELECT (OK) under handler | Aggregated | Same name + kind + status + attributes + parent name |
| 2x handler (Error) with SELECT children | Aggregated | All children aggregated, same name+kind+status |
| 2x SELECT (Error) under handler | Aggregated | Same name + kind + status + attributes + parent name |
| handler (OK) with INSERT child | Unchanged | Child not aggregated (only 1 INSERT) |
| INSERT (OK) | Unchanged | Below threshold (only 1 span) |
| worker (OK) | Unchanged | Child not aggregated |
| SELECT (OK) under worker | Unchanged | Different parent name than other SELECTs |

## Consistent Probability Sampling (CPS) Compatibility

The processor is designed to be compatible with [Consistent Probability Sampling](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/) (CPS). CPS uses TraceState to carry sampling metadata (`ot=th:...;rv:...`) where:

- `th` (threshold) indicates the sampling probability threshold
- `rv` (randomness value) provides consistent randomness for sampling decisions

**Why TraceState matters for aggregation:**

Spans with different TraceState values represent different sampling populations with different "adjusted counts" (weights). Aggregating them together would produce statistically incorrect summaries and break downstream sampling decisions.

The processor uses **exact TraceState matching** (not just the `th` value) because:

- The `rv` value affects sampling decisions
- Vendor-specific keys may have semantic meaning
- Key ordering may be significant

## Limitations

- Requires complete traces for accurate leaf detection
- Summary span inherits attributes from the slowest span in the group
- Parent spans are only aggregated when ALL their children are aggregated

## Telemetry

The processor emits the following metrics to help monitor its operation:

### Counters

| Metric | Description |
|--------|-------------|
| `otelcol_processor_spanpruning_spans_received` | Total number of spans received by the processor |
| `otelcol_processor_spanpruning_spans_pruned` | Total number of spans removed by aggregation |
| `otelcol_processor_spanpruning_aggregations_created` | Total number of aggregation summary spans created |
| `otelcol_processor_spanpruning_traces_processed` | Total number of traces processed |

### Histograms

| Metric | Description |
|--------|-------------|
| `otelcol_processor_spanpruning_aggregation_group_size` | Distribution of the number of spans per aggregation group |
| `otelcol_processor_spanpruning_processing_duration` | Time taken to process each batch of traces (in seconds) |

These metrics can be used to:

- Monitor the effectiveness of span pruning (compare `spans_received` vs `spans_pruned`)
- Track the compression ratio achieved by aggregation
- Identify processing bottlenecks via `processing_duration`
- Understand aggregation patterns via `aggregation_group_size`

## Scope / Future Work

This MVP focuses on the core aggregation engine. The following features from the original PR (#45617) are planned for follow-up PRs:

- **Outlier detection**: IQR and MAD-based statistical outlier detection
- **Outlier preservation**: Keep slow spans as individual spans while aggregating normal ones
- **Attribute correlation**: Identify attributes that correlate with slow operations
- **Histogram buckets**: Latency distribution in summary spans
- **Attribute loss analysis**: Track and report attribute diversity lost during aggregation
- **Byte-size metrics**: Measure serialized trace sizes before/after pruning

## Architecture

The processor operates in three phases per trace:

1. **Tree Construction** (`tree.go`): Builds parent-child relationships, identifies leaves and orphans
2. **Analysis** (`processor.go`, `grouping.go`): Groups similar leaf spans by key, then walks up the tree to find eligible parent spans for recursive aggregation
3. **Execution** (`aggregation.go`): Sorts groups top-down, creates summary spans with preassigned SpanIDs, and batch-removes originals

Key design decisions:

- **Tree-based analysis** avoids O(n^2) parent lookups by pre-computing relationships
- **Type-safe attribute encoding** (`grouping.go`) ensures correct grouping for all pdata value types (maps, slices, bytes)
- **Pooled string builders** minimize allocations in the hot grouping-key path
- **Single-pass statistics** (`stats.go`) computes min/max/avg/total and time ranges without extra traversals

#### Link to tracking issue

Fixes #45654

#### Testing

- Comprehensive unit tests (`processor_test.go`) covering: leaf span aggregation, recursive parent aggregation at multiple depths, grouping by attributes with glob patterns, status code separation, TraceState/CPS compatibility, span kind grouping, edge cases (empty traces, single spans, orphans, multiple roots), configuration validation, and template span selection (events, links, attributes inherited from slowest span)
- Configuration validation tests (`config_test.go`) covering all fields and error cases
- Aggregation logic tests (`aggregation_test.go`) for duration calculation and template selection
- Benchmark tests (`processor_benchmark_test.go`) measuring throughput across varying trace sizes (100-10000 spans) and group counts
- Generated component lifecycle tests and telemetry tests via `mdatagen`

#### Documentation

- Comprehensive `README.md` with configuration reference, glob pattern examples, summary span schema, pipeline placement guidance, before/after examples (including recursive parent aggregation), CPS compatibility notes, limitations, and telemetry reference
- `documentation.md` generated from `metadata.yaml` describing all 6 custom telemetry metrics

---------

Signed-off-by: Sean Porter <portertech@gmail.com>
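For readers skimming the description, the leaf detection and grouping rules from "The Solution" and the first two Architecture phases can be sketched roughly as follows. This is an illustrative Go sketch, not code from the PR: the `span` and `groupKey` types and the `leafGroups` function are hypothetical, configured attributes are left out of the key for brevity, and skipping roots and orphans is an assumption of the sketch rather than documented behavior.

```go
package main

// span is a hypothetical, simplified stand-in for a trace span.
type span struct {
	id, parentID                   string
	name, kind, status, traceState string
}

// groupKey mirrors the grouping criteria listed in the description, minus
// the configured attributes, which are omitted here for brevity.
type groupKey struct {
	name, kind, status, traceState, parentName string
}

// leafGroups finds spans that no other span references as a parent, groups
// them by groupKey, and keeps only groups with at least minSpans members.
func leafGroups(spans []span, minSpans int) map[groupKey][]span {
	byID := make(map[string]span, len(spans))
	isParent := make(map[string]bool, len(spans))
	for _, s := range spans {
		byID[s.id] = s
		if s.parentID != "" {
			isParent[s.parentID] = true
		}
	}

	groups := make(map[groupKey][]span)
	for _, s := range spans {
		if isParent[s.id] {
			continue // not a leaf
		}
		parent, ok := byID[s.parentID]
		if !ok {
			continue // assumption: roots and orphans are skipped in this sketch
		}
		k := groupKey{
			name: s.name, kind: s.kind, status: s.status,
			traceState: s.traceState, parentName: parent.name,
		}
		groups[k] = append(groups[k], s)
	}

	// Drop groups below the aggregation threshold.
	for k, g := range groups {
		if len(g) < minSpans {
			delete(groups, k)
		}
	}
	return groups
}
```

Applied to the Basic Example with a threshold of 2, this yields two groups (three OK `SELECT` spans and two Error `SELECT` spans) and leaves the single `INSERT` ungrouped.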


portertech added 30 commits January 5, 2026 16:12

first pass, one-shot

583fdf6

Signed-off-by: Sean Porter <portertech@gmail.com>

removed leaf span filtering, make it broad/general

e79711e

Signed-off-by: Sean Porter <portertech@gmail.com>

aggregation duration histogram

ae49764

Signed-off-by: Sean Porter <portertech@gmail.com>

also group spans by status

32d0314

Signed-off-by: Sean Porter <portertech@gmail.com>

min spans to aggregate default is now 5

a39101b

Signed-off-by: Sean Porter <portertech@gmail.com>

aggregate leaf parents

2458a3c

Signed-off-by: Sean Porter <portertech@gmail.com>

updated readme overview

99254a1

Signed-off-by: Sean Porter <portertech@gmail.com>

updated readme aggregation attributes table to use prefix

bef6660

Signed-off-by: Sean Porter <portertech@gmail.com>

use a tree

ca6d0b7

Signed-off-by: Sean Porter <portertech@gmail.com>

refactoring

9bc0cd5

Signed-off-by: Sean Porter <portertech@gmail.com>

more refactoring

277ca6a

Signed-off-by: Sean Porter <portertech@gmail.com>

renamed the component, dropped "leaf", scope can grow

3199647

Signed-off-by: Sean Porter <portertech@gmail.com>

max parent depth

3a217c4

Signed-off-by: Sean Porter <portertech@gmail.com>

renamed summary_ -> aggregation_ for consistency

ff22633

Signed-off-by: Sean Porter <portertech@gmail.com>

readme edit, call out "leaf" spans

a32a9f2

Signed-off-by: Sean Porter <portertech@gmail.com>

use math/rand/v2 for aggregation span ids

a33fc24

Signed-off-by: Sean Porter <portertech@gmail.com>

further config validation, aggregation suffix, prefix, and group attrs

0355252

Signed-off-by: Sean Porter <portertech@gmail.com>

added instrumentation

5812c98

Signed-off-by: Sean Porter <portertech@gmail.com>

added telemetry section to readme

594f606

Signed-off-by: Sean Porter <portertech@gmail.com>

use seconds for duration unit

afe00ca

Signed-off-by: Sean Porter <portertech@gmail.com>

sparse aggregation benchmark

eff9148

Signed-off-by: Sean Porter <portertech@gmail.com>

walk up from marked nodes, limit depth to 10

a093fe6

Signed-off-by: Sean Porter <portertech@gmail.com>

removed max parent depth limit

86d1391

Signed-off-by: Sean Porter <portertech@gmail.com>

replaced aggregation_span_name_suffix with the is_summary span attribute

023a5bd

Signed-off-by: Sean Porter <portertech@gmail.com>

cleaned up garbage artifact

9567e84

Signed-off-by: Sean Porter <portertech@gmail.com>

measure attribute loss

2cf7d27

Signed-off-by: Sean Porter <portertech@gmail.com>

attribute loss analysis is optional

60210de

Signed-off-by: Sean Porter <portertech@gmail.com>

Merge branch 'trace-span-pruning' into trace-span-pruning-preservation

d2b1ad3

fixed tests and metric docs

c5ff29e

Signed-off-by: Sean Porter <portertech@gmail.com>

attribute loss exemplars

ebd9006

Signed-off-by: Sean Porter <portertech@gmail.com>

andrzej-stencel marked this pull request as draft January 27, 2026 09:31

added @csmarchbanks as code owner

8d80c94

Signed-off-by: Sean Porter <portertech@gmail.com>

portertech added 2 commits January 29, 2026 11:28

Merge branch 'main' into trace-span-pruning

ae316ad

tracestate grouping

16c7560

Signed-off-by: Sean Porter <portertech@gmail.com>

fix lint errors

db604e2

Signed-off-by: Sean Porter <portertech@gmail.com>

Merge branch 'main' into trace-span-pruning

4632d00

github-actions Bot added the Stale label Feb 26, 2026

github-actions Bot removed the Stale label Feb 28, 2026

github-actions Bot added the Stale label Mar 14, 2026

github-actions Bot closed this Mar 28, 2026

portertech mentioned this pull request Mar 31, 2026

Add Trace Span Pruning Processor #47277

Merged

Conversation

portertech commented Jan 23, 2026 • edited

Summary

The Problem

The Solution

Use Cases

Configuration

Configuration Options

Glob Pattern Support

Summary Span

Properties

What Gets Aggregated Away

Aggregation Attributes

Optional Outlier Analysis Attributes

Histogram Buckets

Outlier Analysis (Optional)

Detection Methods

How It Works

Configuration Example

Example Output

When to Use

Performance Impact

Preserving Outlier Spans (Optional)

Configuration

Configuration Options

Example Output

Summary Span Attributes (When Preserving Outliers)

Preserved Outlier Span Attributes

Behavior Notes

Pipeline Placement

Example

Basic Example

Recursive Parent Aggregation Example

Limitations

Consistent Probability Sampling (CPS) Compatibility

Telemetry

Counters

Histograms

Optional Attribute Loss Metrics

Histograms (Optional)

portertech commented Jan 26, 2026

andrzej-stencel commented Jan 27, 2026

portertech commented Jan 28, 2026

PeterF778 commented Jan 29, 2026

portertech commented Jan 29, 2026

portertech commented Jan 29, 2026

PeterF778 commented Jan 30, 2026

jmacd commented Feb 3, 2026

MikeGoldsmith commented Feb 4, 2026

portertech commented Feb 6, 2026

portertech commented Feb 6, 2026

portertech commented Feb 6, 2026

portertech commented Feb 11, 2026

github-actions Bot commented Feb 26, 2026

jmacd commented Feb 27, 2026

github-actions Bot commented Mar 14, 2026

github-actions Bot commented Mar 28, 2026

portertech commented Jan 23, 2026 • edited