Use sync.Map for exponential histogram aggregations by dashpole · Pull Request #8077 · open-telemetry/opentelemetry-go

dashpole · 2026-03-19T21:37:05Z

Part of #7796

This applies the same approach I used for fixed-bucket histograms (#7474) to exponential histograms.

Changes

Narrow the sync.Mutex from guarding the entire map to guarding only the scale and the positive/negative buckets.
Split expoHistogram into deltaExpoHistogram and cumulativeExpoHistogram: TODO

This does not make the buckets concurrent-safe. That will be done in subsequent PRs.
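As a rough illustration of the locking change described above (not the code in this PR), the sketch below keeps data points in a sync.Map and gives each point its own sync.Mutex that guards only the scale and buckets. The names `point`, `hist`, and the string key are hypothetical stand-ins; the real aggregator keys by attribute set and has a richer data point.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// point is a hypothetical stand-in for one exponential histogram data point.
type point struct {
	count atomic.Uint64 // updated without taking the lock

	mu      sync.Mutex // guards only scale and buckets
	scale   int32
	buckets map[int32]uint64
}

// hist is a hypothetical aggregator. No mutex wraps the map itself.
type hist struct {
	points sync.Map // map[string]*point
}

// measure records one observation that falls into the given bucket.
func (h *hist) measure(key string, bucket int32) {
	v, ok := h.points.Load(key)
	if !ok {
		// Only allocate a new point when the key was not found.
		v, _ = h.points.LoadOrStore(key, &point{buckets: map[int32]uint64{}})
	}
	p := v.(*point)
	p.count.Add(1)

	p.mu.Lock()
	p.buckets[bucket]++
	p.mu.Unlock()
}

func main() {
	var h hist
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				h.measure(fmt.Sprintf("attrs-%d", i%2), int32(j%4))
			}
		}(i)
	}
	wg.Wait()

	h.points.Range(func(k, v any) bool {
		fmt.Println(k, v.(*point).count.Load())
		return true
	})
}
```

With this layout, measurements for different attribute sets never contend on a shared lock; only writers to the same point serialize on that point's bucket mutex.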

codecov · 2026-03-19T23:07:44Z

Codecov Report

❌ Patch coverage is 88.32685% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.0%. Comparing base (8d70624) to head (5ad698a).

Files with missing lines	Patch %	Lines
...metric/internal/aggregate/exponential_histogram.go	87.8%	20 Missing and 6 partials ⚠️
sdk/metric/internal/aggregate/atomic.go	89.7%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #8077     +/-   ##
=======================================
- Coverage   82.0%   82.0%   -0.1%     
=======================================
  Files        308     308             
  Lines      24060   24228    +168     
=======================================
+ Hits       19748   19882    +134     
- Misses      3936    3961     +25     
- Partials     376     385      +9

Files with missing lines	Coverage Δ
sdk/metric/internal/aggregate/aggregate.go	`100.0% <100.0%> (ø)`
sdk/metric/internal/aggregate/atomic.go	`89.9% <89.7%> (-2.8%)`	⬇️
...metric/internal/aggregate/exponential_histogram.go	`92.8% <87.8%> (-7.2%)`	⬇️

... and 1 file with indirect coverage changes

dashpole · 2026-04-03T14:46:05Z

One part of this that is challenging to resolve is dealing with underflow for cumulative metrics. The current design mirrors how the histogram implementation works: collection swaps hot and cold, reads the cold point, and then merges the cold point back into the hot point.
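A minimal sketch of that swap, read, and merge cycle, assuming a single collector goroutine and a heavily simplified data point (just a count and a sum). The types `buffer` and `cumulative` are hypothetical, and the per-buffer mutex is a simplification rather than the synchronization this PR uses.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// buffer is a hypothetical, simplified data point.
type buffer struct {
	mu    sync.Mutex
	count uint64
	sum   float64
}

// cumulative keeps two buffers. Writers target whichever one is hot.
type cumulative struct {
	hot  atomic.Uint32 // index of the hot buffer, 0 or 1
	bufs [2]buffer
}

func (c *cumulative) measure(v float64) {
	b := &c.bufs[c.hot.Load()]
	b.mu.Lock()
	b.count++
	b.sum += v
	b.mu.Unlock()
}

// collect swaps hot and cold, reads and clears the cold buffer, then merges
// what it read back into the hot buffer so totals keep accumulating.
// It assumes only one goroutine calls collect at a time.
func (c *cumulative) collect() (count uint64, sum float64) {
	coldIdx := c.hot.Load()
	c.hot.Store(1 - coldIdx) // new measurements now land in the other buffer

	cold := &c.bufs[coldIdx]
	cold.mu.Lock()
	count, sum = cold.count, cold.sum
	cold.count, cold.sum = 0, 0
	cold.mu.Unlock()

	hot := &c.bufs[1-coldIdx]
	hot.mu.Lock()
	hot.count += count
	hot.sum += sum
	hot.mu.Unlock()
	return count, sum
}

func main() {
	var c cumulative
	c.measure(1)
	c.measure(2)
	fmt.Println(c.collect()) // prints "2 3"
	c.measure(4)
	fmt.Println(c.collect()) // prints "3 7"
}
```

In this sketch a writer that loaded the old hot index just before the swap may land in the cold buffer after it was read; that measurement is reported on the following collection instead of being lost. With real exponential buckets, the merge step is where a scale change (and therefore underflow) can be discovered.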

The issue comes during the merge process. It is possible that an observation made to the hot point should underflow, but we don't find that out until we try to merge the cold point into the hot one. For example:

attrs := attribute.NewSet()
maxSize := 2
h := newExpoHistogram(maxSize, ...)
h.measure(ctx, math.MaxFloat64, attrs, ...)
go h.collect(...)
h.measure(ctx, math.SmallestNonzeroFloat64, attrs, ...)
// assume collect() finishes after measure, and tries to merge
// an exp histogram with math.MaxFloat64 into an exp histogram
// with math.SmallestNonzeroFloat64. This will underflow, but we
// can't remove the underflowed measurement after it has been
// aggregated.

This is an extremely rare case: underflow is only possible with maxSize <= 2, and only when one measurement is more than 2^1024 times greater than another.

Some options I've come up with to deal with it:

1. Fix it properly, but with significant complexity:
   i. Add a separate tracker that uses three atomic bits to track which scale -10 buckets have been seen to drop the measurement that underflows. This will probably have a small performance cost as well.
   ii. "Pre-scale" buckets before swapping to make underflow impossible when merging cold back into hot. This is quite a bit more complex.
2. Best-effort removal of underflowed measurements during the merge process:
   i. Remove the underflowed bucket counts, and lower the overall count by the same amount. Lower the sum proportional to the count to keep the average the same.
   ii. Put the smallest underflowed measurements into the zero bucket. Raise the zero threshold, and move the smallest underflowed bucket counts to the zero count. In the worst case, the zero_threshold would be raised to 1.0, but the remaining range would fit into a single scale -10 bucket.
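For the first best-effort option (drop the underflowed counts and scale the sum so the average is unchanged), a minimal sketch could look like the following. The `dataPoint` type and `dropLowest` method are hypothetical simplifications for illustration only.

```go
package main

import "fmt"

// dataPoint is a hypothetical, simplified histogram point.
type dataPoint struct {
	count   uint64
	sum     float64
	buckets []uint64 // buckets[0] is the lowest bucket
}

// dropLowest discards the n lowest buckets, lowers count by the number of
// discarded observations, and scales sum so that sum/count stays the same.
func (p *dataPoint) dropLowest(n int) {
	if n <= 0 {
		return
	}
	if n > len(p.buckets) {
		n = len(p.buckets)
	}
	var dropped uint64
	for _, c := range p.buckets[:n] {
		dropped += c
	}
	p.buckets = p.buckets[n:]
	if dropped == 0 || p.count == 0 {
		return
	}
	if dropped > p.count {
		dropped = p.count
	}
	remaining := p.count - dropped
	p.sum *= float64(remaining) / float64(p.count)
	p.count = remaining
}

func main() {
	p := dataPoint{count: 4, sum: 20, buckets: []uint64{1, 0, 3}}
	p.dropLowest(1)
	fmt.Println(p.count, p.sum, p.buckets) // prints "3 15 [0 3]"
}
```

The average is preserved (20/4 == 15/3), but the sum no longer equals the sum of the remaining observations, which is the trade-off this option accepts.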

I'm planning to implement the proper fix (option 1.i), but I wanted to document this in case it comes up later. Option 2.i is also appealing given how extremely rare this should be in practice.
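One way a three-bit tracker along the lines of option 1.i might work is sketched below. This is a guess at the idea, not the atomicUnderflowTracker added in this PR: the `tracker` type, its `admit` method, and the policy of rejecting the newest measurement are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync/atomic"
)

// tracker is a hypothetical sketch. Bit i of seen is set once a value in
// scale -10 bucket (i - 2) has been accepted, so bits 0, 1, 2 stand for
// buckets -2, -1, 0.
type tracker struct {
	seen    atomic.Uint32
	maxSize int
}

// span returns how many contiguous scale -10 buckets cover all set bits.
func span(mask uint32) int {
	if mask == 0 {
		return 0
	}
	return bits.Len32(mask) - bits.TrailingZeros32(mask)
}

// admit records scale -10 bucket index idx (-2, -1 or 0). It returns false
// when accepting the value would need more than maxSize buckets, in which
// case the caller would drop the measurement.
func (t *tracker) admit(idx int) bool {
	bit := uint32(1) << uint(idx+2)
	for {
		old := t.seen.Load()
		next := old | bit
		if span(next) > t.maxSize {
			return false
		}
		if next == old || t.seen.CompareAndSwap(old, next) {
			return true
		}
	}
}

func main() {
	tr := &tracker{maxSize: 2}
	fmt.Println(tr.admit(0))  // math.MaxFloat64 lands in bucket 0
	fmt.Println(tr.admit(-2)) // math.SmallestNonzeroFloat64 lands in bucket -2
}
```

Because the decision is made with a compare-and-swap before the value is aggregated, the rejected measurement never reaches the hot point, so nothing has to be removed during the merge.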

Some small testing improvements forked from #8077. This also fixes a flake where the order in which sums are added can change the resulting sum. Use assertSumEqual to handle this similar to other places in the test. Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>

dashpole added the "Skip Changelog" label (PRs that do not require a CHANGELOG.md entry) Mar 19, 2026

dashpole force-pushed the exphist_syncmap branch 2 times, most recently from 4a958eb to fc3a34b Compare March 20, 2026 16:28

dashpole force-pushed the exphist_syncmap branch from 3054c75 to d5f3996 Compare April 2, 2026 15:30

dashpole mentioned this pull request Apr 2, 2026

Improve test coverage for exponential histogram edge cases #8129

Merged

dashpole force-pushed the exphist_syncmap branch 3 times, most recently from fa32a82 to 906aa6e Compare April 6, 2026 19:59

dashpole added 8 commits April 9, 2026 00:55

feat(metric): Add atomicUnderflowTracker and tests

6d3605d

feat(metric): Add merge capabilities to expoBuckets and expoHistogramDataPoint

410ad1d

refactor(metric): Split newExponentialHistogram in aggregate.go and add wrappers

1a3f627

feat(metric): Implement deltaExpoHistogram with double buffering

63db18a

feat(metric): Implement cumulativeExpoHistogram and remove old implementation

0decf04

test(metric): Add tests for exponential histogram underflow and NaN/Inf handling

49ae15d

lint

48034f5

increase sum delta

5ad698a

dashpole force-pushed the exphist_syncmap branch from 906aa6e to 5ad698a Compare April 9, 2026 01:13

dashpole wants to merge 8 commits into open-telemetry:main from dashpole:exphist_syncmap
