Optimize histogram reservoir #7443
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##            main   #7443     +/-  ##
=======================================
- Coverage   86.2%   86.2%    -0.1%
=======================================
  Files        302     302
  Lines      21973   21971       -2
=======================================
- Hits       18949   18947       -2
  Misses      2643    2643
  Partials     381     381
=======================================
```
Forked from this discussion here: #7443 (comment)

It seems like a good idea for us as a group to align on and document what we are comfortable with in terms of how ordered measurements are reflected in collected metric data.

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
On further reflection, I fixed the copying issue before running the benchmark, so it is perhaps reasonable that less racy code runs slower. It would be good if the tests and/or linter detected the issue.
I also see slightly worse results, but agree it is definitely better to be correct. I'll work on a test.
I added a ConcurrentSafe test, and verified that it fails (quite spectacularly) with the previous atomic.Value implementation.
The ConcurrentSafe test found another race condition around my usage of sync.Pool, which I'm looking into.
MrAlias left a comment:
Overall, looks good to me. Just testing cleanup.
Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
I put the Measure benchmarks in the description. Not as good of an improvement as I had expected, so I'll have to dig into that more...
@dmathieu if you have the chance to review, this is related to other optimization PRs.
I figured out why the benchmark results were so poor: the benchmark was recording all observations in a single bucket, and each bucket has its own lock, so there were effectively no parallelism gains. I switched the benchmark to record observations in different buckets, and that shows this is a ~70% performance improvement when exemplars are being recorded.
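The benchmark change described in this comment can be sketched like this. The reservoir here is a hypothetical toy with one lock per bucket, not the actual SDK benchmark: the point is that `b.RunParallel` goroutines must target different buckets, so they take different per-bucket locks instead of all serializing on one.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// bucket guards one exemplar slot with its own lock (toy model of the
// per-bucket locking described above; not the SDK code).
type bucket struct {
	mu    sync.Mutex
	value float64
}

type reservoir struct{ buckets []bucket }

// Offer stores a value in bucket i, locking only that bucket.
func (r *reservoir) Offer(i int, v float64) {
	b := &r.buckets[i]
	b.mu.Lock()
	b.value = v
	b.mu.Unlock()
}

func main() {
	r := &reservoir{buckets: make([]bucket, 64)}
	// Spread observations over all buckets so parallel goroutines
	// rarely contend on the same lock. If every iteration used bucket
	// 0 instead, the benchmark would show no parallelism gain.
	res := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			var i int
			for pb.Next() {
				i++
				r.Offer(i%len(r.buckets), float64(i))
			}
		})
	})
	fmt.Println(res.N > 0)
}
```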
~Depends on #7441, #7443~

This improves the concurrent performance of the fixed size reservoir's Offer function by 4x (i.e. a 75% reduction). This improves the performance of Measure() for fixed-size reservoirs by 60% overall.

Accomplish this by:

* using a single atomic for count and next. This assumes that both can fit in a uint32.
* only using a lock to guard changing `w` and `next` together.

Offer benchmarks:

```
                            │  main.txt   │           fixedsize.txt           │
                            │   sec/op    │   sec/op     vs base              │
FixedSizeReservoirOffer-24    185.25n ± 4%   45.58n ± 1%  -75.40% (p=0.002 n=6)
```

Measure benchmarks:

```
                                                                          │  main.txt   │           fixedsize.txt            │
                                                                          │   sec/op    │    sec/op     vs base              │
SyncMeasure/NoView/ExemplarsEnabled/Int64Counter/Attributes/0-24            175.45n ± 6%   67.01n ±  9%  -61.81% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Int64Counter/Attributes/1-24            170.25n ± 1%   69.82n ±  6%  -58.99% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Int64Counter/Attributes/10-24           167.40n ± 2%   64.52n ± 10%  -61.46% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Float64Counter/Attributes/0-24          173.55n ± 0%   69.17n ± 12%  -60.14% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Float64Counter/Attributes/1-24          169.50n ± 1%   68.55n ±  5%  -59.56% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Float64Counter/Attributes/10-24         166.95n ± 1%   65.82n ±  6%  -60.58% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Int64UpDownCounter/Attributes/0-24      168.85n ± 1%   67.99n ± 11%  -59.73% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Int64UpDownCounter/Attributes/1-24      173.50n ± 1%   66.69n ±  2%  -61.56% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Int64UpDownCounter/Attributes/10-24     171.30n ± 5%   67.73n ±  8%  -60.46% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Float64UpDownCounter/Attributes/0-24    168.90n ± 2%   67.69n ±  9%  -59.92% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Float64UpDownCounter/Attributes/1-24    173.35n ± 2%   68.25n ±  9%  -60.63% (p=0.002 n=6)
SyncMeasure/NoView/ExemplarsEnabled/Float64UpDownCounter/Attributes/10-24   172.95n ± 2%   70.90n ±  7%  -59.01% (p=0.002 n=6)
geomean                                                                      171.0n         67.83n       -60.33%
```

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>
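The "single atomic for count and next" idea in the list above can be sketched as follows. This is an illustrative reconstruction, not the actual `FixedSizeReservoir` code: both counters are packed into one `atomic.Uint64` (assuming each fits in a `uint32`, as the description notes), so they advance together with a single lock-free atomic add.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// countAndNext packs count (high 32 bits) and next (low 32 bits) into
// a single uint64 so both advance in one atomic operation.
// Hypothetical names; not the SDK's actual fields.
type countAndNext struct {
	state atomic.Uint64
}

// advance increments both fields atomically and returns their values
// as observed before the increment.
func (c *countAndNext) advance() (count, next uint32) {
	const delta = 1<<32 | 1 // +1 to count, +1 to next
	old := c.state.Add(delta) - delta
	return uint32(old >> 32), uint32(old)
}

func main() {
	var c countAndNext
	for i := 0; i < 3; i++ {
		c.advance()
	}
	count, next := c.advance()
	fmt.Println(count, next) // prints: 3 3
}
```

Because a single Add updates both halves, readers never observe count and next out of sync, which is what makes the separate lock only necessary when `w` and `next` must change together.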
### Added

- Add `Enabled` method to all synchronous instrument interfaces (`Float64Counter`, `Float64UpDownCounter`, `Float64Histogram`, `Float64Gauge`, `Int64Counter`, `Int64UpDownCounter`, `Int64Histogram`, `Int64Gauge`) in `go.opentelemetry.io/otel/metric`. This stabilizes the synchronous instrument enabled feature, allowing users to check if an instrument will process measurements before performing computationally expensive operations. (#7763)
- Add `AlwaysRecord` sampler in `go.opentelemetry.io/otel/sdk/trace`. (#7724)
- Add `go.opentelemetry.io/otel/semconv/v1.39.0` package. The package contains semantic conventions from the `v1.39.0` version of the OpenTelemetry Semantic Conventions. See the [migration documentation](https://github.com/open-telemetry/opentelemetry-go/blob/298cbedf256b7a9ab3c21e41fc5e3e6d6e4e94aa/semconv/v1.39.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.38.0`. (#7783, #7789)

### Changed

- `Exporter` in `go.opentelemetry.io/otel/exporters/prometheus` ignores metrics with the scope `go.opentelemetry.io/contrib/bridges/prometheus`. This prevents scrape failures when the Prometheus exporter is misconfigured to get data from the Prometheus bridge. (#7688)
- Improve performance of concurrent histogram measurements in `go.opentelemetry.io/otel/sdk/metric`. (#7474)
- Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/stdout/stdoutmetric`. (#7492)
- Improve the concurrent performance of `HistogramReservoir` in `go.opentelemetry.io/otel/sdk/metric/exemplar` by 4x. (#7443)
- Improve performance of concurrent synchronous gauge measurements in `go.opentelemetry.io/otel/sdk/metric`. (#7478)
- Improve performance of concurrent exponential histogram measurements in `go.opentelemetry.io/otel/sdk/metric`. (#7702)
- Improve the concurrent performance of `FixedSizeReservoir` in `go.opentelemetry.io/otel/sdk/metric/exemplar`. (#7447)
- The `rpc.grpc.status_code` attribute in the experimental metrics emitted from `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` is replaced with the `rpc.response.status_code` attribute to align with the semantic conventions. (#7854)
- The `rpc.grpc.status_code` attribute in the experimental metrics emitted from `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc` is replaced with the `rpc.response.status_code` attribute to align with the semantic conventions. (#7854)

### Fixed

- Fix bad log message when key-value pairs are dropped because of key duplication in `go.opentelemetry.io/otel/sdk/log`. (#7662)
- Fix `DroppedAttributes` on `Record` in `go.opentelemetry.io/otel/sdk/log` to not count the non-attribute key-value pairs dropped because of key duplication. (#7662)
- Fix `SetAttributes` on `Record` in `go.opentelemetry.io/otel/sdk/log` to not log that attributes are dropped when they are actually not dropped. (#7662)
- `WithHostID` detector in `go.opentelemetry.io/otel/sdk/resource` to use the full path for the `ioreg` command on Darwin (macOS). (#7818)
- Fix missing `request.GetBody` in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp` to correctly handle the HTTP/2 GOAWAY frame. (#7794)

### Deprecated

- Deprecate `go.opentelemetry.io/otel/exporters/zipkin`. For more information, see the [OTel blog post deprecating the Zipkin exporter](https://opentelemetry.io/blog/2025/deprecating-zipkin-exporters/). (#7670)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This improves the concurrent performance of the histogram reservoir's Offer function by 4x (i.e. a 75% reduction).

This is accomplished by locking each measurement rather than locking the entire storage. Also, extracting the trace context from the context.Context is deferred until collection time. This improves the performance of Offer, which is on the measure hot path: exemplars are often overwritten, so deferring the operation until Collect reduces the overall work.
Benchmarks for Measure:
I explored using a `[]atomic.Pointer[measurement]`, but it had similar performance while being much more complex (it needed a `sync.Pool` to eliminate allocations). The single-threaded performance of that solution was also much worse. See main...dashpole:optimize_histogram_reservoir_old.