[OpenTelemetry] Improve HistogramExplicitBounds#7165
Conversation
Apply recommendations from open-telemetry#3428: - Use SIMD vectorization to find buckets. - Use radix sort. - Apply cap to histogram explicit bounds (10M). Written primarily by Copilot through an iterative approach.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7165 +/- ##
==========================================
- Coverage 89.82% 89.75% -0.08%
==========================================
Files 276 276
Lines 14738 14822 +84
==========================================
+ Hits 13238 13303 +65
- Misses 1500 1519 +19
Flags with carried forward coverage won't be shown. Click here to find out more.
|
Add missing patch coverage.
Fix CS8604 warning.
This comment was marked as outdated.
This comment was marked as outdated.
Add note for histogram boundary upper limit.
There was a problem hiding this comment.
Pull request overview
Improves histogram explicit-bound bucket lookup performance in the metrics pipeline, and adds a hard cap on the number of explicit boundaries to prevent pathological allocations/processing.
Changes:
- Replaces the previous large-bound lookup strategy with a radix-partitioned search + SIMD-accelerated linear scan fallback.
- Adds a 10,000,000 max explicit boundary count check for both view configuration and instrument advice.
- Adds unit tests, fuzz/property tests, and benchmarks; updates changelog and solution/project wiring.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/OpenTelemetry.Tests/Metrics/MetricViewTests.cs | Adds unit test validating boundary-count limit exception details. |
| test/OpenTelemetry.Tests/Metrics/AggregatorTests.cs | Adds coverage for large-bound lookup edge cases (mixed signs, infinities, NaN, degenerate NaN bounds) and display-bounds cleanup. |
| test/OpenTelemetry.FuzzTests/OpenTelemetry.FuzzTests.csproj | New fuzz test project for OpenTelemetry core metrics. |
| test/OpenTelemetry.FuzzTests/Metrics/HistogramExplicitBoundsFuzzTests.cs | Adds FsCheck property tests comparing optimized path vs scalar baseline. |
| test/Benchmarks/Metrics/HistogramExplicitBoundsBenchmarks.cs | Adds BenchmarkDotNet benchmarks for bucket lookup across sizes/layouts. |
| src/OpenTelemetry/OpenTelemetry.csproj | Grants InternalsVisibleTo for the new OpenTelemetry.FuzzTests project. |
| src/OpenTelemetry/Metrics/View/ExplicitBucketHistogramConfiguration.cs | Introduces MaxBoundaryCount and enforces it when setting Boundaries. |
| src/OpenTelemetry/Metrics/MetricStreamIdentity.cs | Enforces MaxBoundaryCount for instrument advice-provided boundaries. |
| src/OpenTelemetry/Metrics/HistogramExplicitBounds.cs | Implements radix-partitioned + SIMD-assisted bucket index lookup and infinity cleanup. |
| src/OpenTelemetry/CHANGELOG.md | Documents the explicit-boundary cap as a breaking change. |
| OpenTelemetry.slnx | Adds the new fuzz test project to the solution. |
Comments suppressed due to low confidence (1)
src/OpenTelemetry/Metrics/View/ExplicitBucketHistogramConfiguration.cs:74
IsSortedAndDistinctcurrently treatsdouble.NaNas valid because all comparisons with NaN return false, so arrays containing NaN can pass validation even though they are not meaningfully “ascending order with distinct values”. Consider explicitly rejecting NaN (and possibly non-finite values) in this check soBoundariesvalidation matches the documented requirements and downstream bucketing assumptions.
for (var i = 1; i < values.Length; i++)
{
if (values[i] <= values[i - 1])
{
return false;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Fix incorrect ArgumentException parameter name.
|
@cijothomas, could you please check this PR? Especially adding boundaries? |
There was a problem hiding this comment.
Some direct feedback while reviewing by Codex, in general LGTM.
It will be great to see Cijo final review, but I will not block this.
@martincostello, I think that you need to solve problems with MemoryMarshal, your preious changes are working ;)
Remove `MemoryMarshal` usage after merging with main to remove unsafe code.
|
Need to re-run the benchmarks with the latest changes. |
Benchmark comparison:
|
| Scenario (BoundCount / Layout) | Method | Baseline | Candidate | Δ |
|---|---|---|---|---|
| 49 / MixedSigned | LookupExactBoundary | 10.70 ns | 5.14 ns | −52% |
| 49 / MixedSigned | LookupInRangeValue | 10.89 ns | 5.03 ns | −54% |
| 49 / MixedSigned | LookupWithInfiniteBounds | 10.45 ns | 5.12 ns | −51% |
| 49 / MixedSigned | LookupPositiveInfinity | 20.06 ns | 0.54 ns | −97% |
| 49 / MixedSigned | LookupNaN | 15.67 ns | 0.05 ns | −100% |
| 49 / PositiveOnly | LookupExactBoundary | 10.41 ns | 4.98 ns | −52% |
| 49 / PositiveOnly | LookupInRangeValue | 10.34 ns | 4.85 ns | −53% |
| 49 / PositiveOnly | LookupPositiveInfinity | 21.02 ns | 0.46 ns | −98% |
| 49 / PositiveOnly | LookupNaN | 15.45 ns | ~0 ns | −100% |
| 10 / MixedSigned | LookupPositiveInfinity | 7.84 ns | 0.64 ns | −92% |
| 10 / MixedSigned | LookupNaN | 2.87 ns | 0.02 ns | −99% |
| 10 / MixedSigned | LookupExactBoundary | 2.47 ns | 2.18 ns | −12% |
| 10 / MixedSigned | LookupWithInfiniteBounds | 2.50 ns | 2.26 ns | −10% |
| 10 / PositiveOnly | LookupPositiveInfinity | 4.48 ns | 0.50 ns | −89% |
| 10 / PositiveOnly | LookupNaN | 3.04 ns | ~0 ns | −100% |
| 10 / PositiveOnly | LookupExactBoundary | 2.48 ns | 2.61 ns | +5% (noise) |
| 10 / PositiveOnly | LookupInRangeValue | 2.57 ns | 2.41 ns | −6% |
The ±Infinity improvement also holds for the 50 and 1000 bound counts (e.g. LookupNegativeInfinity at 1000/MixedSigned: 3.29 ns → 0.18 ns; LookupPositiveInfinity: 4.13 ns → 0.48 ns) — these baseline cells were not dead-code-eliminated, so they're trustworthy.
Conclusion
A clear performance win with no allocation cost: infinity/NaN edge-case lookups are now effectively free, and the common finite-value lookups are ~50% faster once the bound array is large enough to matter. No genuine regressions were found — the only "slower" numbers come from baseline cells the JIT had optimized away, where the candidate is simply being measured doing real work.
Note: the .NET Framework 4.6.2 rows show the same trend and the same dead-code-elimination artifacts at BoundCount 50/1000.
….0 (#1201) Updated [OpenTelemetry.Exporter.OpenTelemetryProtocol](https://github.com/open-telemetry/opentelemetry-dotnet) from 1.15.3 to 1.16.0. <details> <summary>Release notes</summary> _Sourced from [OpenTelemetry.Exporter.OpenTelemetryProtocol's releases](https://github.com/open-telemetry/opentelemetry-dotnet/releases)._ ## 1.16.0 For highlights and announcements pertaining to this release see: [Release Notes > 1.16.0](https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/RELEASENOTES.md#1160). The following changes are from the previous release [1.16.0-rc.1](https://github.com/open-telemetry/opentelemetry-dotnet/releases/tag/core-1.16.0-rc.1). * NuGet: [OpenTelemetry v1.16.0](https://www.nuget.org/packages/OpenTelemetry/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Api v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Api/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Api/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Api.ProviderBuilderExtensions v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Api.ProviderBuilderExtensions/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Api.ProviderBuilderExtensions/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Exporter.Console v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Exporter.Console/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Exporter.Console/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Exporter.InMemory v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Exporter.InMemory/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Exporter.InMemory/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Exporter.OpenTelemetryProtocol v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Exporter.OpenTelemetryProtocol/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Exporter.Zipkin v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Exporter.Zipkin/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Exporter.Zipkin/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Extensions.Hosting v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Extensions.Hosting/1.16.0) No notable changes. ... (truncated) ## 1.16.0-rc.1 The following changes are from the previous release [1.15.3](https://github.com/open-telemetry/opentelemetry-dotnet/releases/tag/core-1.15.3). * NuGet: [OpenTelemetry v1.16.0-rc.1](https://www.nuget.org/packages/OpenTelemetry/1.16.0-rc.1) * Stop validating View-provided metric stream `Name` against the instrument name syntax, per [spec clarification](open-telemetry/opentelemetry-specification#5094). ([#7300](open-telemetry/opentelemetry-dotnet#7300)) * Fix incorrect validation of `OTEL_BSP_*` and `OTEL_BLRP_*` environment variables. ([#7187](open-telemetry/opentelemetry-dotnet#7187)) * Fix observable instrument callbacks running once per reader instead of once per collection cycle. ([#7188](open-telemetry/opentelemetry-dotnet#7188)) * Added exception safety for user-supplied `ExemplarReservoir` implementations. Exceptions thrown from `Offer` are now caught and logged rather than propagating out of `Counter.Add`/`Histogram.Record`. ([#7277](open-telemetry/opentelemetry-dotnet#7277)) * Update `OpenTelemetrySdkEventSource` to support the W3C randomness flag. ([#7301](open-telemetry/opentelemetry-dotnet#7301)) * Added `ObservedTimestamp` property to `LogRecord`. ([#6979](open-telemetry/opentelemetry-dotnet#6979)) * **Breaking Change** Explicit histogram boundaries no longer allow more than 10 million values. ([#7165](open-telemetry/opentelemetry-dotnet#7165)) * Fixed a circular reference which could cause a `LoggerProvider` to fail to resolve when one of its dependencies depends on `ILogger` or `ILoggerFactory`. As part of this fix the `LoggerProvider` resolved from dependency injection is now created lazily when the first logger is created rather than when `ILoggerProvider` or `ILoggerFactory` is resolved. A consequence is that any invalid configuration now surfaces when the first log record is written instead of when the logging services are resolved. ([#7308](open-telemetry/opentelemetry-dotnet#7308)) See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0-rc.1/src/OpenTelemetry/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Api v1.16.0-rc.1](https://www.nuget.org/packages/OpenTelemetry.Api/1.16.0-rc.1) * **Experimental (pre-release builds only):** Add support for using environment variables as context propagation carriers. ([#7174](open-telemetry/opentelemetry-dotnet#7174)) * Fix `BaggagePropagator` to correctly follow Key and Value Encoding rules as per ... (truncated) ## 1.16.0-beta.1 The following changes are from the previous release [1.15.3-beta.1](https://github.com/open-telemetry/opentelemetry-dotnet/releases/tag/coreunstable-1.15.3-beta.1). * NuGet: [OpenTelemetry.Exporter.Prometheus.AspNetCore v1.16.0-beta.1](https://www.nuget.org/packages/OpenTelemetry.Exporter.Prometheus.AspNetCore/1.16.0-beta.1) * Fixed scrape response cache freshness using monotonic time so it is not affected by NTP system clock adjustments. ([#7253](open-telemetry/opentelemetry-dotnet#7253)) * **Breaking Change** Removed `DisableTimestamp` property from `PrometheusAspNetCoreOptions`. ([#7176](open-telemetry/opentelemetry-dotnet#7176)) * Fixed the serialization of `NaN`, `PositiveInfinity`, and `NegativeInfinity` values in Prometheus metrics to be compliant with the specification. ([#7179](open-telemetry/opentelemetry-dotnet#7179)) * Fixed loss of precision when serializing `double` and `float` values in Prometheus metrics to be compliant with the specification by using 17 significant digits to represent such values. ([#7179](open-telemetry/opentelemetry-dotnet#7179)) * Fix non-ASCII characters in metric names and unit strings not being sanitized correctly during Prometheus serialization. ([#7184](open-telemetry/opentelemetry-dotnet#7184)) * Fix case where reader tracking could be reset while readers were still active. ([#7190](open-telemetry/opentelemetry-dotnet#7190)) * Improve `Accept` header handling for format negotiation so OpenMetrics is selected correctly by considering whitespace and `q` weights. ([#7208](open-telemetry/opentelemetry-dotnet#7208)) * Emit OpenMetrics exemplars for counters and histogram buckets. ([#7222](open-telemetry/opentelemetry-dotnet#7222)) * Fix incorrect handling of untyped metrics when using OpenMetrics format. ([#7219](open-telemetry/opentelemetry-dotnet#7219)) * Fix Prometheus/OpenMetrics serialization to emit metric and label names containing `_` instead of dropping them and prefixing leading digits. Invalid characters are replaced with `_` instead of being dropped. ([#7209](open-telemetry/opentelemetry-dotnet#7209)) * Add `escaping=underscores` to the `Accept` header handling for content negotiation so OpenMetrics are handled correctly. ([#7209](open-telemetry/opentelemetry-dotnet#7209)) * Omit histogram `_sum` and `_count` in OpenMetrics when negative bucket thresholds are present. ([#7221](open-telemetry/opentelemetry-dotnet#7221)) ... (truncated) Commits viewable in [compare view](open-telemetry/opentelemetry-dotnet@core-1.15.3...core-1.16.0). </details> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Fixes #3428
Changes
Apply recommendations from #3428:
Written primarily by Copilot through an iterative approach.
Final benchmark results are shown below: #7165 (comment)
I haven't benchmarked the ARM64 SIMD branch as I don't have the hardware for that locally.
Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial changesChanges in public API reviewed (if applicable)