[Metrics] Resolve accumulated TODOs in SDK core#7277
Conversation
Wrap ExemplarReservoir.Offer in try/catch so a buggy custom reservoir can't silently crash the measurement path. Surfaces the error as a new Warning event (open-telemetry#57). Log a Verbose diagnostic in MeasurementsCompleted when the state object isn't a MetricState, consistent with every other callback in the file. Log unknown instrument types that fall through to the default case in MonotonicDeltaTemporalityPreferenceFunc. Call TrimExcess() after releasing the creation lock in AddMetricWithViews to reclaim wasted list capacity
There was a problem hiding this comment.
Pull request overview
Resolves several long-standing TODOs in the metrics SDK core by improving exception safety around exemplars, adding missing verbose diagnostics, and reducing unnecessary allocations from over-provisioned metric lists.
Changes:
- Wraps user-supplied
ExemplarReservoir.Offercalls with exception handling and introduces a new Warning-levelEventSourceevent for failures. - Adds verbose diagnostics for unexpected state in
MeasurementsCompletedand for unrecognized instrument types in temporality preference selection. - Calls
TrimExcess()on the metrics list created inAddMetricWithViewsafter releasing the lock to reclaim unused capacity.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/OpenTelemetry/Metrics/Reader/MetricReaderExt.cs | Trims preallocated metrics list capacity after view processing completes. |
| src/OpenTelemetry/Metrics/Reader/MetricReader.cs | Adds verbose log when temporality preference sees an unexpected instrument type. |
| src/OpenTelemetry/Metrics/MetricPoint/MetricPoint.cs | Adds exception-safe exemplar offering via helper methods. |
| src/OpenTelemetry/Metrics/MeterProviderSdk.cs | Adds verbose log when MeasurementsCompleted receives an unexpected state type. |
| src/OpenTelemetry/Internal/OpenTelemetrySdkEventSource.cs | Adds new EventId 57 warning event for exemplar reservoir offer exceptions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7277 +/- ##
==========================================
- Coverage 90.03% 90.03% -0.01%
==========================================
Files 276 276
Lines 14218 14244 +26
==========================================
+ Hits 12801 12824 +23
- Misses 1417 1420 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
|
martincostello
left a comment
There was a problem hiding this comment.
Also please review the Copilot review comments.
|
copilot feedback Code Review — PR #7277 [Metrics] Resolve accumulated TODOs in SDK core Consensus findings (both Opus 4.7 + Opus 4.6 agreed) — real issues
Where the models disagreed — I verified by reading the code
Unique to Opus 4.7
Plus useful verifications: event id 57 is unique; [MethodImpl(NoInlining)] is present (not relying on JIT); MonotonicDeltaTemporalityPreferenceFunc outer lambda remains static with no closure allocation; OpenTelemetrySdkEventSource is internal (no .publicApi/ update needed); the PR does include unit tests with ThrowingExemplarReservoir (correcting a wrong premise in my prompt). Model comparison notes
Recommended actions for the PR
|
…and logging - Drop ExemplarReservoirException event from Warning to Verbose — a throwing reservoir fires on every measurement, making Warning level impractical on hot paths where exemplar capture is best-effort anyway - Remove TrimExcess() from AddMetricWithViews — the caller immediately calls .ToArray() on the returned list, so the trim just allocates a backing array that is discarded on the next line - Align MeasurementsCompleted error logging with its siblings MeasurementRecordedLong/Double, which use MeasurementDropped at Warning for the same state mismatch condition - Add CHANGELOG entry for the exception-safety behaviour introduced by OfferExemplarSafe, which is a user-observable contract change
|
Thanks for the thorough review @rajkumar-rangaraj ExemplarReservoirException log level TrimExcess() MeasurementsCompleted severity CHANGELOG I'm really excited to see this merged and looking forward to contributing more to the project in the future. |
|
/easycla |
|
Hi @rajkumar-rangaraj |
….0 (#1201) Updated [OpenTelemetry.Exporter.OpenTelemetryProtocol](https://github.com/open-telemetry/opentelemetry-dotnet) from 1.15.3 to 1.16.0. <details> <summary>Release notes</summary> _Sourced from [OpenTelemetry.Exporter.OpenTelemetryProtocol's releases](https://github.com/open-telemetry/opentelemetry-dotnet/releases)._ ## 1.16.0 For highlights and announcements pertaining to this release see: [Release Notes > 1.16.0](https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/RELEASENOTES.md#1160). The following changes are from the previous release [1.16.0-rc.1](https://github.com/open-telemetry/opentelemetry-dotnet/releases/tag/core-1.16.0-rc.1). * NuGet: [OpenTelemetry v1.16.0](https://www.nuget.org/packages/OpenTelemetry/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Api v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Api/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Api/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Api.ProviderBuilderExtensions v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Api.ProviderBuilderExtensions/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Api.ProviderBuilderExtensions/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Exporter.Console v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Exporter.Console/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Exporter.Console/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Exporter.InMemory v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Exporter.InMemory/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Exporter.InMemory/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Exporter.OpenTelemetryProtocol v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Exporter.OpenTelemetryProtocol/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Exporter.Zipkin v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Exporter.Zipkin/1.16.0) No notable changes. See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0/src/OpenTelemetry.Exporter.Zipkin/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Extensions.Hosting v1.16.0](https://www.nuget.org/packages/OpenTelemetry.Extensions.Hosting/1.16.0) No notable changes. ... (truncated) ## 1.16.0-rc.1 The following changes are from the previous release [1.15.3](https://github.com/open-telemetry/opentelemetry-dotnet/releases/tag/core-1.15.3). * NuGet: [OpenTelemetry v1.16.0-rc.1](https://www.nuget.org/packages/OpenTelemetry/1.16.0-rc.1) * Stop validating View-provided metric stream `Name` against the instrument name syntax, per [spec clarification](open-telemetry/opentelemetry-specification#5094). ([#7300](open-telemetry/opentelemetry-dotnet#7300)) * Fix incorrect validation of `OTEL_BSP_*` and `OTEL_BLRP_*` environment variables. ([#7187](open-telemetry/opentelemetry-dotnet#7187)) * Fix observable instrument callbacks running once per reader instead of once per collection cycle. ([#7188](open-telemetry/opentelemetry-dotnet#7188)) * Added exception safety for user-supplied `ExemplarReservoir` implementations. Exceptions thrown from `Offer` are now caught and logged rather than propagating out of `Counter.Add`/`Histogram.Record`. ([#7277](open-telemetry/opentelemetry-dotnet#7277)) * Update `OpenTelemetrySdkEventSource` to support the W3C randomness flag. ([#7301](open-telemetry/opentelemetry-dotnet#7301)) * Added `ObservedTimestamp` property to `LogRecord`. ([#6979](open-telemetry/opentelemetry-dotnet#6979)) * **Breaking Change** Explicit histogram boundaries no longer allow more than 10 million values. ([#7165](open-telemetry/opentelemetry-dotnet#7165)) * Fixed a circular reference which could cause a `LoggerProvider` to fail to resolve when one of its dependencies depends on `ILogger` or `ILoggerFactory`. As part of this fix the `LoggerProvider` resolved from dependency injection is now created lazily when the first logger is created rather than when `ILoggerProvider` or `ILoggerFactory` is resolved. A consequence is that any invalid configuration now surfaces when the first log record is written instead of when the logging services are resolved. ([#7308](open-telemetry/opentelemetry-dotnet#7308)) See [CHANGELOG](https://github.com/open-telemetry/opentelemetry-dotnet/blob/core-1.16.0-rc.1/src/OpenTelemetry/CHANGELOG.md) for details. * NuGet: [OpenTelemetry.Api v1.16.0-rc.1](https://www.nuget.org/packages/OpenTelemetry.Api/1.16.0-rc.1) * **Experimental (pre-release builds only):** Add support for using environment variables as context propagation carriers. ([#7174](open-telemetry/opentelemetry-dotnet#7174)) * Fix `BaggagePropagator` to correctly follow Key and Value Encoding rules as per ... (truncated) ## 1.16.0-beta.1 The following changes are from the previous release [1.15.3-beta.1](https://github.com/open-telemetry/opentelemetry-dotnet/releases/tag/coreunstable-1.15.3-beta.1). * NuGet: [OpenTelemetry.Exporter.Prometheus.AspNetCore v1.16.0-beta.1](https://www.nuget.org/packages/OpenTelemetry.Exporter.Prometheus.AspNetCore/1.16.0-beta.1) * Fixed scrape response cache freshness using monotonic time so it is not affected by NTP system clock adjustments. ([#7253](open-telemetry/opentelemetry-dotnet#7253)) * **Breaking Change** Removed `DisableTimestamp` property from `PrometheusAspNetCoreOptions`. ([#7176](open-telemetry/opentelemetry-dotnet#7176)) * Fixed the serialization of `NaN`, `PositiveInfinity`, and `NegativeInfinity` values in Prometheus metrics to be compliant with the specification. ([#7179](open-telemetry/opentelemetry-dotnet#7179)) * Fixed loss of precision when serializing `double` and `float` values in Prometheus metrics to be compliant with the specification by using 17 significant digits to represent such values. ([#7179](open-telemetry/opentelemetry-dotnet#7179)) * Fix non-ASCII characters in metric names and unit strings not being sanitized correctly during Prometheus serialization. ([#7184](open-telemetry/opentelemetry-dotnet#7184)) * Fix case where reader tracking could be reset while readers were still active. ([#7190](open-telemetry/opentelemetry-dotnet#7190)) * Improve `Accept` header handling for format negotiation so OpenMetrics is selected correctly by considering whitespace and `q` weights. ([#7208](open-telemetry/opentelemetry-dotnet#7208)) * Emit OpenMetrics exemplars for counters and histogram buckets. ([#7222](open-telemetry/opentelemetry-dotnet#7222)) * Fix incorrect handling of untyped metrics when using OpenMetrics format. ([#7219](open-telemetry/opentelemetry-dotnet#7219)) * Fix Prometheus/OpenMetrics serialization to emit metric and label names containing `_` instead of dropping them and prefixing leading digits. Invalid characters are replaced with `_` instead of being dropped. ([#7209](open-telemetry/opentelemetry-dotnet#7209)) * Add `escaping=underscores` to the `Accept` header handling for content negotiation so OpenMetrics are handled correctly. ([#7209](open-telemetry/opentelemetry-dotnet#7209)) * Omit histogram `_sum` and `_count` in OpenMetrics when negative bucket thresholds are present. ([#7221](open-telemetry/opentelemetry-dotnet#7221)) ... (truncated) Commits viewable in [compare view](open-telemetry/opentelemetry-dotnet@core-1.15.3...core-1.16.0). </details> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Hey everyone! 👋 I was going through the codebase and spotted a few TODOs that looked straightforward to fix, so I went ahead and took care of them. Thanks for maintaining such a great project and for everything you do for the open source community 🙏
Changes
Clears four long-standing
TODOcomments in the metrics SDK. Each change is small and self-contained; they are grouped together because they all touch the same area and share the same low risk profile.Exception safety for custom
ExemplarReservoirMetricPoint.UpdateExemplarcalledExemplarReservoir.Offerwith no protection. Any exception from a user-supplied reservoir would propagate up and corrupt the measurement path. The fix wraps the call in a try/catch and emits a newWarning-levelEventSourceevent (#57,ExemplarReservoirException). To preserveAggressiveInliningon the hot path, the guard (if (offerExemplar)) stays inlined inUpdateExemplarwhile the offer + exception handling moves into a non-inlinedOfferExemplarSafehelper.Diagnostic log in
MeasurementsCompletedWhen
MeasurementsCompletedreceives a state that isn't aMetricState, it now logs aVerbose-levelMeterProviderSdkEventbefore returning — consistent withMeasurementRecordedLongandMeasurementRecordedDoublein the same file.Log unknown instrument type in temporality preference
MonotonicDeltaTemporalityPreferenceFunchad a silent_ => AggregationTemporality.Deltadefault. It is now a block lambda that emits aVerboselog when an unrecognized instrument type hits that arm. Return behavior is unchanged.TrimExcess()afterAddMetricWithViewsThe
metricslist is pre-allocated to the maximum view count, but views can be skipped for several reasons (invalid name, existing metric,Dropconfig, limit exceeded).TrimExcess()is now called after the lock is released to reclaim unused capacity without extending lock hold time.Merge requirement checklist
CHANGELOG.mdfiles updated for non-trivial changes