
Add finish method to synchronous instruments#4702

Open
atoulme wants to merge 7 commits into open-telemetry:main from atoulme:add_remove

Conversation

@atoulme
Contributor

@atoulme atoulme commented Oct 25, 2025

Fixes #2232

Changes

Add the ability to finish reporting for an attribute set on a synchronous instrument. Once finished, that attribute set under the instrument is no longer reported.

POCs

open-telemetry/opentelemetry-go#8050
open-telemetry/opentelemetry-java#7792

@atoulme atoulme force-pushed the add_remove branch 2 times, most recently from cdd66a9 to 646e391 on October 26, 2025 03:48
@dashpole
Contributor

One thing that has been asked for is the ability to delete multiple series at once, since callers often don't have access to the complete list of attribute sets they've previously incremented. E.g. remove(http.target=foo) would remove all streams for that http.target. One way to solve this is to treat any attribute keys that are not provided as matching; remove without any arguments would then match and remove all streams from the instrument. Unfortunately, it wouldn't be backwards-compatible to change this later, so we need to decide up front whether this is important.

The other big question (which we need to resolve in the SDK spec PR), is how this impacts start time handling for cumulative metrics in the SDK. If an attribute set is deleted and recreated, the resulting metric data must have non-overlapping start-end time ranges since the cumulative value has been (presumably) reset. One way to solve this would be to require per-attribute-set start times in the SDK (#4184).

@jack-berg
Member

A more flexible version of @dashpole's suggestion might look like remove(Predicate<Attributes> predicate), where the predicate is invoked for each series, with usage in Java like:

instrument.remove(attributes -> true); // Remove all series
instrument.remove(attributes -> attributes.get("http.route").equals("/v1/foo/bar")); // Remove all series where http.route=/v1/foo/bar
instrument.remove(attributes -> attributes.get("http.route").startsWith("/v1/foo")); // Remove all series matching pattern http.route=/v1/foo.*

If an attribute set is deleted and recreated, the resulting metric data must have non-overlapping start-end time ranges since the cumulative value has been (presumably) reset. One way to solve this would be to require per-attribute-set start times in the SDK (#4184).

Yeah, for example, in Java we always use the same SDK start time for all cumulative series. Whether they see their first data at start time or days later, the start is always the same. I think your suggestion to track per-attribute-set start times in the SDK seems reasonable, but that seems to imply a behavior change from the single constant start time that we currently use in Java. Are there implications for this? Are there cases where a user with a cumulative backend would be upset to see new series with a start time corresponding to the window where data was first recorded?

@jack-berg
Member

Also, if I call something like instrument.remove(attributes -> true) to delete all series, it's not clear whether I intend to record additional data in the future. If I don't intend to record additional data, then I would probably view the fact that the SDK continues to have memory allocated to the instrument as a memory leak. But the SDK can't free up all the resources for that instrument without knowing for sure that I won't record again.

Makes me wonder if we need a top-level, instrument-wide close/remove method, as well as the fine-grained method for removing specific series.

@dashpole
Contributor

Are there implications for this? Are there cases where a user with a cumulative backend would be upset to see new series with a start time corresponding to the window where data was first recorded?

I don't think it will impact Prometheus in any negative way (@ArthurSens might know more, since he opened the original issue).

Depending on how strict you want to be, it may make it harder to aggregate timeseries with different start timestamps. If you use the earliest start timestamp, you may be missing data and not produce an accurate cumulative for the entire time range.

@carlosalberto
Contributor

cc @jmacd

@atoulme
Contributor Author

atoulme commented Oct 28, 2025

Makes me wonder if we need a top level instrument level close / remove method, as well as the fine grained method for removing specific series.

This can be added later and separately from this effort, from what I can tell.

@atoulme
Contributor Author

atoulme commented Oct 28, 2025

If an attribute set is deleted and recreated, the resulting metric data must have non-overlapping start-end time ranges since the cumulative value has been (presumably) reset. One way to solve this would be to require per-attribute-set start times in the SDK (#4184).

My concrete use case is tied to a queue system where we report the count of events seen. See https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/ibm-mq-metrics/model/metrics.yaml#L216

When we lose contact with the queue manager, we can no longer report this information. If we get back in contact with the queue manager, we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.

@dashpole
Contributor

we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.

How do you reset the start time? All SDKs I'm aware of in OTel today set the cumulative start time at instrument creation time and never reset it.

@ArthurSens
Member

ArthurSens commented Oct 28, 2025

Are there implications for this? Are there cases where a user with a cumulative backend would be upset to see new series with a start time corresponding to the window where data was first recorded?

Actually, we would be happier to see this :) I opened #4184 a long time ago but never found the time to continue working on it. A start time per time series would help us provide more accurate increase rates.

@atoulme atoulme marked this pull request as ready for review October 29, 2025 22:27
@atoulme atoulme requested review from a team as code owners October 29, 2025 22:27
@atoulme
Contributor Author

atoulme commented Oct 29, 2025

Since this is now sponsored, I am marking this PR ready for review. The discussion continues.

@atoulme
Contributor Author

atoulme commented Oct 29, 2025

we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.

How do you reset the start time? All SDKs I'm aware of in OTel today set the cumulative start time at instrument creation time and never reset it.

Let's put this in as a requirement and try it out in the POCs, see how we fare.

@github-actions

github-actions bot commented Nov 6, 2025

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 6, 2025
@pellared pellared removed the Stale label Nov 6, 2025
@jmacd
Contributor

jmacd commented Nov 12, 2025

@atoulme

My concrete use case is tied to a queue system where we report the count of events seen. See https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/ibm-mq-metrics/model/metrics.yaml#L216

When we lose contact with the queue manager, we can no longer report this information. If we get back in contact with the queue manager, we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.

This issue has been raised and resisted a number of times in OpenTelemetry. I understand there is still an unanswered need, but I do not like the verb remove as the API method name, it's not clearly "removing" anything; it's not de-registering the instrument. We are not trying to erase the memory of the instrument, we're trying to get series out of memory. We need to report final measurements, "seal" the timeseries in some manner, and then forget about the data. If I could choose the verb for this action, it's "finish", it means "flush and forget". I think the idea of passing a predicate to select series for finishing makes sense.

Consumers should receive the correct finalized value of these series. As @ArthurSens points out in #4184, what we need is a specification for how the data should be transmitted so that the ending of a series is clear. In Prometheus, we have the NaN value, and in OTel we have the missing data point flag, but we've never specified how to set that flag. I would like to see a specification that dictates SDKs have to remember the "finishing" series long enough to send the NaN/missing-data flag to each reader at least once; otherwise that reader would lose information. Then, to answer #4184, we need to specify that new series must be created with a start time >= the NaN/missing-data flag previously issued for the same series.
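The "remember the finishing series until each reader has seen it once" idea could be sketched as below. This is a toy model, not the Go SDK: the boolean `noRecordedValue` field stands in for OTLP's no-recorded-value data-point flag, and all type names are invented.

```go
package main

import "fmt"

// point is a simplified export data point.
type point struct {
	attrs           string
	value           float64
	noRecordedValue bool // stand-in for OTLP's no-recorded-value flag
}

type state struct {
	sum      float64
	finished bool
}

type instrument struct {
	series map[string]*state
}

func newInstrument() *instrument { return &instrument{series: map[string]*state{}} }

func (i *instrument) Add(attrs string, v float64) {
	s, ok := i.series[attrs]
	if !ok {
		s = &state{}
		i.series[attrs] = s
	}
	s.sum += v
	s.finished = false
}

// Finish tombstones the series rather than deleting it outright.
func (i *instrument) Finish(attrs string) {
	if s, ok := i.series[attrs]; ok {
		s.finished = true
	}
}

// Collect exports every live series; a finished series is exported one
// last time with the no-recorded-value flag set, then forgotten.
func (i *instrument) Collect() []point {
	var out []point
	for attrs, s := range i.series {
		out = append(out, point{attrs: attrs, value: s.sum, noRecordedValue: s.finished})
		if s.finished {
			delete(i.series, attrs)
		}
	}
	return out
}

func main() {
	inst := newInstrument()
	inst.Add("route=/v1/foo", 3)
	inst.Finish("route=/v1/foo")
	first := inst.Collect()
	second := inst.Collect()
	fmt.Println(len(first), first[0].noRecordedValue, len(second))
}
```

A real SDK with multiple readers would need to keep the tombstone until every reader has collected it once, which this single-reader sketch sidesteps.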

@github-actions

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 20, 2025
@github-actions

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Nov 27, 2025
@atoulme
Contributor Author

atoulme commented Dec 4, 2025

Can I please ask for a maintainer to reopen the pull request? Thanks.

@atoulme
Contributor Author

atoulme commented Dec 4, 2025

@atoulme

My concrete use case is tied to a queue system where we report the count of events seen. See https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/ibm-mq-metrics/model/metrics.yaml#L216

When we lose contact with the queue manager, we can no longer report this information. If we get back in contact with the queue manager, we must recreate the time series with the new information, resetting the counter and its start time. We typically alert on delta changes, so we want the time series to be separate.

This issue has been raised and resisted a number of times in OpenTelemetry. I understand there is still an unanswered need, but I do not like the verb remove as the API method name, it's not clearly "removing" anything; it's not de-registering the instrument. We are not trying to erase the memory of the instrument, we're trying to get series out of memory. We need to report final measurements, "seal" the timeseries in some manner, and then forget about the data. If I could choose the verb for this action, it's "finish", it means "flush and forget". I think the idea of passing a predicate to select series for finishing makes sense.

Sure, finish works.

Consumers should receive the correct finalized value of these series. As @ArthurSens points out in #4184, what we need is a specification for how the data should be transmitted so that the ending of a series is clear. In Prometheus, we have the NaN value, and in OTel we have the missing data point flag, but we've never specified how to set that flag. I would like to see a specification that dictates SDKs have to remember the "finishing" series long enough to send the NaN/missing-data flag to each reader at least once; otherwise that reader would lose information. Then, to answer #4184, we need to specify that new series must be created with a start time >= the NaN/missing-data flag previously issued for the same series.

Are you offering to work on that specification? Is it a requirement for this or a follow-up?

@atoulme atoulme force-pushed the add_remove branch 2 times, most recently from 472238c to 0cccd60 on December 15, 2025 01:21
@atoulme atoulme changed the title Add remove method to synchronous instruments Add finish method to synchronous instruments Dec 17, 2025
@github-actions

github-actions bot commented Jan 1, 2026

This PR was marked stale. It will be closed in 14 days without additional activity.

@github-actions github-actions bot added the Stale label Jan 1, 2026
@atoulme
Contributor Author

atoulme commented Jan 8, 2026

This was discussed on 1/6/2026 as part of the maintainer sync meeting; I reproduce the meeting notes below:

@dashpole
Contributor

The start time PR merged, so this should be unblocked now. I think there will be some updates to the start timestamp spec needed to clarify the interaction between finish and start times.

"All cumulative timeseries MUST use the initial start timestamp in subsequent collection intervals." probably needs to be changed to "All cumulative timeseries MUST use the initial start timestamp in subsequent collection intervals until Finish is called on the timeseries. After Finish is called, the next measurement to the timeseries SHOULD choose a new start time following the above rules for cumulative aggregations.".

@atoulme
Contributor Author

atoulme commented Mar 3, 2026

Thanks for the update @dashpole. I think we are unblocked but need to update the PR and the POCs. I have asked @MrAlias if he could help with the Go POC, and I'll try to refresh the Java one. It's been a bit and we need help; the Python POC could be a nice one to update next.

  • Review the language in the PR based on comments
  • Reimplement/rebase Go POC
  • Reimplement/rebase Java POC

@charless-splunk

I also have a use case for being able to cleanly unregister metrics from the client at runtime...

We have multiple autoscaled services that primarily consume from Kafka. As scaling events happen, Kafka partitions are reassigned. We scope many of our metrics to Kafka partitions, where partition is an important metadata tag on the metrics. We track things at a partition level like latest offset committed, records consumed, and some logical time tracking based on specific messages consumed.

If we are unable to cleanly remove metrics at runtime, then we end up with long lived instances reporting stale values for a partition that was reassigned to another instance while that other instance reports current values for the given partition.

In the case of Java applications, we risk a heap leak over time on the lowest-ordinal instances, which live the longest and survive many partition reassignments.

@dashpole
Contributor

dashpole commented Mar 4, 2026

Can you link the PoCs in the description?


**Status**: [Development](../document-status.md)

Unregister the attribute set. It will no longer be reported.
Member


"It will no longer be reported." - forever or until another API call trying to add the attribute set back?

Contributor Author


Registering again will create a new attribute set with a new start time.


This API MUST accept the following parameter:

* [Attributes](../common/README.md#attribute) to identify the Instrument.
Contributor


The instrument isn't identified by the attributes. I would consider "time series" or something similar. You also say this below. This also applies to the other Finish function definitions.

@@ -1041,6 +1042,13 @@ Note: If a user makes no configuration changes, `Enabled` returns `true` since b
default `MeterConfig.enabled=true` and instruments use the default
aggregation when no matching views match the instrument.

Contributor

@dashpole dashpole Mar 17, 2026


Some things I would like to see specified in the SDK section:

  • Specify how this interacts with export and storage. Something like "The SDK MUST preserve previously-aggregated metric data for the timeseries until it has been Collected, and then MUST clear all internal storage associated with the timeseries."
  • Specify how this interacts with timestamps: Something like "When an aggregation that has been Finished is Collected, its timestamp MUST be the time at which it was Finished". This may require changes elsewhere in the specification.
  • Specify what happens if a timeseries has been finished, but a new measurement is made to it. This is a bit tricky. Options are:
    • [Preferred] Un-finish the timeseries. This is a new observation, so rather than stop + start again, we can just continue the existing aggregation.
    • Move the timestamp back to the current measurement time, but still Finish the timeseries. This might help if there is a race where a measurement is made just after it is Finished.
    • Keep the existing timeseries finished, but start a new one with a new start timestamp. This might be complex to implement, as they would both have the same attribute set.
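The preferred "un-finish" option could look roughly like the sketch below, under invented types: a later measurement simply clears the tombstone and continues the existing aggregation, so no accumulated data is lost in the finish-then-measure race.

```go
package main

import "fmt"

// agg is a hypothetical aggregation state; "finished" acts as a
// tombstone that a later measurement can clear.
type agg struct {
	sum      float64
	finished bool
}

type inst struct{ series map[string]*agg }

func newInst() *inst { return &inst{series: map[string]*agg{}} }

func (i *inst) Add(attrs string, v float64) {
	a, ok := i.series[attrs]
	if !ok {
		a = &agg{}
		i.series[attrs] = a
	}
	a.finished = false // revive: continue the existing aggregation
	a.sum += v
}

func (i *inst) Finish(attrs string) {
	if a, ok := i.series[attrs]; ok {
		a.finished = true
	}
}

func main() {
	i := newInst()
	i.Add("k=v", 10)
	i.Finish("k=v")
	i.Add("k=v", 1) // measured again before collection: un-finished
	a := i.series["k=v"]
	fmt.Println(a.finished, a.sum) // prior data is preserved
}
```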

Member


[Preferred] Un-finish the timeseries. This is a new observation, so rather than stop + start again, we can just continue the existing aggregation.

👍


This API MUST accept the following parameter:

* [Attributes](../common/README.md#attribute) to identify the Instrument.
Contributor


I think we should change this to allow deleting one or more timeseries by treating attributes that are not provided as matching. There are a few reasons:

  • I can easily delete all timeseries on an instrument by providing no attributes
  • As a user, I don't have to keep track of every attribute that I've ever provided to the API. For example, if my HTTP server dynamically creates and deletes routes over time, I would like to be able to Finish all timeseries associated with a particular http.route without needing to enumerate all of the status codes it has ever served.
  • This would interact much better with filtering. If, for example, I was filtering out all attributes except for http.route on my server metrics, it isn't clear if the SDK should Finish the aggregation with http.route=foo if I call Finish(http.route=foo, http.response.status_code=200, ...). Was my intent to delete the entire route? Was it to only delete the portion that has status code 200? If, instead, users of the API pass attributes where they want all aggregations containing those attributes to be deleted, the previous example would not delete anything. If I instead called Finish(http.route=foo), only then would the entire route be deleted.
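The key-subset matching semantics described above amount to a simple containment check. A minimal sketch (illustrative only, using plain string maps for attribute sets): a series matches when it contains every key/value pair in the selector, so an empty selector matches everything.

```go
package main

import "fmt"

// matches reports whether a series' attribute set contains every
// key/value pair in the (possibly empty) finish selector.
func matches(series, selector map[string]string) bool {
	for k, v := range selector {
		if series[k] != v {
			return false
		}
	}
	return true
}

func main() {
	s1 := map[string]string{"http.route": "/v1/foo", "http.response.status_code": "200"}
	s2 := map[string]string{"http.route": "/v1/foo", "http.response.status_code": "500"}
	s3 := map[string]string{"http.route": "/v1/bar", "http.response.status_code": "200"}

	sel := map[string]string{"http.route": "/v1/foo"}
	fmt.Println(matches(s1, sel), matches(s2, sel), matches(s3, sel)) // true true false
	fmt.Println(matches(s3, map[string]string{}))                     // empty selector matches all: true
}
```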

Member


Yes. Why not the proposed predicate solution?

I can't find any comments which indicate there are problems / critiques with it.

Contributor

@dashpole dashpole Mar 31, 2026


Ah, sorry. Missed it. I like your approach better, and support allowing users to provide a predicate or filter function.

I think the only downside is possibly performance, but I don't expect that to be a big concern for Finish.

Contributor Author


Simply out of bandwidth. I will work on it for java.

dashpole added a commit to open-telemetry/opentelemetry-go that referenced this pull request Mar 18, 2026
Add a feature to use per-series start times to match the spec:
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#start-timestamps

This is a prerequisite to [finishing /
closing](open-telemetry/opentelemetry-specification#4702).

Previous prototype:
#7719

---------

Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
Co-authored-by: Robert Pająk <pellared@hotmail.com>
@MrAlias
Contributor

MrAlias commented Apr 9, 2026

We updated the Go PoC to track the current Finish proposal and used it to explore a few API and SDK behaviors that are not yet fully spelled out in the spec text.

The current Go PoC now includes:

  • Finish on synchronous instruments with a dedicated FinishOption surface
  • exact attribute-set finishing, aligned with the current spec text
  • SDK tombstoning semantics: a finished series is exported one final time and then dropped on the following cycle
  • correct revival semantics: if a finished series is measured again before collection, the series is revived without dropping previously accumulated data
  • fresh cumulative start times when a finished series is later recreated
  • an additional PoC-only matcher-based finish option, WithMatchAttributes(func(attribute.Set) bool), to explore broader-than-exact matching
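A dedicated option surface of the kind described above could look roughly like this in Go. The `WithMatchAttributes` name follows the PoC comment; everything else here (the option type, the toy counter, plain string maps instead of `attribute.Set`) is invented for illustration and is not the actual PoC code.

```go
package main

import "fmt"

// finishConfig collects options for Finish, kept separate from
// measurement options.
type finishConfig struct {
	match func(attrs map[string]string) bool
}

type FinishOption func(*finishConfig)

// WithMatchAttributes selects series by predicate rather than by an
// exact attribute set.
func WithMatchAttributes(f func(attrs map[string]string) bool) FinishOption {
	return func(c *finishConfig) { c.match = f }
}

type counter struct{ series map[string]map[string]string }

// Finish removes every series selected by the options and reports how
// many were removed; with no matcher it is a no-op in this sketch.
func (c *counter) Finish(opts ...FinishOption) int {
	cfg := finishConfig{}
	for _, o := range opts {
		o(&cfg)
	}
	if cfg.match == nil {
		return 0
	}
	n := 0
	for key, attrs := range c.series {
		if cfg.match(attrs) {
			delete(c.series, key)
			n++
		}
	}
	return n
}

func main() {
	c := &counter{series: map[string]map[string]string{
		"a": {"container": "c1", "cpu": "0"},
		"b": {"container": "c1", "cpu": "1"},
		"c": {"container": "c2", "cpu": "0"},
	}}
	// Finish all series for one container regardless of other attributes.
	removed := c.Finish(WithMatchAttributes(func(a map[string]string) bool {
		return a["container"] == "c1"
	}))
	fmt.Println(removed, len(c.series))
}
```

Keeping the option type distinct from measurement options is what lets the API grow (exact sets, matchers, future selectors) without polluting the hot measurement path.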

A few observations from the Go implementation:

  1. Tombstoning appears important. Immediate deletion was too weak in practice. For a Finish operation to be observable and predictable, the SDK likely needs "final export, then forget" semantics rather than immediate disappearance.

  2. Revival before collection is subtle. A naive tombstone implementation can lose data if a series is finished and then measured again before the next collection. In the PoC, we had to explicitly revive the tombstoned state rather than simply delete the tombstone marker.

  3. Exact-match-only Finish may be too limiting. The current spec text is framed in terms of identifying an attribute set, and the Go PoC supports that directly. We also explored a matcher-based option because users may want to finish broader groups of series, for example:

    • all series for a container regardless of other attributes
    • all series with a specific attribute value
    • a narrower subset by combining exact attributes with a predicate
  4. API shape matters. For Go, a dedicated FinishOption surface worked better than reusing measurement options. Finish is not a measurement operation, and keeping its option space separate made both the public API and SDK implementation cleaner.

Based on the PoC, I think the spec would benefit from being more explicit about:

  • whether Finish is intended to be exact-match only, or whether broader matching should be supported
  • what should happen if a finished series is measured again before collection
  • whether a finished series should still be exported one final time
  • cumulative start-time behavior when a finished series is later recreated

If useful, I can also summarize the concrete Go API/SDK tradeoffs we hit while implementing the PoC.



Development

Successfully merging this pull request may close these issues.

Allow to unregister/stop/destroy instruments