Attribute sets not observed during async callbacks are not exported (#…
dashpole authored Aug 10, 2023
1 parent fce0a69 commit db9422d
Showing 3 changed files with 82 additions and 58 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,8 @@ release.
  ([#3648](https://github.com/open-telemetry/opentelemetry-specification/pull/3648))
- MetricReader.Collect ignores Resource from MetricProducer.Produce.
  ([#3636](https://github.com/open-telemetry/opentelemetry-specification/pull/3636))
+- Attribute sets not observed during async callbacks are not exported.
+  ([#3242](https://github.com/open-telemetry/opentelemetry-specification/pull/3242))

### Logs

4 changes: 4 additions & 0 deletions specification/metrics/sdk.md
@@ -701,6 +701,10 @@ execution.
The implementation MUST complete the execution of all callbacks for a
given instrument before starting a subsequent round of collection.

+The implementation SHOULD NOT produce aggregated metric data for a
+previously-observed attribute set which is not observed during a successful
+callback.

### Cardinality limits

**Status**: [Experimental](../document-status.md)
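As a non-normative aside on the new SHOULD NOT requirement added to sdk.md above, here is a minimal sketch in plain Go of the intended behavior. The `attrSet` encoding and the `asyncStorage` type are hypothetical illustrations, not the internals of any real SDK: storage keeps only the attribute sets observed in the most recent successful callback round, so a set that stops being observed stops being exported.

```go
package main

import "fmt"

// attrSet is a hypothetical, comparable encoding of an attribute set,
// e.g. "pid=1001".
type attrSet string

// asyncStorage sketches cumulative storage for one asynchronous counter.
type asyncStorage struct {
	sums map[attrSet]int64
}

// record replaces the stored state with exactly the attribute sets observed in
// the latest callback round; previously-observed sets that were not observed
// this round are dropped and therefore no longer exported.
func (s *asyncStorage) record(observed map[attrSet]int64) {
	next := make(map[attrSet]int64, len(observed))
	for set, value := range observed {
		next[set] = value
	}
	s.sums = next
}

func main() {
	s := &asyncStorage{}
	s.record(map[attrSet]int64{"pid=1001": 60, "pid=1002": 47}) // one callback round
	s.record(map[attrSet]int64{"pid=1002": 53, "pid=1003": 5})  // pid=1001 was not observed
	fmt.Println(s.sums)                                         // map[pid=1002:53 pid=1003:5]
}
```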
134 changes: 76 additions & 58 deletions specification/metrics/supplementary-guidelines.md
@@ -431,25 +431,30 @@ Instrument](./api.md#histogram). What if we collect measurements from an
[Asynchronous Counter](./api.md#asynchronous-counter)?

The following example shows the number of [page
-faults](https://en.wikipedia.org/wiki/Page_fault) of each thread since the
-thread ever started:
+faults](https://en.wikipedia.org/wiki/Page_fault) of each process since
+it started:

* During the time range (T<sub>0</sub>, T<sub>1</sub>]:
-  * pid = `1001`, tid = `1`, #PF = `50`
-  * pid = `1001`, tid = `2`, #PF = `30`
+  * pid = `1001`, #PF = `50`
+  * pid = `1002`, #PF = `30`
* During the time range (T<sub>1</sub>, T<sub>2</sub>]:
-  * pid = `1001`, tid = `1`, #PF = `53`
-  * pid = `1001`, tid = `2`, #PF = `38`
+  * pid = `1001`, #PF = `53`
+  * pid = `1002`, #PF = `38`
* During the time range (T<sub>2</sub>, T<sub>3</sub>]
-  * pid = `1001`, tid = `1`, #PF = `56`
-  * pid = `1001`, tid = `2`, #PF = `42`
+  * pid = `1001`, #PF = `56`
+  * pid = `1002`, #PF = `42`
* During the time range (T<sub>3</sub>, T<sub>4</sub>]:
-  * pid = `1001`, tid = `1`, #PF = `60`
-  * pid = `1001`, tid = `2`, #PF = `47`
+  * pid = `1001`, #PF = `60`
+  * pid = `1002`, #PF = `47`
* During the time range (T<sub>4</sub>, T<sub>5</sub>]:
-  * thread 1 died, thread 3 started
-  * pid = `1001`, tid = `2`, #PF = `53`
-  * pid = `1001`, tid = `3`, #PF = `5`
+  * process 1001 died, process 1003 started
+  * pid = `1002`, #PF = `53`
+  * pid = `1003`, #PF = `5`
+* During the time range (T<sub>5</sub>, T<sub>6</sub>]:
+  * A new process 1001 started
+  * pid = `1001`, #PF = `10`
+  * pid = `1002`, #PF = `57`
+  * pid = `1003`, #PF = `8`

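As a non-normative illustration, a callback producing observations like these could be registered as sketched below. This assumes the OpenTelemetry Go metric API; the instrument name and the `pageFaultsByPID` helper are hypothetical stand-ins for a real OS query.

```go
package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// pageFaultsByPID is a hypothetical helper returning the cumulative page
// fault count for each currently running process.
func pageFaultsByPID() map[int]int64 {
	return map[int]int64{1001: 50, 1002: 30}
}

func main() {
	meter := otel.Meter("example")

	// The callback runs once per collection; only the processes that exist at
	// that moment contribute attribute sets to the produced metrics.
	_, err := meter.Int64ObservableCounter(
		"process.page_faults",
		metric.WithInt64Callback(func(_ context.Context, o metric.Int64Observer) error {
			for pid, faults := range pageFaultsByPID() {
				o.Observe(faults, metric.WithAttributes(attribute.Int("pid", pid)))
			}
			return nil
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
}
```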
Note that in the following examples, Cumulative aggregation
temporality is discussed before Delta aggregation temporality because
@@ -461,47 +466,56 @@ API with specified Cumulative aggregation temporality.
If we export the metrics using **Cumulative Temporality**:

* (T<sub>0</sub>, T<sub>1</sub>]
-  * attributes: {pid = `1001`, tid = `1`}, sum: `50`
-  * attributes: {pid = `1001`, tid = `2`}, sum: `30`
+  * attributes: {pid = `1001`}, sum: `50`
+  * attributes: {pid = `1002`}, sum: `30`
* (T<sub>0</sub>, T<sub>2</sub>]
-  * attributes: {pid = `1001`, tid = `1`}, sum: `53`
-  * attributes: {pid = `1001`, tid = `2`}, sum: `38`
+  * attributes: {pid = `1001`}, sum: `53`
+  * attributes: {pid = `1002`}, sum: `38`
* (T<sub>0</sub>, T<sub>3</sub>]
-  * attributes: {pid = `1001`, tid = `1`}, sum: `56`
-  * attributes: {pid = `1001`, tid = `2`}, sum: `42`
+  * attributes: {pid = `1001`}, sum: `56`
+  * attributes: {pid = `1002`}, sum: `42`
* (T<sub>0</sub>, T<sub>4</sub>]
-  * attributes: {pid = `1001`, tid = `1`}, sum: `60`
-  * attributes: {pid = `1001`, tid = `2`}, sum: `47`
+  * attributes: {pid = `1001`}, sum: `60`
+  * attributes: {pid = `1002`}, sum: `47`
* (T<sub>0</sub>, T<sub>5</sub>]
-  * attributes: {pid = `1001`, tid = `2`}, sum: `53`
-  * attributes: {pid = `1001`, tid = `3`}, sum: `5`
+  * attributes: {pid = `1002`}, sum: `53`
+* (T<sub>4</sub>, T<sub>5</sub>]
+  * attributes: {pid = `1003`}, sum: `5`
+* (T<sub>5</sub>, T<sub>6</sub>]
+  * attributes: {pid = `1001`}, sum: `10`
+* (T<sub>0</sub>, T<sub>6</sub>]
+  * attributes: {pid = `1002`}, sum: `57`
+* (T<sub>4</sub>, T<sub>6</sub>]
+  * attributes: {pid = `1003`}, sum: `8`

The behavior in the first four periods is quite straightforward - we
just take the data being reported from the asynchronous instruments
and send them.

-The data model prescribes several valid behaviors at T<sub>5</sub> in
-this case, where one stream dies and another starts. The [Resets and
-Gaps](./data-model.md#resets-and-gaps) section describes how start
-timestamps and staleness markers can be used to increase the
+The data model prescribes several valid behaviors at T<sub>5</sub> and
+T<sub>6</sub> in this case, where one stream dies and another starts.
+The [Resets and Gaps](./data-model.md#resets-and-gaps) section describes
+how start timestamps and staleness markers can be used to increase the
receiver's understanding of these events.

Consider whether the SDK maintains individual timestamps for the
individual stream, or just one per process. In this example, where a
-thread can die and start counting page faults from zero, the valid
-behaviors at T<sub>5</sub> are:
+process can die and restart, it starts counting page faults from zero.
+In this case, the valid behaviors at T<sub>5</sub> and T<sub>6</sub>
+are:

1. If all streams in the process share a start time, and the SDK is
   not required to remember all past streams: the thread restarts with
-   zero sum. Receivers with reset detection are able to calculate a
-   correct rate (except for frequent restarts relative to the
-   collection interval), however the precise time of a reset will be
-   unknown.
-2. If the SDK maintains per-stream start times, it signals to the
-   receiver precisely when a stream started, making the first
-   observation in a stream more useful for diagnostics. Receivers can
-   perform overlap detection or duplicate suppression and do not
-   require reset detection, in this case.
+   zero sum, and the start time of the process. Receivers with reset
+   detection are able to calculate a correct rate (except for frequent
+   restarts relative to the collection interval), however the precise
+   time of a reset will be unknown.
+2. If the SDK maintains per-stream start times, it provides the previous
+   callback time as the start time, as this time is before the occurrence
+   of any events which are measured during the subsequent callback. This
+   makes the first observation in a stream more useful for diagnostics,
+   as downstream consumers can perform overlap detection or duplicate
+   suppression and do not require reset detection in this case.
3. Independent of above treatments, the SDK can add a staleness marker
   to indicate the start of a gap in the stream when one thread dies
   by remembering which streams have previously reported but are not
@@ -519,20 +533,23 @@ data model.
If we export the metrics using **Delta Temporality**:

* (T<sub>0</sub>, T<sub>1</sub>]
-  * attributes: {pid = `1001`, tid = `1`}, delta: `50`
-  * attributes: {pid = `1001`, tid = `2`}, delta: `30`
+  * attributes: {pid = `1002`}, delta: `30`
* (T<sub>1</sub>, T<sub>2</sub>]
-  * attributes: {pid = `1001`, tid = `1`}, delta: `3`
-  * attributes: {pid = `1001`, tid = `2`}, delta: `8`
+  * attributes: {pid = `1001`}, delta: `3`
+  * attributes: {pid = `1002`}, delta: `8`
* (T<sub>2</sub>, T<sub>3</sub>]
-  * attributes: {pid = `1001`, tid = `1`}, delta: `3`
-  * attributes: {pid = `1001`, tid = `2`}, delta: `4`
+  * attributes: {pid = `1001`}, delta: `3`
+  * attributes: {pid = `1002`}, delta: `4`
* (T<sub>3</sub>, T<sub>4</sub>]
-  * attributes: {pid = `1001`, tid = `1`}, delta: `4`
-  * attributes: {pid = `1001`, tid = `2`}, delta: `5`
+  * attributes: {pid = `1001`}, delta: `4`
+  * attributes: {pid = `1002`}, delta: `5`
* (T<sub>4</sub>, T<sub>5</sub>]
-  * attributes: {pid = `1001`, tid = `2`}, delta: `6`
-  * attributes: {pid = `1001`, tid = `3`}, delta: `5`
+  * attributes: {pid = `1002`}, delta: `6`
+  * attributes: {pid = `1003`}, delta: `5`
+* (T<sub>5</sub>, T<sub>6</sub>]
+  * attributes: {pid = `1001`}, delta: `10`
+  * attributes: {pid = `1002`}, delta: `4`
+  * attributes: {pid = `1003`}, delta: `3`

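As a non-normative sketch in plain Go (with a hypothetical `attrSet` encoding), the deltas above can be derived from the cumulative observations by remembering the last value seen for each attribute set:

```go
package main

import "fmt"

// attrSet is a hypothetical, comparable encoding of an attribute set,
// e.g. "pid=1001".
type attrSet string

// deltaConverter remembers the last cumulative value of every attribute set it
// has ever seen, which is what Cumulative->Delta conversion requires.
type deltaConverter struct {
	last map[attrSet]int64
}

// convert turns one round of cumulative observations into deltas. An attribute
// set seen for the first time, or one whose value went backwards (a reset),
// reports its full cumulative value as the delta.
func (c *deltaConverter) convert(current map[attrSet]int64) map[attrSet]int64 {
	if c.last == nil {
		c.last = map[attrSet]int64{}
	}
	deltas := make(map[attrSet]int64, len(current))
	for set, value := range current {
		prev, seen := c.last[set]
		if !seen || value < prev {
			deltas[set] = value
		} else {
			deltas[set] = value - prev
		}
		c.last[set] = value
	}
	return deltas
}

func main() {
	c := &deltaConverter{}
	// First round seen by this converter: the full cumulative values come out.
	fmt.Println(c.convert(map[attrSet]int64{"pid=1001": 60, "pid=1002": 47}))
	// pid=1001 died and pid=1003 appeared (compare (T4, T5] above).
	fmt.Println(c.convert(map[attrSet]int64{"pid=1002": 53, "pid=1003": 5})) // map[pid=1002:6 pid=1003:5]
	// A new process reuses pid 1001; 10 < 60 is treated as a reset (compare (T5, T6]).
	fmt.Println(c.convert(map[attrSet]int64{"pid=1001": 10, "pid=1002": 57, "pid=1003": 8})) // map[pid=1001:10 pid=1002:4 pid=1003:3]
}
```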
You can see that we are performing Cumulative->Delta conversion, and it requires
us to remember the last value of **every single permutation we've encountered so
@@ -560,27 +577,28 @@ So here are some suggestions that we encourage SDK implementers to consider:
##### Asynchronous example: attribute removal in a view

Suppose the metrics in the asynchronous example above are exported
-through a view configured to remove the `tid` attribute, leaving a
-single-dimensional count of page faults by `pid`. For each metric
-stream, two measurements are produced covering the same interval of
-time, which the SDK is expected to aggregate before producing the
-output.
+through a view configured to remove the `pid` attribute, leaving a
+count of page faults. For each metric stream, two measurements are produced
+covering the same interval of time, which the SDK is expected to aggregate
+before producing the output.

The data model specifies to use the "natural merge" function, in this
case meaning to add the current point values together because they
are `Sum` data points. The expected output is, still in **Cumulative
Temporality**:

* (T<sub>0</sub>, T<sub>1</sub>]
-  * dimensions: {pid = `1001`}, sum: `80`
+  * dimensions: {}, sum: `80`
* (T<sub>0</sub>, T<sub>2</sub>]
-  * dimensions: {pid = `1001`}, sum: `91`
+  * dimensions: {}, sum: `91`
* (T<sub>0</sub>, T<sub>3</sub>]
-  * dimensions: {pid = `1001`}, sum: `98`
+  * dimensions: {}, sum: `98`
* (T<sub>0</sub>, T<sub>4</sub>]
-  * dimensions: {pid = `1001`}, sum: `107`
+  * dimensions: {}, sum: `107`
* (T<sub>0</sub>, T<sub>5</sub>]
-  * dimensions: {pid = `1001`}, sum: `58`
+  * dimensions: {}, sum: `58`
+* (T<sub>0</sub>, T<sub>6</sub>]
+  * dimensions: {}, sum: `75`

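For a concrete, non-normative picture of that natural merge in plain Go: once the view drops `pid`, the per-process Sum points collapse onto the empty attribute set and are simply added.

```go
package main

import "fmt"

// mergeSums applies the "natural merge" for Sum data points: after a view
// removes the distinguishing attribute(s), points that collapse onto the same
// remaining attribute set are added together.
func mergeSums(pointsByPID map[int]int64) int64 {
	var total int64
	for _, v := range pointsByPID {
		total += v
	}
	return total
}

func main() {
	fmt.Println(mergeSums(map[int]int64{1001: 50, 1002: 30})) // (T0, T1]: 80
	fmt.Println(mergeSums(map[int]int64{1001: 60, 1002: 47})) // (T0, T4]: 107
}
```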
As discussed in the asynchronous cumulative temporality example above,
there are various treatments available for detecting resets. Even if
