Add ExponentialHistogram to Metrics data model (open-telemetry#1935)

jmacd · reyang · aabmass · web-flow · commit 7b52806aebdf · 2021-10-07T07:49:35.000-04:00
* Add ExponentialHistogram to Metrics data model

* draft expectations

* toc

* remove one 'exponential'

* mention the use of logarithm and inexact computation

* manual edit TOC

* typo

* reduce precision

* from Yuke's feedback

* mapping methods

* several fixes from yzhuge

* lint

* update links

* Changelog

* mention min/max

* let consumers deal with overflow and underflow

* yzhuge's remarks

* whitespace

* Apply suggestions from code review

Co-authored-by: Reiley Yang <reyang@microsoft.com>

* revert TOC trouble etc

* upcase

* Update specification/metrics/datamodel.md

Co-authored-by: Aaron Abbott <aaronabbott@google.com>

Co-authored-by: Reiley Yang <reyang@microsoft.com>
Co-authored-by: Aaron Abbott <aaronabbott@google.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -15,6 +15,8 @@ release.
 
 - Add optional min / max fields to histogram data model.
   ([#1915](https://github.com/open-telemetry/opentelemetry-specification/pull/1915))
+- Add exponential histogram to the metrics data model.
+  ([#1935](https://github.com/open-telemetry/opentelemetry-specification/pull/1935))
 
 ### Logs
 
diff --git a/specification/metrics/datamodel.md b/specification/metrics/datamodel.md
@@ -19,6 +19,17 @@
   * [Sums](#sums)
   * [Gauge](#gauge)
   * [Histogram](#histogram)
+  * [ExponentialHistogram](#exponentialhistogram)
+    + [Exponential Scale](#exponential-scale)
+    + [Exponential Buckets](#exponential-buckets)
+    + [Zero Count](#zero-count)
+    + [Producer Expectations](#producer-expectations)
+      - [Scale Zero: Extract the Exponent](#scale-zero-extract-the-exponent)
+      - [Negative Scale: Extract and Shift the Exponent](#negative-scale-extract-and-shift-the-exponent)
+      - [All Scales: Use the Logarithm Function](#all-scales-use-the-logarithm-function)
+      - [Positive Scale: Use a Lookup Table](#positive-scale-use-a-lookup-table)
+      - [Producer Recommendations](#producer-recommendations)
+    + [Consumer Expectations](#consumer-expectations)
   * [Summary (Legacy)](#summary-legacy)
 - [Exemplars](#exemplars)
 - [Single-Writer](#single-writer)
@@ -223,12 +234,13 @@ consisting of several metadata properties:
 - Kind of point (integer, floating point, etc)
 - Unit of measurement
 
-The primary data of each timeseries are ordered (timestamp, value) points, for
-three value types:
+The primary data of each timeseries are ordered (timestamp, value) points, with
+one of the following value types:
 
 1. Counter (Monotonic, Cumulative)
 2. Gauge
 3. Histogram
+4. Exponential Histogram
 
 This model may be viewed as an idealization of
 [Prometheus Remote Write](https://docs.google.com/document/d/1LPhVRSFkGNSuU1fBd81ulhsCPR4hkSZyyBj1SZ8fWOM/edit#heading=h.3p42p5s8n0ui).
@@ -267,9 +279,10 @@ same kind. <sup>[1](#otlpdatapointfn)</sup>
 
 The basic point kinds are:
 
-1. [Sum](https://github.com/open-telemetry/opentelemetry-proto/blob/v0.9.0/opentelemetry/proto/metrics/v1/metrics.proto#L230)
-2. [Gauge](https://github.com/open-telemetry/opentelemetry-proto/blob/v0.9.0/opentelemetry/proto/metrics/v1/metrics.proto#L200)
-3. [Histogram](https://github.com/open-telemetry/opentelemetry-proto/blob/v0.9.0/opentelemetry/proto/metrics/v1/metrics.proto#L258)
+1. [Sum](https://github.com/open-telemetry/opentelemetry-proto/blob/v0.10.x/opentelemetry/proto/metrics/v1/metrics.proto#L198)
+2. [Gauge](https://github.com/open-telemetry/opentelemetry-proto/blob/v0.10.x/opentelemetry/proto/metrics/v1/metrics.proto#L192)
+3. [Histogram](https://github.com/open-telemetry/opentelemetry-proto/blob/v0.10.x/opentelemetry/proto/metrics/v1/metrics.proto#L211)
+4. [Exponential Histogram](https://github.com/open-telemetry/opentelemetry-proto/blob/27a10cd70f63afdbddf460881969f9ad7ae4af5d/opentelemetry/proto/metrics/v1/metrics.proto#L239)
 
 Comparing the OTLP Metric Data Stream and Timeseries data models, OTLP does
 not map 1:1 from its point types into timeseries points. In OTLP, a Sum point
@@ -422,14 +435,262 @@ Changing the inclusivity and exclusivity of bounds is an example of
 worst-case Histogram error; users should choose Histogram boundaries
 so that worst-case error is within their error tolerance.
 
+### ExponentialHistogram
+
+**Status**: [Experimental](../document-status.md)
+
+[ExponentialHistogram](https://github.com/open-telemetry/opentelemetry-proto/blob/cfbf9357c03bf4ac150a3ab3bcbe4cc4ed087362/opentelemetry/proto/metrics/v1/metrics.proto#L222)
+data points are an alternate representation to the
+[Histogram](#histogram) data point, used to convey a population of
+recorded measurements in a compressed format.  ExponentialHistogram
+compresses bucket boundaries using an exponential formula, making it
+suitable for conveying high dynamic range data with small relative
+error, compared with alternative representations of similar size.
+
+Statements about `Histogram` that refer to aggregation temporality,
+attributes, and timestamps, as well as the `sum`, `count`, `min`, `max` and
+`exemplars` fields, are the same for `ExponentialHistogram`.  These
+fields share the same interpretation as for `Histogram`; only the
+bucket structure differs between the two types.
+
+#### Exponential Scale
+
+The resolution of the ExponentialHistogram is characterized by a
+parameter known as `scale`, with larger values of `scale` offering
+greater precision.  Bucket boundaries of the ExponentialHistogram are
+located at integer powers of the `base`, also known as the "growth
+factor", where:
+
+```
+base = 2**(2**(-scale))
+```
+
+The symbol `**` in these formulas represents exponentiation, thus
+`2**x` is read "Two to the power of `x`", typically computed by an
+expression like `math.Pow(2.0, x)`.  Calculated `base` values for
+selected scales are shown below:
+
+| Scale | Base    | Expression  |
+| --    | --      | --          |
+| 10    | 1.00068 | 2**(1/1024) |
+| 9     | 1.00135 | 2**(1/512)  |
+| 8     | 1.00271 | 2**(1/256)  |
+| 7     | 1.00543 | 2**(1/128)  |
+| 6     | 1.01089 | 2**(1/64)   |
+| 5     | 1.02190 | 2**(1/32)   |
+| 4     | 1.04427 | 2**(1/16)   |
+| 3     | 1.09051 | 2**(1/8)    |
+| 2     | 1.18921 | 2**(1/4)    |
+| 1     | 1.41421 | 2**(1/2)    |
+| 0     | 2       | 2**1        |
+| -1    | 4       | 2**2        |
+| -2    | 16      | 2**4        |
+| -3    | 256     | 2**8        |
+| -4    | 65536   | 2**16       |
+
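As an illustration of the formula above, the following minimal Go sketch computes `base` for a few scales. The function name `Base` is hypothetical and is not part of the specification or of any OpenTelemetry library.

```go
package main

import (
	"fmt"
	"math"
)

// Base returns the growth factor 2**(2**(-scale)) for the given scale.
// Hypothetical helper, shown only to illustrate the formula.
func Base(scale int32) float64 {
	return math.Exp2(math.Exp2(-float64(scale)))
}

func main() {
	// Print the base for a few of the scales listed in the table.
	for _, scale := range []int32{3, 1, 0, -1, -4} {
		fmt.Printf("scale=%d base=%g\n", scale, Base(scale))
	}
}
```

Each step down in scale squares the base, which is why the table values for scales 0 through -4 are 2, 4, 16, 256 and 65536.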
+An important property of this design is described as "perfect
+subsetting".  Buckets of an exponential Histogram with a given scale
+map exactly into buckets of exponential Histograms with lesser scales,
+which allows consumers to lower the resolution of a histogram (i.e.,
+downscale) without introducing error.
+
+#### Exponential Buckets
+
+The ExponentialHistogram bucket identified by `index`, a signed
+integer, represents values in the population that are greater than or
+equal to `base**index` and less than `base**(index+1)`.  Note that the
+ExponentialHistogram specifies a lower-inclusive bound while the
+explicit-boundary Histogram specifies an upper-inclusive bound.
+
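The bucket bounds described above can be sketched in Go as follows. This is an illustrative sketch, not a reference implementation: `LowerBoundary` is a hypothetical name, and the result is subject to ordinary floating-point rounding.

```go
package main

import (
	"fmt"
	"math"
)

// LowerBoundary returns base**index = 2**(index * 2**(-scale)), the
// inclusive lower bound of the bucket. The exclusive upper bound is
// LowerBoundary(scale, index+1). Hypothetical helper for illustration.
func LowerBoundary(scale, index int32) float64 {
	// Ldexp multiplies by a power of two, so the exponent
	// index * 2**(-scale) is formed without a separate division.
	return math.Exp2(math.Ldexp(float64(index), -int(scale)))
}

func main() {
	const scale = 3
	for index := int32(-1); index <= 2; index++ {
		fmt.Printf("index=%d range=[%.5f, %.5f)\n",
			index, LowerBoundary(scale, index), LowerBoundary(scale, index+1))
	}
}
```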
+The positive and negative ranges of the histogram are expressed
+separately.  Negative values are mapped by their absolute value
+into the negative range using the same scale as the positive range.
+
+Each range of the ExponentialHistogram data point uses a dense
+representation of the buckets, where a range of buckets is expressed
+as a single `offset` value, a signed integer, and an array of count
+values, where array element `i` represents the bucket count for bucket
+index `offset+i`.
+
+For a given range, positive or negative:
+
+- Bucket index `0` counts measurements in the range `[1, base)`
+- Positive indexes correspond with absolute values greater than or equal to `base`
+- Negative indexes correspond with absolute values less than 1
+- There are `2**scale` buckets between successive powers of 2.
+
+For example, with `scale=3` there are `2**3` buckets between 1 and 2.
+Note that the lower boundary for bucket index 4 in a `scale=3`
+histogram maps into the lower boundary for bucket index 2 in a
+`scale=2` histogram and maps into the lower boundary for bucket index
+1 (i.e., the `base`) in a `scale=1` histogram—these are examples of
+perfect subsetting.
+
+| `scale=3` bucket index | lower boundary | equation                     |
+| --                     | --             | --                           |
+| 0                      | 1              | 2**(0/8)                     |
+| 1                      | 1.09051        | 2**(1/8)                     |
+| 2                      | 1.18921        | 2**(2/8), 2**(1/4)           |
+| 3                      | 1.29684        | 2**(3/8)                     |
+| 4                      | 1.41421        | 2**(4/8), 2**(2/4), 2**(1/2) |
+| 5                      | 1.54221        | 2**(5/8)                     |
+| 6                      | 1.68179        | 2**(6/8)                     |
+| 7                      | 1.83401        | 2**(7/8)                     |
+
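Perfect subsetting means a consumer can lower the scale of a dense bucket range by merging neighboring buckets. The Go sketch below shows one way to do this, assuming a hypothetical `Buckets` type that mirrors the `offset` plus count array layout described above. It is not taken from any OpenTelemetry implementation.

```go
package main

import "fmt"

// Buckets is a hypothetical dense bucket range: Counts[i] holds the
// count for bucket index Offset+i.
type Buckets struct {
	Offset int32
	Counts []uint64
}

// Downscale merges buckets into a range whose scale is lower by `by`.
// Each index is shifted right by `by`; the arithmetic shift rounds
// toward negative infinity, so negative indices are handled correctly.
func Downscale(in Buckets, by uint) Buckets {
	if len(in.Counts) == 0 {
		return Buckets{}
	}
	first := in.Offset >> by
	last := (in.Offset + int32(len(in.Counts)) - 1) >> by
	out := Buckets{Offset: first, Counts: make([]uint64, last-first+1)}
	for i, c := range in.Counts {
		idx := (in.Offset + int32(i)) >> by
		out.Counts[idx-first] += c
	}
	return out
}

func main() {
	// scale=3 buckets with indices -1..4, reduced by one to scale=2.
	in := Buckets{Offset: -1, Counts: []uint64{1, 2, 3, 4, 5, 6}}
	fmt.Printf("%+v\n", Downscale(in, 1))
}
```

In this example the `scale=3` indices -1, 0, 1, 2, 3, 4 map to the `scale=2` indices -1, 0, 0, 1, 1, 2, so the total count is preserved and no bucket is split.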
+#### Zero Count
+
+The ExponentialHistogram contains a special `zero_count` field
+containing the count of values that are either exactly zero or within
+the region considered zero by the instrumentation at the tolerated
+level of precision.  This bucket stores values that cannot be
+expressed using the standard exponential formula as well as values
+that have been rounded to zero.
+
+#### Producer Expectations
+
+The ExponentialHistogram design makes it possible to express values
+that are too large or small to be represented in the 64-bit "double"
+floating point format.  Certain values for `scale`, while meaningful,
+are not necessarily useful.
+
+The range of data represented by an ExponentialHistogram determines
+which scales can be usefully applied.  Regardless of scale, producers
+SHOULD ensure that the index of any encoded bucket falls within the
+range of a signed 32-bit integer.  This recommendation is applied to
+limit the width of integers used in standard processing pipelines such
+as the OpenTelemetry collector.  The wire-level protocol could be
+extended for 64-bit bucket indices in a future release.
+
+Producers use a mapping function to compute bucket indices.  Producers
+are presumed to support IEEE double-width floating-point numbers with
+11-bit exponent and 52-bit significand.  The pseudo-code below for
+mapping values to exponents refers to the following constants:
+
+```golang
+const (
+    // SignificandWidth is the size of an IEEE 754 double-precision
+    // floating-point significand.
+    SignificandWidth = 52
+    // ExponentWidth is the size of an IEEE 754 double-precision
+    // floating-point exponent.
+    ExponentWidth = 11
+
+    // SignificandMask is the mask for the significand of an IEEE 754
+    // double-precision floating-point value: 0xFFFFFFFFFFFFF.
+    SignificandMask = 1 << SignificandWidth - 1
+
+    // ExponentBias is the exponent bias specified for encoding
+    // the IEEE 754 double-precision floating point exponent: 1023.
+    ExponentBias = 1 << (ExponentWidth-1) - 1
+
+    // ExponentMask has the bits of an IEEE 754 floating point
+    // exponent set to 1: 0x7FF0000000000000.
+    ExponentMask = ((1 << ExponentWidth) - 1) << SignificandWidth
+)
+```
+
+The following choices of mapping function have been validated through
+reference implementations.
+
+##### Scale Zero: Extract the Exponent
+
+For scale zero, the index of a value equals its normalized base-2
+exponent, meaning the value of `exponent` in the base-2 fractional
+representation `1.significand * 2**exponent`.  Normal IEEE 754
+double-width floating point values have indices in the range
+`[-1022, +1023]` and subnormal values have indices in the range
+`[-1074, -1023]`.  This may be written as:
+
+```golang
+// GetExponent extracts the normalized base-2 fractional exponent.
+// If the value is represented as `1.significand x 2**exponent`,
+// this returns `exponent`.  Not defined for 0, Inf, or NaN values.
+func GetExponent(value float64) int32 {
+    rawBits := math.Float64bits(value)
+    rawExponent := (int64(rawBits) & ExponentMask) >> SignificandWidth
+    rawSignificand := rawBits & SignificandMask
+    if rawExponent == 0 {
+        // Handle subnormal values: rawSignificand cannot be zero
+        // unless value is zero.
+        rawExponent -= int64(bits.LeadingZeros64(rawSignificand) - 12)
+    }
+    return int32(rawExponent - ExponentBias)
+}
+```
+
+##### Negative Scale: Extract and Shift the Exponent
+
+For negative scales, the index of a value equals the normalized
+base-2 exponent (as by `GetExponent()` above) shifted to the right
+by `-scale`.  Note that because of sign extension, this shift performs
+correct rounding for the negative indices.  This may be written as:
+
+```golang
+  return GetExponent(value) >> -scale
+```
+
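For comparison, the same index for zero and negative scales can be obtained from the standard library instead of bit manipulation. The sketch below is an alternative technique, not the method given in this document: it relies on Go's `math.Frexp`, and the name `IndexByExponent` is hypothetical. It assumes `value` is positive and finite and that `scale <= 0`.

```go
package main

import (
	"fmt"
	"math"
)

// IndexByExponent returns the bucket index for a positive, finite
// value when scale <= 0. Hypothetical helper for illustration.
func IndexByExponent(value float64, scale int32) int32 {
	// Frexp returns frac in [0.5, 1) with value == frac * 2**exp, so
	// the normalized exponent of 1.significand * 2**exponent is exp-1.
	// Frexp also normalizes subnormal inputs.
	_, exp := math.Frexp(value)
	// Arithmetic right shift rounds toward negative infinity.
	return int32(exp-1) >> uint(-scale)
}

func main() {
	for _, v := range []float64{0.3, 1, 8, math.MaxFloat64, math.SmallestNonzeroFloat64} {
		fmt.Printf("value=%g scale=0 index=%d scale=-2 index=%d\n",
			v, IndexByExponent(v, 0), IndexByExponent(v, -2))
	}
}
```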
+##### All Scales: Use the Logarithm Function
+
+For any scale, the index can be computed using the built-in natural
+logarithm function.  A multiplicative factor equal to `2**scale / ln(2)`
+proves useful (where `ln()` is the natural logarithm), for example:
+
+```golang
+    scaleFactor := math.Log2E * math.Exp2(float64(scale))
+    return int64(math.Floor(math.Log(value) * scaleFactor))
+```
+
+Note that in the example Golang code above, the built-in `math.Log2E`
+is defined as `1 / ln(2)`.
+
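The logarithm method above can be wrapped into a complete program as in the following sketch. The function name `MapToIndex` and the `int32` return type are assumptions made for illustration, and the function assumes a positive, finite input. Because the computation is inexact, values at or very near a bucket boundary may land in an adjacent bucket.

```go
package main

import (
	"fmt"
	"math"
)

// MapToIndex returns the bucket index for a positive, finite value
// using the logarithm method. Hypothetical helper for illustration;
// results at or very near a bucket boundary may be off by one.
func MapToIndex(value float64, scale int32) int32 {
	// scaleFactor is 2**scale / ln(2).
	scaleFactor := math.Log2E * math.Exp2(float64(scale))
	return int32(math.Floor(math.Log(value) * scaleFactor))
}

func main() {
	for _, v := range []float64{0.3, 1.5, 10, 1000} {
		fmt.Printf("value=%g scale=3 index=%d\n", v, MapToIndex(v, 3))
	}
}
```

For instance, 1.5 lies between the `scale=3` lower boundaries 1.41421 (index 4) and 1.54221 (index 5), so it maps to index 4.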
+##### Positive Scale: Use a Lookup Table
+
+For positive scales, lookup table methods have been demonstrated
+that are able to exactly compute the index in constant time from a
+lookup table with `O(2**scale)` entries.
+
+##### Producer Recommendations
+
+At the lowest or highest end of the 64-bit IEEE floating point range,
+a bucket's range may only be partially representable by the floating
+point number format.  When mapping a number in these buckets, a
+producer may correctly return the index of such a partially
+representable bucket.  This is considered a normal condition.
+
+For positive scales, the logarithm method is preferred because it
+requires very little code, is easy to validate and is nearly as fast
+and accurate as the lookup table approach.  For zero scale and
+negative scales, directly calculating the index from the
+floating-point representation is more efficient.
+
+The use of a built-in logarithm function could lead to results that
+differ from the bucket index that would be computed using arbitrary
+precision or a lookup table; however, producers are not required to
+perform an exact computation.  As a result, ExponentialHistogram
+exemplars could map into buckets with zero count.  We expect to find
+such values counted in the adjacent buckets.
+
+#### Consumer Expectations
+
+ExponentialHistogram bucket indices are expected to map into buckets
+where both the upper and lower boundaries can be represented
+using IEEE 754 double-width floating point values.  Consumers MAY
+round the unrepresentable boundary of a partially representable bucket
+index to the nearest representable value.
+
+Consumers SHOULD reject ExponentialHistogram data with `scale` and
+bucket indices that overflow or underflow this representation.
+Consumers that reject such data SHOULD warn the user through error
+logging that out-of-range data was received.
+
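One possible reading of the consumer check above is sketched below in Go. It treats a bucket as acceptable when its range overlaps the positive range of IEEE 754 doubles, so partially representable buckets are accepted and fully unrepresentable ones are rejected. The function name `BucketOverlapsDouble`, the constants, and this interpretation are assumptions, not requirements stated by the specification.

```go
package main

import (
	"fmt"
	"math"
)

const (
	// minExponent is the base-2 exponent of the smallest subnormal double.
	minExponent = -1074
	// maxExponent is one more than the exponent of the largest finite double.
	maxExponent = 1024
)

// BucketOverlapsDouble reports whether the bucket [base**index,
// base**(index+1)) contains any positive double-precision value.
// Hypothetical helper; it compares base-2 exponents of the bounds.
func BucketOverlapsDouble(scale, index int32) bool {
	// The bounds are 2**lower and 2**upper, with
	// lower = index * 2**(-scale) and upper = (index+1) * 2**(-scale).
	lower := math.Ldexp(float64(index), -int(scale))
	upper := math.Ldexp(float64(index)+1, -int(scale))
	return upper > minExponent && lower < maxExponent
}

func main() {
	fmt.Println(BucketOverlapsDouble(0, 1023))     // bucket holding the largest doubles
	fmt.Println(BucketOverlapsDouble(0, 1024))     // entirely above the double range
	fmt.Println(BucketOverlapsDouble(-2, -269))    // partially representable bucket
	fmt.Println(BucketOverlapsDouble(10, 2000000)) // entirely above the double range
}
```

A consumer following this sketch would log a warning and drop data points for which the check fails.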
 ### Summary (Legacy)
 
 [Summary](https://github.com/open-telemetry/opentelemetry-proto/blob/v0.9.0/opentelemetry/proto/metrics/v1/metrics.proto#L268)
-metric data points convey quantile summaries, e.g. What is the 99-th percentile
-latency of my HTTP server.  Unlike other point types in OpenTelemetry, Summary
-points cannot always be merged in a meaningful way. This point type is not
-recommended for new applications and exists for compatibility with other
-formats.
+metric data points convey quantile summaries, e.g. What is the 99-th
+percentile latency of my HTTP server.  Unlike other point types in
+OpenTelemetry, Summary points cannot always be merged in a meaningful
+way. This point type is not recommended for new applications and
+exists for compatibility with other formats.
 
 ## Exemplars