stats: report sample count as an integer to prevent loss of precision by zuercher · Pull Request #6274 · envoyproxy/envoy

zuercher · 2019-03-12T22:05:00Z

When the number of samples in a histogram exceeds roughly 2^24, the
value of the "+Inf" bucket in Prometheus stats output is rounded
and rendered in scientific notation. This causes incorrect results
in the Prometheus histogram_quantile function, which assumes that
the rounding-error between the 1 hour and +Inf buckets represents
some number of requests that took in excess of 1 hour.

libcirclhist actually stores the sample count as a uint64_t, so
stop implicitly converting it to double and output the count
precisely (as we do with the non-infinite buckets). Also
modifies the output format of the sum metric to avoid scientific
notation.

Risk Level: low
Testing: added test case
Doc Changes: n/a
Release Notes: n/a

Signed-off-by: Stephan Zuercher zuercher@gmail.com

When the number of samples in a histogram exceeds roughly 2^24, the value of the "+Inf" bucket in Prometheus stats output is rounded and rendered in scientific notation. This causes incorrect results in the Prometheus histogram_quantile function, which assumes that the rounding-error between the 1 hour and +Inf buckets represents some number of requests that took in excess of 1 hour. libcirclhist actually stores the sample count as a uint64_t, so stop implicitly converting it to double and output the count precisely (as we do with the non-infinite buckets). Also modifies the output format of the sum metric to avoid scientific notation. Risk Level: low Testing: added test case Doc Changes: n/a Release Notes: n/a Signed-off-by: Stephan Zuercher <zuercher@gmail.com>

dnoe · 2019-03-13T14:22:45Z

source/server/http/admin.cc

    response.add(fmt::format("{0}_bucket{{{1}le=\"+Inf\"}} {2}\n", metric_name, hist_tags,
                             stats.sampleCount()));
-    response.add(fmt::format("{0}_sum{{{1}}} {2}\n", metric_name, tags, stats.sampleSum()));
+    response.add(fmt::format("{0}_sum{{{1}}} {2:.32g}\n", metric_name, tags, stats.sampleSum()));


My only concern with this is that it's changing something user visible and might break parsers. But if I understand correctly this output is meant to be parsed specifically by Prometheus scraper, so it's probably not being scraped by someone's custom stuff. WDYT?

I agree. It's also changing it to a purely [0-9]+(\.[0-9]+)? format, so I think that makes it less likely to be misinterpreted. To be fair, I believe prometheus can consume the scientific notation values correctly outside of the tags it uses for histogram bucketing. I changed this so that it matched the other outputs.

zuercher · 2019-03-20T16:18:50Z

Ping @dnoe

stale · 2019-03-27T16:54:56Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

zuercher · 2019-04-01T21:11:30Z

@dnoe do you have time to move this forward or shall I look for another reviewer?

dnoe

LGTM. Sorry for the delay, I recently adjusted email filters in an attempt to not lose track of PRs and this got lost in the transition.

* master: (137 commits) test: router upstream log to v2 config stubs (envoyproxy#6499) remove idle timeout validation (envoyproxy#6500) build: Change namespace of chromium_url. (envoyproxy#6506) coverage: exclude chromium_url (envoyproxy#6498) fix(tracing): allow 256 chars in path tag (envoyproxy#6492) Common: Introduce StopAllIteration filter status for decoding and encoding filters (envoyproxy#5954) build: update PGV url (envoyproxy#6495) subset lb: avoid partitioning host lists on worker threads (envoyproxy#6302) ci: Make envoy_select_quiche no-op. (envoyproxy#6393) watcher: notify when watched files are modified (envoyproxy#6215) stat: Add counterFromStatName(), gaugeFromStatName(), and histogramFromStatName() (envoyproxy#6475) bump to 1.11.0-dev (envoyproxy#6490) release: bump to 1.10.0 (envoyproxy#6489) hcm: path normalization. (#1) build: import manually minified Chrome URL lib. (envoyproxy#3) codec: reject embedded NUL in headers. (envoyproxy#2) Added veryfication if path contains query params and add them to path header (envoyproxy#6466) redis: basic integration test for redis_proxy (envoyproxy#6450) stats: report sample count as an integer to prevent loss of precision (envoyproxy#6274) Added VHDS protobuf message and updated RouteConfig to include it. (envoyproxy#6418) ... Signed-off-by: Michael Puncel <mpuncel@squareup.com>

dnoe reviewed Mar 13, 2019

View reviewed changes

dnoe self-assigned this Mar 13, 2019

stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 27, 2019

stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Apr 1, 2019

dnoe approved these changes Apr 2, 2019

View reviewed changes

mattklein123 merged commit 406547d into envoyproxy:master Apr 3, 2019

zuercher deleted the zuercher_fix_sample_count branch November 7, 2019 18:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stats: report sample count as an integer to prevent loss of precision#6274

stats: report sample count as an integer to prevent loss of precision#6274
mattklein123 merged 1 commit intoenvoyproxy:masterfrom
zuercher:zuercher_fix_sample_count

zuercher commented Mar 12, 2019

Uh oh!

dnoe Mar 13, 2019

Uh oh!

zuercher Mar 13, 2019 •

edited

Loading

Uh oh!

zuercher commented Mar 20, 2019

Uh oh!

stale bot commented Mar 27, 2019

Uh oh!

zuercher commented Apr 1, 2019

Uh oh!

dnoe left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

zuercher commented Mar 12, 2019

Uh oh!

dnoe Mar 13, 2019

Choose a reason for hiding this comment

Uh oh!

zuercher Mar 13, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zuercher commented Mar 20, 2019

Uh oh!

stale bot commented Mar 27, 2019

Uh oh!

zuercher commented Apr 1, 2019

Uh oh!

dnoe left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zuercher Mar 13, 2019 •

edited

Loading