prometheus: add some extra buckets for upload metrics #591

Merged Aug 6, 2024 (3 commits)

@matt-codecov matt-codecov commented Aug 6, 2024

(click on VPN) https://l.codecov.dev/S0nAAE is a dash with some new upload size / report size metrics. it has some quirks:

  • histogram_quantile interpolates values <1 for the number of reports in an upload
  • report json size, raw upload size, and raw reports per upload hit the top bucket at p99 often
  • chunks file size doesn't hit the top bucket but it's pretty close

i added some buckets to help with those quirks
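
For context on the second quirk: when the requested quantile lands in the implicit +Inf bucket, `histogram_quantile` can only report the upper bound of the highest finite bucket, so the graph flatlines at that value. Below is a minimal Python sketch of that situation. The bucket bounds and observations are made up for illustration (they are not the real metric's values), and `quantile_bucket` is a hypothetical helper, not part of any Prometheus client.

```python
import bisect
import math

def quantile_bucket(bounds, observations, q):
    """Return the (lower, upper) bounds of the bucket holding the q-quantile.

    `bounds` are the finite bucket upper bounds, sorted ascending; a +Inf
    bucket is appended implicitly, as Prometheus clients do.
    """
    uppers = list(bounds) + [math.inf]
    # Cumulative counts: how many observations are <= each upper bound.
    ordered = sorted(observations)
    cumulative = [bisect.bisect_right(ordered, u) for u in uppers]
    rank = q * len(ordered)
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, c in enumerate(cumulative) if c >= rank)
    lower = 0.0 if idx == 0 else uppers[idx - 1]
    return lower, uppers[idx]

# Hypothetical data: 95 small uploads plus 5 that exceed the old top bucket.
observations = [3] * 95 + [500] * 5

old = quantile_bucket([1, 5, 10, 20, 50, 100], observations, 0.99)
new = quantile_bucket([1, 5, 10, 20, 50, 100, 1000], observations, 0.99)
```

With the old bounds the p99 sits in the (100, +Inf) bucket, where Prometheus would just report 100. With one extra bucket above it, the p99 sits in a finite bucket and can be interpolated.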

@matt-codecov matt-codecov requested a review from a team August 6, 2024 01:58

codecov-qa bot commented Aug 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.55%. Comparing base (c864a13) to head (c4f1a71).

✅ All tests successful. No failed tests found.

@@           Coverage Diff           @@
##             main     #591   +/-   ##
=======================================
  Coverage   97.55%   97.55%           
=======================================
  Files         425      425           
  Lines       35593    35593           
=======================================
  Hits        34724    34724           
  Misses        869      869           
Flag                       Coverage Δ
integration                97.55% <ø> (ø)
latest-uploader-overall    97.55% <ø> (ø)
unit                       97.55% <ø> (ø)

Flags with carried forward coverage won't be shown.

Components      Coverage Δ
NonTestCode     94.70% <ø> (ø)
OutsideTasks    97.80% <ø> (ø)

Files                                    Coverage Δ
services/report/prometheus_metrics.py    100.00% <ø> (ø)
tasks/upload_finisher.py                 73.61% <ø> (ø)

codecov bot commented Aug 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.60%. Comparing base (c864a13) to head (c4f1a71).

✅ All tests successful. No failed tests found.

@@           Coverage Diff           @@
##             main     #591   +/-   ##
=======================================
  Coverage   97.60%   97.60%           
=======================================
  Files         460      460           
  Lines       36799    36799           
=======================================
  Hits        35916    35916           
  Misses        883      883           
Flag                       Coverage Δ
integration                97.55% <ø> (ø)
latest-uploader-overall    97.55% <ø> (ø)
unit                       97.55% <ø> (ø)

Flags with carried forward coverage won't be shown.

Components      Coverage Δ
NonTestCode     94.78% <ø> (ø)
OutsideTasks    97.80% <ø> (ø)

Files                                    Coverage Δ
services/report/prometheus_metrics.py    100.00% <ø> (ø)
tasks/upload_finisher.py                 73.73% <ø> (ø)

This change has been scanned for critical changes. Learn more

Comment on lines 27 to 29
# The 0.98 bucket is to stop Prometheus from interpolating values much
# lower than 1 in its histogram_quantile function.
buckets=[0.98, 1, 5, 10, 20, 50, 100],

This is really weird tbh. Will this ever emit metrics <1? In which case, we would probably want to early-return much earlier? Also, this metric would probably benefit from a bunch more smaller buckets, maybe even all the way from 1..=10?

Contributor Author

@matt-codecov matt-codecov Aug 6, 2024

agree this is very weird. i don't think we expect any raw uploads to contain 0 raw reports, but i don't know whether it can happen anyway. that's part of why i'm introducing this weird bucket

the reason for it is: the true median may be 1.0, but all prometheus can say for sure is that it's in the lowest bucket, which it treats as spanning 0.0 to 1.0. it assumes observations are spread evenly within a bucket, so it linearly interpolates a value in that range and calls that the median

by adding a bucket at 0.98, the range for the estimated median is limited to 0.98-1.0. so it'll look nicer on a graph, but also, we can ask prometheus how many values lower than 0.98 were observed to know whether any 0-report uploads are coming in. 0.99 or higher would probably work but i am being overly cautious of floating point
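
To make the interpolation concrete, here is a rough Python sketch of the estimate described above. It is a simplified stand-in for Prometheus's classic-histogram `histogram_quantile`, not its actual implementation. It assumes a non-empty list of observations, and the sample data (every upload containing exactly one raw report) is hypothetical.

```python
import math

def histogram_quantile(q, bounds, observations):
    """Approximate Prometheus's classic-histogram quantile estimate.

    Simplified sketch: linear interpolation inside the bucket that contains
    the requested rank, with the lowest bucket assumed to start at 0.
    """
    uppers = list(bounds) + [math.inf]
    # Cumulative counts, like the `le` series of a Prometheus histogram.
    cumulative = [sum(1 for o in observations if o <= u) for u in uppers]
    rank = q * len(observations)
    idx = next(i for i, c in enumerate(cumulative) if c >= rank)
    if idx == len(uppers) - 1:
        # Quantile is in the +Inf bucket: report the highest finite bound.
        return uppers[-2]
    start = 0.0 if idx == 0 else uppers[idx - 1]
    below = 0 if idx == 0 else cumulative[idx - 1]
    in_bucket = cumulative[idx] - below
    return start + (uppers[idx] - start) * (rank - below) / in_bucket

# Hypothetical data: every upload contains exactly one raw report.
observations = [1] * 100

coarse = histogram_quantile(0.5, [1, 5, 10, 20, 50, 100], observations)
fine = histogram_quantile(0.5, [0.98, 1, 5, 10, 20, 50, 100], observations)
# Cumulative count of the 0.98 bucket = uploads with fewer than one report.
zero_report_uploads = sum(1 for o in observations if o <= 0.98)
```

Without the 0.98 bucket the estimated median comes out around 0.5 even though every observation is 1. With it, the estimate is pinned between 0.98 and 1.0, and the count in the 0.98 bucket doubles as a check for 0-report uploads.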

Contributor Author

more granular buckets in the single digits is a good idea, i'll rebalance the buckets a bit

@matt-codecov matt-codecov added this pull request to the merge queue Aug 6, 2024
Merged via the queue into main with commit c1ebb39 Aug 6, 2024
25 of 26 checks passed
@matt-codecov matt-codecov deleted the pr591 branch August 6, 2024 19:58