Skip to content

[Native] Add prometheus metric exporter using prometheus c++ library#21753

Closed
jaystarshot wants to merge 2 commits intoprestodb:masterfrom
jaystarshot:velox-prometheus-metrics
Closed

[Native] Add prometheus metric exporter using prometheus c++ library#21753
jaystarshot wants to merge 2 commits intoprestodb:masterfrom
jaystarshot:velox-prometheus-metrics

Conversation

@jaystarshot
Copy link
Copy Markdown
Member

@jaystarshot jaystarshot commented Jan 22, 2024

Support Prometheus metrics using prometheus cpp library https://github.com/jupp0r/prometheus-cpp which is more stable

This is used in Uber, we are still not in production but we are in shadow mode

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ...
* ...

Hive Changes
* ...
* ...

If release note is NOT required, use:

== NO RELEASE NOTE ==

@jaystarshot jaystarshot changed the title Add prometheus metric exporter using prometheus c++ library [Native] Add prometheus metric exporter using prometheus c++ library Jan 22, 2024
@jaystarshot jaystarshot force-pushed the velox-prometheus-metrics branch from 802dffe to e15fba0 Compare January 23, 2024 20:32
@aditi-pandit
Copy link
Copy Markdown
Contributor

@jaystarshot : Thanks for this work. Its really good.

Curious if you ran some benchmarks (taking https://github.com/jupp0r/prometheus-cpp?tab=readme-ov-file#benchmarks as an example) with your code in Prestissimo ? We might be interested in how this holds up over long periods. Its possible that production servers run for upto a month before recycling the counters as well.

The other implementation was super simplistic but was very stream-lined for Ahana/IBM use-case.

@karteekmurthys : Would you and Jay be able to co-author an RFC which outlines details of the metrics, how the monitoring is configured, production characteristic. It would help us be more organized for this.

@jaystarshot
Copy link
Copy Markdown
Member Author

jaystarshot commented Jan 25, 2024

@aditi-pandit Thanks! But I have just started development this week and its not really in production yet. I am also new to cpp and it will take some time. My inital goal was to make it production-ready, deploy, test then update this PR here but I didn't want conflicting implementations in opensource, hence I reached out early.
I also think we need a way to control what metrics will be allowed like jmxExporter has in its yaml files.
Also I roughly think that this library can be used in a sidecar in future if we want a similar setup as jmxexporter. Not sure about the inter container protocol (i think jmxexporter talks to the presto jvm process in some unique protocol)

@jaystarshot jaystarshot force-pushed the velox-prometheus-metrics branch from e15fba0 to f4065ea Compare January 26, 2024 05:48
@jaystarshot
Copy link
Copy Markdown
Member Author

image
This is working fine on my local setup for count as of now

@jaystarshot jaystarshot force-pushed the velox-prometheus-metrics branch from f4065ea to 4feeab4 Compare January 27, 2024 01:01
Summary:
1. Known limitation - Added unit test won't run in CI, this is because the prometheus flag is off in CI  since enabling it causes some issues in server tests

Test Plan:
Unit test + tested via deployment,
Metrics populated - https://ugrafana.uberinternal.comx

https://querybuilder.uberinternal.com/r/DjLbAMs1F/edit shadow tested

Reviewers: #ldap_velox-core, hitarth

Reviewed By: #ldap_velox-core, hitarth

Subscribers: hitarth, ptiwary, curt

JIRA Issues: PRESTO-6312

Differential Revision: https://code.uberinternal.com/D12756007
@jaystarshot jaystarshot force-pushed the velox-prometheus-metrics branch from 4feeab4 to 27d3e8e Compare February 29, 2024 05:08
.Help("Summary for " + keyStr)
.Register(*registry);
auto& summary = summaryFamily.Add({{"cluster", cluster_}, {"node", node_}, {"host", host_}},
prometheus::Summary::Quantiles{{0.5, 0.05}});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.5 quantile is not the average, instead it is the median value. This is as good as reporting instantaneous value.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
It has sum and count so avg can be computed as well

@jaystarshot
Copy link
Copy Markdown
Member Author

Closing since the other commit is merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants