[native] Expose REST API to fetch worker stats in Prometheus format#22360
[native] Expose REST API to fetch worker stats in Prometheus format#22360majetideepak merged 1 commit intoprestodb:masterfrom
Conversation
f218fc1 to
6470454
Compare
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff bccb13c...c73c5aa. No notifications. |
6470454 to
711cfb6
Compare
| #include <gtest/gtest.h> | ||
|
|
||
| namespace facebook::presto::prometheus { | ||
| class PrometheusReporterTest : public testing::Test { |
There was a problem hiding this comment.
This test will not run in the CI since that flag is off by default, can we do anything else to make it run?
There was a problem hiding this comment.
We may have to create a new CI job that builds code with prometheus enabled and run this test. Or, always build the code in existing CI jobs with prometheus enabled.
There was a problem hiding this comment.
Maybe just add a comment stating that this test is NoOp in CI for now
| // the ::prometheus::Registry allows us to Add new metric object each time | ||
| // which overwrites the existing metric object in its internal mapping. | ||
| mutable CounterMap countersMap_; | ||
| mutable GaugeMap gaugeMap_; |
There was a problem hiding this comment.
Is it better to use folly::ConcurrentHashMaps directly instead of having a custom mutex? That way concurrency will be better since all maps do not need to share a single mutex
There was a problem hiding this comment.
Switched to using Synchronous<> decorator.
| mutable std::unordered_map<std::string, ::facebook::velox::StatType> | ||
| registeredMetrics_; | ||
| // @TODO: may be add StatType::HISTOGRAM to avoid this member. | ||
| mutable std::unordered_set<std::string> registeredHistogramMetrics_; |
There was a problem hiding this comment.
Do we need this set? Can we just use histogramMap_ instead?
There was a problem hiding this comment.
Agree that we don't need it. I wanted to isolate the registered metrics Map which is modified once at startup from the actual metrics map (which can be modified concurrently).
| // If percentiles are provided, create a Summary type metric and register. | ||
| if (pcts.size() > 0) { | ||
| auto& summaryFamily = ::prometheus::BuildSummary() | ||
| .Name(keyStr.append(kSummarySuffix)) |
There was a problem hiding this comment.
Wont the metric name changed if we add a ksummarySuffix? Why is this needed?
There was a problem hiding this comment.
Or is it so that maybe we don't need the histogram if pcts are present and we can just register summary?
There was a problem hiding this comment.
If you check prometheus, Histograms and Summaries are 2 different metric types. If the incoming metric has quantiles (pcts) specified, then we use summary type as well.
https://prometheus.io/docs/concepts/metric_types/#histogram
There was a problem hiding this comment.
https://prometheus.io/docs/practices/histograms/#:~:text=The%20essential%20difference%20between%20summaries,the%20server%20side%20using%20the
The essential difference between summaries and histograms is that summaries calculate streaming φ-quantiles on the client side and expose them directly, while histograms expose bucketed observation counts and the calculation of quantiles from the buckets of a histogram happens on the server side using the [histogram_quantile() function](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile).
So maybe we do not need to register histogram if quantiles are present?
There was a problem hiding this comment.
Not sure about this^ just fyi
There was a problem hiding this comment.
We can revisit this. Not sure at this point if both give different viewpoints or Summary is sufficient over histograms.
There was a problem hiding this comment.
Cool thats fine, just check the kSummarySuffix i think the code here should be keyStr + "_Suffix" by value and we should not change the reference here
| case facebook::velox::StatType::SUM: | ||
| case facebook::velox::StatType::AVG: | ||
| case facebook::velox::StatType::RATE: { | ||
| auto& gaugeFamily = |
There was a problem hiding this comment.
Not entirely sure about the code style but if we don't write std::string as ::std::string then why should we write ::prometheus?
There was a problem hiding this comment.
It is a required thing to resolve global namespace. Compiler complains if we remove this. I don't have an exact reason why we need it.
There was a problem hiding this comment.
@karteekmurthys : Can you add "using namespace prometheus" statement to avoid this ?
| } | ||
| // Prometheus format requires '.' to be replaced. | ||
| std::string keyStr = std::string(key); | ||
| std::replace(keyStr.begin(), keyStr.end(), '.', '_'); |
There was a problem hiding this comment.
I don't see the replacement of '.' being used in addMetricValue function do we need this there?
presto-native-execution/presto_cpp/main/common/prometheus-metrics/PrometheusReporter.cpp
Outdated
Show resolved
Hide resolved
| // If percentiles are provided, create a Summary type metric and register. | ||
| if (pcts.size() > 0) { | ||
| auto& summaryFamily = ::prometheus::BuildSummary() | ||
| .Name(keyStr.append(kSummarySuffix)) |
There was a problem hiding this comment.
you have keyStr.append(kSummarySuffix) which means that the reference changed, then you have summaryMap_.emplace(std::string(key), summaryMetric);. So i think you are storing the key+suffix in the summary map
| .Register(*registry_); | ||
| ::prometheus::Summary::Quantiles quantiles; | ||
| for (auto pct : pcts) { | ||
| quantiles.push_back( |
There was a problem hiding this comment.
can just use quantiles.emplace_back(pct / (double)100, 0);
jaystarshot
left a comment
There was a problem hiding this comment.
Added some comments + I think we need to setup a CI job for this (maybe make unittest with flag on just for this unit test temoporariliy)
711cfb6 to
a4a7bd3
Compare
aditi-pandit
left a comment
There was a problem hiding this comment.
Thanks @karteekmurthys. Only few minor comments.
| // Initialize singleton for the reporter | ||
| folly::Singleton<facebook::velox::BaseStatsReporter> reporter([]() { | ||
| auto nodeConfig = facebook::presto::NodeConfig::instance(); | ||
| std::string cluster = nodeConfig->nodeEnvironment(); |
There was a problem hiding this comment.
const for cluster, hostName and worker variables.
|
|
||
| // Fill global pool fields. | ||
| poolInfo.maxBytes = nodeMemoryGb * 1024 * 1024 * 1024; | ||
| // TODO(sperhsin): If 'current bytes' is the same as we get by summing up all |
There was a problem hiding this comment.
Nit spelling spershin
|
|
||
| void PrestoServer::reportWorkerMetrics(proxygen::ResponseHandler* downstream) { | ||
| auto nodeConfig = facebook::presto::NodeConfig::instance(); | ||
| std::string cluster = nodeConfig->nodeEnvironment(); |
| char* hostName = std::getenv("HOSTNAME"); | ||
| std::string worker = !hostName ? "" : hostName; | ||
| #ifdef PRESTO_ENABLE_PROMETHEUS_REPORTER | ||
| auto reporter = std::dynamic_pointer_cast< |
There was a problem hiding this comment.
Maybe all variables above should be within #ifdef. Else it can be empty.
| proxygen::ResponseHandler* downstream) { | ||
| server->reportServerInfo(downstream); | ||
| }); | ||
| httpServer_->registerGet( |
There was a problem hiding this comment.
It seems like this will always be registered irrespective of whether you are using Prometheus. Is that desired ? We could avoid it in the Meta environment where they are not using Prometheus.
There was a problem hiding this comment.
For now it is registered and if the Prometheus is not enabled it is a no-op.
There was a problem hiding this comment.
It is odd to make it a no-op. It is better to return an invalid HTTP request error than an empty result. The later would be confusing.
Let's move the above if(systemConfig->enableRuntimeMetricsCollection()) check here and include this.
| auto bound = min + bucketWidth; | ||
| std::string keyStr = std::string(key); | ||
| std::replace(keyStr.begin(), keyStr.end(), '.', '_'); | ||
| auto& histogramFamily = |
There was a problem hiding this comment.
Nit : Add newline before
| } | ||
|
|
||
| void PrometheusReporter::registerHistogramMetricExportType( | ||
| folly::StringPiece key, |
There was a problem hiding this comment.
Defined in interface.
| "test_key4{" + labelsSerialized + "} 0"}; | ||
|
|
||
| auto verifySerializedResult = [&](std::string& fullSerializedResult, | ||
| std::vector<std::string>& expected) { |
| "# TYPE test_key4 gauge", | ||
| "test_key4{" + labelsSerialized + "} 0"}; | ||
|
|
||
| auto verifySerializedResult = [&](std::string& fullSerializedResult, |
There was a problem hiding this comment.
I feel you should send a const& here and make a copy in the verify routine. It makes it more readable.
| "# TYPE test_key4 gauge", | ||
| "test_key4{" + labelsSerialized + "} 0"}; | ||
|
|
||
| auto verifySerializedResult = [&](std::string& fullSerializedResult, |
There was a problem hiding this comment.
You can even make this a separate function instead of a lambda
22c1820 to
12302f1
Compare
czentgr
left a comment
There was a problem hiding this comment.
A few comments related to build integration.
presto-native-execution/presto_cpp/main/common/prometheus-metrics/PrometheusReporter.cpp
Outdated
Show resolved
Hide resolved
majetideepak
left a comment
There was a problem hiding this comment.
@karteekmurthys some high-level comments.
| proxygen::ResponseHandler* downstream) { | ||
| server->reportServerInfo(downstream); | ||
| }); | ||
| httpServer_->registerGet( |
There was a problem hiding this comment.
It is odd to make it a no-op. It is better to return an invalid HTTP request error than an empty result. The later would be confusing.
Let's move the above if(systemConfig->enableRuntimeMetricsCollection()) check here and include this.
presto-native-execution/presto_cpp/main/common/prometheus-metrics/PrometheusReporter.h
Outdated
Show resolved
Hide resolved
12302f1 to
1d56b00
Compare
|
@tanjialiang, @amitkdutta Can you make another pass? All your feedback has been addressed. |
fdf16dc to
2600215
Compare
presto-native-execution/presto_cpp/main/runtime-metrics/PrometheusStatsReporter.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/runtime-metrics/PrometheusStatsReporter.cpp
Show resolved
Hide resolved
presto-native-execution/presto_cpp/main/runtime-metrics/PrometheusStatsReporter.cpp
Show resolved
Hide resolved
2600215 to
b3a24cc
Compare
|
@amitkdutta, please import this PR and let us know if you see any other issues inside Meta. Thanks! |
|
@amitkdutta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
amitkdutta
left a comment
There was a problem hiding this comment.
Looks great to me. The only issue is renaming the virtual function from enableRuntimeMetricReporting to enableWorkerStatsReporting, which we can fix inside Meta.
Thanks @karteekmurthys for all the iterations.
|
@karteekmurthys Let's fix the commit name to match the new naming |
|
@amitkdutta @tanjialiang Thanks for the review. |
b3a24cc to
7e49c3f
Compare
|
@tdcmeehan can you please approve this PR? The circleci change requires a committer approval. Thanks! |
|
nit: Should use "[native]"(lowercase) as a tag instead of "[Native]". |
|
Should we add documentation on how to enable reporting? |
I have opened another PR for that https://github.com/prestodb/presto/pull/23107/files. |
lingbin
left a comment
There was a problem hiding this comment.
Looking forward to this feature.
presto-native-execution/presto_cpp/main/runtime-metrics/PrometheusStatsReporter.cpp
Show resolved
Hide resolved
| size_t value) const { | ||
| auto metricIterator = registeredMetricsMap_.find(key); | ||
| if (metricIterator == registeredMetricsMap_.end()) { | ||
| VLOG(1) << "addMetricValue for unregistered metric " << key; |
There was a problem hiding this comment.
addMetricValue -> addHistogramMetricValue
Add Prometheus Reporter using the prometheus-cpp library. Add a CMake flag PRESTO_ENABLE_PROMETHEUS_REPORTER to enable Prometheus Reporter. Add REST API '/v1/info/metrics' to fetch worker metrics from Prometheus Reporter in prometheus format. This endpoint is only enabled if the Prometheus Reporter is enabled. Co-authored-by:jaystarshot <jay.narale@uber.com>
7e49c3f to
c73c5aa
Compare
Description
Related RFC.