[SPARK-29400][CORE] Improve PrometheusResource to use labels #26060
Conversation
Hi, @gatorsmile.
Test build #111920 has finished for PR 26060 at commit
Hi, @srowen, @dbtsai, @HyukjinKwon.
@dongjoon-hyun Thanks for fixing this.
Suppose a user scrapes their Spark applications with a central Prometheus server after this PR. Each new Spark application exposes N metrics (say 10) and has M workers (say 20) on average. Since app_id changes on every run, old metric series never disappear; over time they add up to millions or even billions of series. This puts a heavy load on a traditional Prometheus server. There are several solutions (M3, Cortex, Thanos) that address this, but we should make the cardinality clear to users of these metrics. It would be great to give some guidance on how we expect users to use these metrics in practice, especially on how to handle short-lived and high-cardinality metrics.
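For concreteness, a minimal sketch of the arithmetic behind this concern (the 10-metric and 20-worker figures are the averages assumed above; the application counts are made up):

```scala
// Back-of-the-envelope series count for the scenario described above:
// application_id is a fresh label value on every run, so old series pile up.
object CardinalityEstimate {
  // Defaults follow the comment's assumed averages (20 executors, 10 metrics each).
  def seriesCount(apps: Long, executorsPerApp: Long = 20, metricsPerExecutor: Long = 10): Long =
    apps * executorsPerApp * metricsPerExecutor

  def main(args: Array[String]): Unit = {
    // Illustrative application counts only.
    Seq(1000L, 100000L, 10000000L).foreach { apps =>
      println(f"$apps%,d apps -> ${seriesCount(apps)}%,d time series")
    }
  }
}
```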
| "executor_id" -> executor.id | ||
| ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}") | ||
| sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n") | ||
| sb.append(s"${prefix}memoryUsed_Count$labels ${executor.memoryUsed}\n") |
Not related to this PR, but why do they all end with `_Count`? For `rddBlocks` it is OK, but for some it seems unsuitable, like `memoryUsed_Count`.
Thank you for the review, @viirya. Yes, of course. We can rename them freely because we are starting to support them natively.
Hi, @yuecong. Thank you for the review.
I don't think you mean
This is awesome!
For Prometheus, it is not just the metric name; the labels also count toward high cardinality. Inside the Prometheus server's TSDB, the combination of metric name and labels is actually the key.
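A rough conceptual sketch of that point (this models the idea only, not Prometheus's actual TSDB code; the application ids are illustrative): a series is keyed by the metric name together with its full label set, so a new application_id value means a new series even when the metric name is unchanged.

```scala
// Conceptual model only: a time series is identified by (metric name, label set).
case class SeriesId(name: String, labels: Map[String, String])

object SeriesIdentity {
  def main(args: Array[String]): Unit = {
    val a = SeriesId(
      "metrics_executor_memoryUsed_Count",
      Map("application_id" -> "app-20191008151625-0000", "executor_id" -> "driver"))
    // Same metric name, but a different application_id label value...
    val b = a.copy(labels = a.labels.updated("application_id", "app-20191009120000-0001"))
    // ...is a distinct series key, which is what drives the cardinality growth.
    println(a == b) // false
  }
}
```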
@yuecong. That is a general issue with Apache Spark monitoring rather than with this PR, isn't it? So, I have three questions for you.
Do the metrics for a Spark application disappear after the application finishes? I guess the answer is no. If the driver keeps all the metrics for all the Spark applications it has run, won't this cause high memory usage for the driver? Do you have any tests on this?
This PR doesn't collect new metrics; it only exposes the existing ones. So the following is not about this PR. If you have a concern about the Apache Spark driver, you can file a new issue for that.
Second, from
I agree that this is a general challenge for monitoring Apache Spark with a normal Prometheus server. I would suggest just making the high cardinality clear. Maybe this is orthogonal to your PR; just my two cents. People use a highly scalable Prometheus setup (e.g. M3, Cortex, etc.) to handle Spark metrics. Also, if we had a custom exporter that lets users push metrics to a distributed time-series database or a pub-sub system (e.g. Kafka), it would solve this high-cardinality issue as well.
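One possible shape of the push-model idea mentioned here, sketched under assumptions (the topic name, broker address, and the hard-coded metric line are made up for illustration; this is not an existing Spark exporter):

```scala
// Hypothetical sketch: instead of waiting to be scraped, publish rendered
// exposition lines to a pub-sub system (Kafka here) for downstream storage.
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object PushMetricsSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // assumed broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    val appId = "app-20191008151625-0000" // illustrative application id
    val metricsText =
      s"""metrics_executor_memoryUsed_Count{application_id="$appId", executor_id="driver"} 0"""

    // Key by application id so a downstream consumer can expire a whole app at once.
    producer.send(new ProducerRecord[String, String]("spark-executor-metrics", appId, metricsText))
    producer.close()
  }
}
```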
Could you share the link for this one? We are not using a TTL feature of Prometheus yet.
Sorry for the misleading naming. I meant the following.
Thanks for the pointer. We are using a retention period for sure, but that does not help expire specific metrics once they reach the Prometheus server. It is also a challenge to remove metrics from the Prometheus registry on the client side. Again, these challenges become more critical with high-cardinality metrics. This is why the Prometheus community does not suggest using unbounded values for a label.
BTW, the following covers some gigantic clusters, but not all cases. There is a different and cheaper approach like
I'd like to hear more about this. Our experience with federation is that if we do not filter out some metrics and just federate metrics from each child server, the federation Prometheus becomes the bottleneck. Federation provides a way to gather metrics from other Prometheus servers, but it does not solve the scalability issues.
Nope. Why do you collect all of them? It's up to your configuration. Back to the beginning: I fully understand your cluster's underlying issues. However, none of them blocks Apache Spark support.
I'm not sure about your metrics. Could you share with us the size of your clusters, the number of apps, and the number of metrics? Does it run on Apache Spark?
I don't see any numbers from you so far here. :)
Gentle ping, @gatorsmile, @srowen, @dbtsai, @viirya, @yuecong. Brief Summary
(I don't have any opinion on this; I don't know Prometheus.)
  val prefix = "metrics_executor_"
  val labels = Seq(
    "application_id" -> store.applicationInfo.id,
    "application_name" -> store.applicationInfo.name,
I am not sure the application name is needed, because you already have the application id.
This is the only human-readable field for distinguishing the jobs.
| "application_name" -> store.applicationInfo.name, | ||
| "executor_id" -> executor.id | ||
| ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}") | ||
| sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n") |
So the prefix is now fixed? That means they are all the same metric, and the application id and executor id are now labels on them?
Does this have a bad impact on metric usage later? Now all applications are recorded under the same metric. I am not sure how Prometheus processes this, but naturally I'd think Prometheus needs to search for the specified application id among the metrics of all applications.
Previously you had the appId and the executor id in the metric name.
> So the prefix is now fixed? That means they are all the same metric, and the application id and executor id are now labels on them?

Right. The redundant information is moved to labels.

> Does this have a bad impact on metric usage later? Now all applications are recorded under the same metric.

No. The Prometheus query language supports handling them individually.
> No. The Prometheus query language supports handling them individually.

Yes. But what I am wondering is that all numbers from all applications are now recorded under the same metric. To retrieve the number for a specified application, doesn't Prometheus need to search for it among all applications' metric numbers?
I may misunderstand Prometheus's approach. If so, then this might not be a problem.
For that Prometheus question, different labels mean different time series in Prometheus.

> Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.

Here is the reference for the details.
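For reference, here is a self-contained sketch of the label rendering that the reviewed change performs (the application and executor values are hard-coded for illustration; the real code reads them from the application status store):

```scala
// Standalone sketch of the exposition lines produced by the new label-based format:
// one shared metric name, with application/executor identity carried in labels.
object LabelRenderSketch {
  def main(args: Array[String]): Unit = {
    val prefix = "metrics_executor_"
    val labels = Seq(
      "application_id" -> "app-20191008151625-0000", // illustrative values
      "application_name" -> "Spark shell",
      "executor_id" -> "driver"
    ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")

    val sb = new StringBuilder
    sb.append(s"${prefix}rddBlocks_Count$labels 0\n")
    sb.append(s"${prefix}memoryUsed_Count$labels 0\n")
    // Prints lines matching the AFTER example in the PR description, e.g.
    // metrics_executor_rddBlocks_Count{application_id="app-...", ...} 0
    print(sb)
  }
}
```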
Seems fine.
Thank you all, @yuecong, @viirya, @srowen, @HyukjinKwon.
### What changes were proposed in this pull request?

[SPARK-29064](apache#25770) introduced `PrometheusResource` to expose `ExecutorSummary`. This PR aims to improve it further, making it more `Prometheus`-friendly by using [Prometheus labels](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels).

### Why are the changes needed?

**BEFORE**
```
metrics_app_20191008151432_0000_driver_executor_rddBlocks_Count 0
metrics_app_20191008151432_0000_driver_executor_memoryUsed_Count 0
metrics_app_20191008151432_0000_driver_executor_diskUsed_Count 0
```

**AFTER**
```
$ curl -s http://localhost:4040/metrics/executors/prometheus/ | head -n3
metrics_executor_rddBlocks_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_memoryUsed_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_diskUsed_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
```

### Does this PR introduce any user-facing change?

No, but `Prometheus` understands the new format and displays it more intelligently.

<img width="735" alt="ui" src="https://user-images.githubusercontent.com/9700541/66438279-1756f900-e9e1-11e9-91c7-c04c6ce9172f.png">

### How was this patch tested?

Manually.

**SETUP**
```
$ sbin/start-master.sh
$ sbin/start-slave.sh spark://`hostname`:7077
$ bin/spark-shell --master spark://`hostname`:7077 --conf spark.ui.prometheus.enabled=true
```

**RESULT**
```
$ curl -s http://localhost:4040/metrics/executors/prometheus/ | head -n3
metrics_executor_rddBlocks_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_memoryUsed_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_diskUsed_Count{application_id="app-20191008151625-0000", application_name="Spark shell", executor_id="driver"} 0
```

Closes apache#26060 from dongjoon-hyun/SPARK-29400.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
I see that this PR is closed, but https://issues.apache.org/jira/browse/SPARK-29400 says the issue is resolved. Is that true?
Right, the change was merged several years ago.