@@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext {
   def executors(): String = {
     val sb = new StringBuilder
     val store = uiRoot.asInstanceOf[SparkUI].store
-    val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_")
     store.executorList(true).foreach { executor =>
-      val prefix = s"metrics_${appId}_${executor.id}_executor_"
-      sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n")
-      sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n")
-      sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n")
-      sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n")
-      sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n")
-      sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n")
-      sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n")
-      sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n")
-      sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n")
-      sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n")
-      sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n")
-      sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n")
-      sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n")
-      sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n")
-      sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n")
+      val prefix = "metrics_executor_"
+      val labels = Seq(
+        "application_id" -> store.applicationInfo.id,
+        "application_name" -> store.applicationInfo.name,
Member
I am not sure if the application name is needed, because you already have the application id.

Member Author
This is the only human-readable field for distinguishing the jobs.

"executor_id" -> executor.id
).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n")
Member
@viirya viirya Oct 9, 2019

Actually, the prefix is fixed now? So they are now the same metrics, and the application id and executor id are now labels on them?

Does it have a bad impact on metrics usage later? Because now all applications are recorded under the same metrics. I am not sure how Prometheus processes this, but naturally I'd think Prometheus needs to search for the specified application id among the metrics of all applications.

Previously you had the appId and executor id in the metric name.

Member Author

Right. The redundant information is moved to labels.

> Actually, the prefix is fixed now? So they are now the same metrics, and the application id and executor id are now labels on them?

No. The Prometheus query language supports handling them individually.

> Does it have a bad impact on metrics usage later? Because now all applications are recorded under the same metrics.
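
For illustration, with a hypothetical application id app-20191009123456-0000, application name MyApp, and executor 1, the change moves the identity out of the metric name and into labels, roughly:

  # old format (hypothetical ids and values): identity encoded in the metric name
  metrics_app_20191009123456_0000_1_executor_memoryUsed_Count 1500000
  # new format: fixed metric name, identity carried by labels
  metrics_executor_memoryUsed_Count{application_id="app-20191009123456-0000", application_name="MyApp", executor_id="1"} 1500000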

Member

> No. The Prometheus query language supports handling them individually.

Yes. But what I am wondering is: now all numbers from all applications are recorded under the same metric. To retrieve the number for a specified application, doesn't Prometheus need to search for it among all applications' metric numbers?

Member

I may misunderstand Prometheus's approach. If so, then this might not be a problem.

Member Author

For that Prometheus question, different label sets mean different time series in Prometheus.

> Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.

Here are the references for the details.
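
In other words, each distinct label combination is a separate time series, and PromQL can select or aggregate them by label; hypothetical queries against the metrics sketched above:

  # hypothetical PromQL, assuming the application id from the example above
  metrics_executor_memoryUsed_Count{application_id="app-20191009123456-0000"}
  # aggregate the same metric per application
  sum by (application_id) (metrics_executor_memoryUsed_Count)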

sb.append(s"${prefix}memoryUsed_Count$labels ${executor.memoryUsed}\n")
Member

Not related to this PR, but why do they all end with _Count? For rddBlocks it is okay, but for some it seems not suitable, like memoryUsed_Count.

Member Author

Thank you for the review, @viirya. Yes, of course, we can rename them freely because we are starting to support them natively.
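
If renamed, Prometheus naming conventions would suggest base-unit suffixes and _total for counters; purely hypothetical examples, not what this PR emits:

  # hypothetical renames following Prometheus naming conventions
  spark_executor_memory_used_bytes
  spark_executor_tasks_completed_total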

sb.append(s"${prefix}diskUsed_Count$labels ${executor.diskUsed}\n")
sb.append(s"${prefix}totalCores_Count$labels ${executor.totalCores}\n")
sb.append(s"${prefix}maxTasks_Count$labels ${executor.maxTasks}\n")
sb.append(s"${prefix}activeTasks_Count$labels ${executor.activeTasks}\n")
sb.append(s"${prefix}failedTasks_Count$labels ${executor.failedTasks}\n")
sb.append(s"${prefix}completedTasks_Count$labels ${executor.completedTasks}\n")
sb.append(s"${prefix}totalTasks_Count$labels ${executor.totalTasks}\n")
sb.append(s"${prefix}totalDuration_Value$labels ${executor.totalDuration}\n")
sb.append(s"${prefix}totalGCTime_Value$labels ${executor.totalGCTime}\n")
sb.append(s"${prefix}totalInputBytes_Count$labels ${executor.totalInputBytes}\n")
sb.append(s"${prefix}totalShuffleRead_Count$labels ${executor.totalShuffleRead}\n")
sb.append(s"${prefix}totalShuffleWrite_Count$labels ${executor.totalShuffleWrite}\n")
sb.append(s"${prefix}maxMemory_Count$labels ${executor.maxMemory}\n")
executor.executorLogs.foreach { case (k, v) => }
executor.memoryMetrics.foreach { m =>
sb.append(s"${prefix}usedOnHeapStorageMemory_Count ${m.usedOnHeapStorageMemory}\n")
sb.append(s"${prefix}usedOffHeapStorageMemory_Count ${m.usedOffHeapStorageMemory}\n")
sb.append(s"${prefix}totalOnHeapStorageMemory_Count ${m.totalOnHeapStorageMemory}\n")
sb.append(s"${prefix}totalOffHeapStorageMemory_Count ${m.totalOffHeapStorageMemory}\n")
sb.append(s"${prefix}usedOnHeapStorageMemory_Count$labels ${m.usedOnHeapStorageMemory}\n")
sb.append(s"${prefix}usedOffHeapStorageMemory_Count$labels ${m.usedOffHeapStorageMemory}\n")
sb.append(s"${prefix}totalOnHeapStorageMemory_Count$labels ${m.totalOnHeapStorageMemory}\n")
sb.append(s"${prefix}totalOffHeapStorageMemory_Count$labels " +
s"${m.totalOffHeapStorageMemory}\n")
}
executor.peakMemoryMetrics.foreach { m =>
val names = Array(
@@ -89,7 +94,7 @@ private[v1] class PrometheusResource extends ApiRequestContext {
"MajorGCTime"
)
names.foreach { name =>
sb.append(s"$prefix${name}_Count ${m.getMetricValue(name)}\n")
sb.append(s"$prefix${name}_Count$labels ${m.getMetricValue(name)}\n")
}
}
}