[SPARK-23431][CORE] Expose the new executor memory metrics at the stage level #23340
ok to test
Test build #100275 has finished for PR 23340 at commit
Jenkins, retest this please
squito
left a comment
Can you update the PR description to make it clear this is only updating the REST endpoints for now? As you refer to 'showing' the metrics, it sounds like they're visible in the UI.
val activeStages = liveStages.values().asScala
  .filter(_.status == v1.StageStatus.ACTIVE)
activeStages.foreach { stage =>
  stage.peakExecutorMetrics.compareAndUpdatePeakValues(updates)
do you need a maybeUpdate(stage, now) after this?
Yes, thanks for catching this.
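For readers following this thread, the update pattern under discussion can be sketched roughly as below. This is a simplified stand-in and not Spark's code: `PeakSketch`, `PeakMetrics`, `Stage`, and `updateActiveStages` are hypothetical names, and the returned list of stage ids only marks where a follow-up call such as the `maybeUpdate(stage, now)` suggested above would be a candidate.

```scala
object PeakSketch {
  // Hypothetical holder for per-stage peak values; not Spark's ExecutorMetrics.
  final class PeakMetrics(size: Int) {
    private val peaks = Array.fill(size)(0L)

    // Raise each stored peak to the incoming value when the incoming value is larger.
    // Returns true if at least one stored peak changed.
    def compareAndUpdatePeakValues(updates: Array[Long]): Boolean = {
      var updated = false
      var i = 0
      while (i < math.min(peaks.length, updates.length)) {
        if (updates(i) > peaks(i)) {
          peaks(i) = updates(i)
          updated = true
        }
        i += 1
      }
      updated
    }

    // Immutable copy of the current peaks.
    def snapshot: Vector[Long] = peaks.toVector
  }

  // Hypothetical stage record: only active stages receive updates.
  final case class Stage(id: Int, active: Boolean, peak: PeakMetrics)

  // Apply one metrics update to every active stage and return the ids of the
  // stages whose peaks changed (assumed here to be the ones worth refreshing).
  def updateActiveStages(stages: Seq[Stage], updates: Array[Long]): Seq[Int] =
    stages.filter(_.active).flatMap { stage =>
      if (stage.peak.compareAndUpdatePeakValues(updates)) Some(stage.id) else None
    }
}
```

In this sketch an inactive stage keeps its old peaks, and a repeated update with smaller values changes nothing and reports no stage.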
rezasafi
left a comment
Thank you very much for the pr.
Option(liveStages.get((executorMetrics.stageId, executorMetrics.stageAttemptId)))
  .foreach { stage =>
    stage.peakExecutorMetrics.compareAndUpdatePeakValues(executorMetrics.executorMetrics)
nit: Is it possible to change the name of the parameter or the field member? executorMetrics.executorMetrics feels kind of weird
How about "event" for the parameter name? This would be more in line with the other methods.
yeah, sounds good to me.
var metrics = createMetrics(default = 0L)

// peak values for executor level metrics
var peakExecutorMetrics = new ExecutorMetrics()
can this also be a val like the one in line 387?
Yes, this should be val.
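The `val` suggestion presumably works because the code shown only mutates the metrics object in place and never reassigns the field. A minimal illustration, using hypothetical `Peaks` and `LiveStageSketch` classes rather than Spark's own types:

```scala
object ValSketch {
  // Hypothetical mutable holder standing in for an in-place-updated metrics object.
  final class Peaks { var heap: Long = 0L }

  final class LiveStageSketch {
    // `val` fixes the reference; the object it points to can still be mutated.
    val peak = new Peaks
  }

  // Record an observation, keeping only the largest value seen so far.
  def recordHeap(stage: LiveStageSketch, observed: Long): Unit =
    if (observed > stage.peak.heap) stage.peak.heap = observed
}
```

Declaring the field as `val` documents that the reference never changes, while peak tracking still works through mutation of the referenced object.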
Test build #100289 has finished for PR 23340 at commit
Test build #100294 has finished for PR 23340 at commit
}
// check if there is a new peak value for any of the executor level memory metrics,
// while reading from the log. SparkListenerStageExecutorMetrics are only processed
// when reading logs.
I need a little refresher here: does this comment also apply to the new code you're adding above?
And why, again, does this apply only when we're reading from the event logs? (Maybe this comment should be updated to point to whatever is happening for a live app.)
Yes, the comments are still applicable, although at the stage level. I'll move the comments around and add a pointer to onExecutorMetricsUpdate for the live app case.
Test build #100324 has finished for PR 23340 at commit
Test build #104844 has finished for PR 23340 at commit
We're closing this PR because it hasn't been updated in a while. If you'd like to revive this PR, please reopen it!
Ping: any chance someone could review this patch? These are useful metrics.
This is reopened according to the community request. I also removed the label,
Can one of the admins verify this patch?
@edwinalu thanks for getting this started so long ago. Looks like there are some conflicts after so much time has passed. Do you have time/interest to continue working on this or would you like some help getting this sorted out?
@stackedsax, I don't have time to continue working on this, unfortunately. If you or someone else is able to help with adding stage level metrics, that would be great. Please let me know if there are any questions about the original PR.
@stackedsax, do you plan to work on this? Otherwise, I can start looking into this next week. Thanks!
@imback82 I'll be taking this up if you don't (on behalf of @stackedsax), but since I am not a Spark contributor at the moment it'd probably be a lot faster and easier if you did it. So please do look into it. Thank you!
Thanks @itamarst. I started looking into this and will update this thread.
… API

### What changes were proposed in this pull request?

Note that this PR is forked from #23340 originally written by edwinalu.

This PR proposes to expose the peak executor metrics at the stage level via the REST APIs:
* `/applications/<application_id>/stages/`: peak values of executor metrics for **each stage**
* `/applications/<application_id>/stages/<stage_id>/<stage_attempt_id>`: peak values of executor metrics for **each executor** for the stage, followed by peak values of executor metrics for the stage

### Why are the changes needed?

The stage level peak executor metrics can help better understand your application's resource utilization.

### Does this PR introduce _any_ user-facing change?

1. For the `/applications/<application_id>/stages/` API, you will see the following new info for **each stage**:
```JSON
"peakExecutorMetrics" : {
  "JVMHeapMemory" : 213367864,
  "JVMOffHeapMemory" : 189011656,
  "OnHeapExecutionMemory" : 0,
  "OffHeapExecutionMemory" : 0,
  "OnHeapStorageMemory" : 2133349,
  "OffHeapStorageMemory" : 0,
  "OnHeapUnifiedMemory" : 2133349,
  "OffHeapUnifiedMemory" : 0,
  "DirectPoolMemory" : 282024,
  "MappedPoolMemory" : 0,
  "ProcessTreeJVMVMemory" : 0,
  "ProcessTreeJVMRSSMemory" : 0,
  "ProcessTreePythonVMemory" : 0,
  "ProcessTreePythonRSSMemory" : 0,
  "ProcessTreeOtherVMemory" : 0,
  "ProcessTreeOtherRSSMemory" : 0,
  "MinorGCCount" : 13,
  "MinorGCTime" : 115,
  "MajorGCCount" : 4,
  "MajorGCTime" : 339
}
```
2. For the `/applications/<application_id>/stages/<stage_id>/<stage_attempt_id>` API, you will see the following new info for **each executor** under `executorSummary`:
```JSON
"peakMemoryMetrics" : {
  "JVMHeapMemory" : 0,
  "JVMOffHeapMemory" : 0,
  "OnHeapExecutionMemory" : 0,
  "OffHeapExecutionMemory" : 0,
  "OnHeapStorageMemory" : 0,
  "OffHeapStorageMemory" : 0,
  "OnHeapUnifiedMemory" : 0,
  "OffHeapUnifiedMemory" : 0,
  "DirectPoolMemory" : 0,
  "MappedPoolMemory" : 0,
  "ProcessTreeJVMVMemory" : 0,
  "ProcessTreeJVMRSSMemory" : 0,
  "ProcessTreePythonVMemory" : 0,
  "ProcessTreePythonRSSMemory" : 0,
  "ProcessTreeOtherVMemory" : 0,
  "ProcessTreeOtherRSSMemory" : 0,
  "MinorGCCount" : 0,
  "MinorGCTime" : 0,
  "MajorGCCount" : 0,
  "MajorGCTime" : 0
}
```
, and the following at the stage level:
```JSON
"peakExecutorMetrics" : {
  "JVMHeapMemory" : 213367864,
  "JVMOffHeapMemory" : 189011656,
  "OnHeapExecutionMemory" : 0,
  "OffHeapExecutionMemory" : 0,
  "OnHeapStorageMemory" : 2133349,
  "OffHeapStorageMemory" : 0,
  "OnHeapUnifiedMemory" : 2133349,
  "OffHeapUnifiedMemory" : 0,
  "DirectPoolMemory" : 282024,
  "MappedPoolMemory" : 0,
  "ProcessTreeJVMVMemory" : 0,
  "ProcessTreeJVMRSSMemory" : 0,
  "ProcessTreePythonVMemory" : 0,
  "ProcessTreePythonRSSMemory" : 0,
  "ProcessTreeOtherVMemory" : 0,
  "ProcessTreeOtherRSSMemory" : 0,
  "MinorGCCount" : 13,
  "MinorGCTime" : 115,
  "MajorGCCount" : 4,
  "MajorGCTime" : 339
}
```

### How was this patch tested?

Added tests.

Closes #29020 from imback82/metrics.

Lead-authored-by: Terry Kim <[email protected]>
Co-authored-by: edwinalu <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
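As a rough illustration of how a client might consume the two endpoints named in the commit message above, the sketch below builds the paths and pulls one metric out of a JSON fragment. Everything here is an assumption made for illustration: `StageMetricsPaths`, `stagesPath`, `stageAttemptPath`, and `metricValue` are hypothetical helpers, the base URL is left to the caller, and the regex extraction is a naive stand-in for a real JSON parser.

```scala
import java.util.regex.Pattern

object StageMetricsPaths {
  // Path for the per-stage listing, following the pattern in the commit message.
  // `base` is the server-specific REST root and must be supplied by the caller.
  def stagesPath(base: String, appId: String): String =
    s"$base/applications/$appId/stages/"

  // Path for a single stage attempt, which the commit message says carries
  // per-executor peaks under `executorSummary` plus the stage-level peaks.
  def stageAttemptPath(base: String, appId: String, stageId: Int, attemptId: Int): String =
    s"$base/applications/$appId/stages/$stageId/$attemptId"

  // Naive lookup of the first non-negative integer field with the given name.
  // Good enough for the flat fragments shown above; not a general JSON parser.
  def metricValue(json: String, name: String): Option[Long] = {
    val pattern = ("\"" + Pattern.quote(name) + "\"\\s*:\\s*(\\d+)").r
    pattern.findFirstMatchIn(json).map(_.group(1).toLong)
  }
}
```

Because the lookup returns the first match, applying it to a full stage-attempt response would find whichever occurrence of the field comes first; a real client would parse the JSON and select `peakExecutorMetrics` or `peakMemoryMetrics` explicitly.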
This can be closed now. We can use changes in #29020 if backporting to 2.4 is needed. (@dongjoon-hyun / @gengliangwang can confirm, but I don't think this will be backported to 2.4.)
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Collect and show the new executor memory metrics for each stage, to provide more information on how memory is used per stage. Peak values for metrics are shown for each stage. In the executor summaries for each stage, the peak values per executor are also shown.
How was this patch tested?
Added new unit tests.
Please review http://spark.apache.org/contributing.html before opening a pull request.