Skip to content

Conversation

@imback82
Copy link
Contributor

@imback82 imback82 commented Jul 7, 2020

What changes were proposed in this pull request?

Note that this PR is forked from #23340 originally written by @edwinalu.

This PR proposes to expose the peak executor metrics at the stage level via the REST APIs:

  • /applications/<application_id>/stages/: peak values of executor metrics for each stage
  • /applications/<application_id>/stages/<stage_id>/< stage_attempt_id >: peak values of executor metrics for each executor for the stage, followed by peak values of executor metrics for the stage

Why are the changes needed?

The stage level peak executor metrics can help better understand your application's resource utilization.

Does this PR introduce any user-facing change?

  1. For the /applications/<application_id>/stages/ API, you will see the following new info for each stage:
  "peakExecutorMetrics" : {
    "JVMHeapMemory" : 213367864,
    "JVMOffHeapMemory" : 189011656,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 2133349,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 2133349,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 282024,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 13,
    "MinorGCTime" : 115,
    "MajorGCCount" : 4,
    "MajorGCTime" : 339
  }
  1. For the /applications/<application_id>/stages/<stage_id>/<stage_attempt_id> API, you will see the following new info for each executor under executorSummary:
  "peakMemoryMetrics" : {
    "JVMHeapMemory" : 0,
    "JVMOffHeapMemory" : 0,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 0,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 0,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 0,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 0,
    "MinorGCTime" : 0,
    "MajorGCCount" : 0,
    "MajorGCTime" : 0
  }

, and the following at the stage level:

"peakExecutorMetrics" : {
    "JVMHeapMemory" : 213367864,
    "JVMOffHeapMemory" : 189011656,
    "OnHeapExecutionMemory" : 0,
    "OffHeapExecutionMemory" : 0,
    "OnHeapStorageMemory" : 2133349,
    "OffHeapStorageMemory" : 0,
    "OnHeapUnifiedMemory" : 2133349,
    "OffHeapUnifiedMemory" : 0,
    "DirectPoolMemory" : 282024,
    "MappedPoolMemory" : 0,
    "ProcessTreeJVMVMemory" : 0,
    "ProcessTreeJVMRSSMemory" : 0,
    "ProcessTreePythonVMemory" : 0,
    "ProcessTreePythonRSSMemory" : 0,
    "ProcessTreeOtherVMemory" : 0,
    "ProcessTreeOtherRSSMemory" : 0,
    "MinorGCCount" : 13,
    "MinorGCTime" : 115,
    "MajorGCCount" : 4,
    "MajorGCTime" : 339
  }

How was this patch tested?

Added tests.

@SparkQA
Copy link

SparkQA commented Jul 7, 2020

Test build #125188 has finished for PR 29020 at commit bbeab0e.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@itamarst
Copy link

itamarst commented Jul 7, 2020

@imback82 Thank you!

@imback82
Copy link
Contributor Author

imback82 commented Jul 7, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jul 7, 2020

Test build #125244 has finished for PR 29020 at commit bbeab0e.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 7, 2020

Test build #125246 has finished for PR 29020 at commit cc4018a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 8, 2020

Test build #125278 has finished for PR 29020 at commit 2d5b73d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

imback82 commented Jul 8, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jul 8, 2020

Test build #125309 has finished for PR 29020 at commit 2d5b73d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

imback82 commented Jul 8, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jul 8, 2020

Test build #125328 has started for PR 29020 at commit 2d5b73d.

@imback82
Copy link
Contributor Author

imback82 commented Jul 8, 2020

Retest this please

@SparkQA
Copy link

SparkQA commented Jul 8, 2020

Test build #125387 has finished for PR 29020 at commit 2d5b73d.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

imback82 commented Jul 8, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jul 8, 2020

Test build #125390 has finished for PR 29020 at commit 2d5b73d.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

imback82 commented Jul 9, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jul 9, 2020

Test build #125508 has finished for PR 29020 at commit 2d5b73d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

imback82 commented Jul 9, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jul 10, 2020

Test build #125517 has finished for PR 29020 at commit 2d5b73d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 10, 2020

Test build #125544 has finished for PR 29020 at commit 2d5b73d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 10, 2020

Test build #125636 has finished for PR 29020 at commit 2d5b73d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Jul 17, 2020

Test build #126070 has finished for PR 29020 at commit 895f5fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

@gengliangwang gentle ping

@gengliangwang
Copy link
Member

cc @srowen @vanzin

@dongjoon-hyun
Copy link
Member

Retest this please.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @imback82 .
For the following your comment, please add an empty commit with author @edwinalu . Then, Apache Spark script will make him as a co-author.

 I will also ask a reviewing committer to put @edwinalu as a co-author if that's the practice when a PR is "inherited". Thanks!

To committers, please don't merge this PR until this has a correct authorship.

@dongjoon-hyun
Copy link
Member

To @gengliangwang, please try to recommend the authorship commit explicitly in the Apache Spark community. Otherwise, it's forgot frequently during merging steps. Thanks!

@imback82
Copy link
Contributor Author

Thanks @dongjoon-hyun for the reminder! I added @edwinalu as a co-author with:

git commit --allow-empty --author="edwinalu <[email protected]>" -m "Add co-author"

@edwinalu
Copy link
Contributor

@imback82 , thanks for all your work on this!

@SparkQA
Copy link

SparkQA commented Jul 24, 2020

Test build #126509 has finished for PR 29020 at commit 895f5fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 24, 2020

Test build #126512 has finished for PR 29020 at commit ddcb772.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wypoon
Copy link
Contributor

wypoon commented Jul 25, 2020

In the PR description,

"This PR proposes to expose the peak executor metrics at the stage level via the REST API (/applications/<application_id>/stages/ or /applications/<application_id>/stages/<stage_id>/<stage_attemp_id>)."

there is a typo; "<stage_attemp_id>" should be "<stage_attempt_id>".

It might be helpful to clarify that the peak values of executor metrics for the stage are shown for each stage in the first (/applications/<application_id>/stages/) and peak values of executor metrics for each executor for the stage are shown for the second /applications/<application_id>/stages/<stage_id>/<stage_attempt_id>). In other words, the first shows per-stage peaks, and the second shows per-executor per-stage peaks (for just that stage).
At least, this is my understanding of what is intended.
Edwina's PR description has

"Peak values for metrics are show for each stage. For executor summaries for each stage, the peak values per executor are also shown."

@imback82
Copy link
Contributor Author

Thanks @wypoon for the suggestion. I updated the description, and hopefully that clarifies things a bit more.

@gengliangwang
Copy link
Member

@dongjoon-hyun Thanks for the reminder!

@gengliangwang
Copy link
Member

retest this please.

@gengliangwang
Copy link
Member

@imback82 I saw this error from github action tests:

ERROR: this R is version 3.4.4, package 'SparkR' requires R >= 3.5
[error] running /home/runner/work/spark/spark/R/install-dev.sh

Could you rebase the PR to the latest master? I will merge it once the tests are passed. Thanks!

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127024 has finished for PR 29020 at commit ddcb772.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127027 has finished for PR 29020 at commit 2115007.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

imback82 commented Aug 4, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Aug 4, 2020

Test build #127043 has finished for PR 29020 at commit 2115007.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member

Thanks, merging to master

@dongjoon-hyun
Copy link
Member

Thank you, @imback82 and @gengliangwang !

@imback82
Copy link
Contributor Author

imback82 commented Aug 4, 2020

Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants