-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-29303][Web UI] Add UI support for stage level scheduling #28094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… down in environment page
|
Test build #120691 has finished for PR 28094 at commit
|
|
Test build #120735 has finished for PR 28094 at commit
|
|
@HyukjinKwon @attilapiros anyone available for reviewing? |
attilapiros
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this could be out of scope of this PR but I started to read the relevant changes and related to
spark/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
Lines 400 to 401 in 0dee439
| val stageInfo = new StageInfo(1, 1, "me-stage", 1, Seq.empty, Seq(1, 2, 3), "details", | |
| resourceProfileId = ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID) |
I would expect a check of an old StageInfo JSON (without resourceProfileId) and whether that would be constructed the same with StageInfo constructed without any resourceProfileId as both expected to use the default value. So I believe StageInfo should have a default value for the resourceProfileId:
@DeveloperApi
class StageInfo(
...
val resourceProfileId: Int = ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID) {
This can be checked in a separate unit test too as the name in case of failure would be misleading.
This way we could preserve the compatibility with earlier event logs. Is not it?
core/src/main/scala/org/apache/spark/resource/ResourceProfileManager.scala
Show resolved
Hide resolved
core/src/test/scala/org/apache/spark/scheduler/EventLoggingListenerSuite.scala
Outdated
Show resolved
Hide resolved
|
correct, the default value should keep it compatible. I did some testing with old history files and they had worked. I can add unit test if I missed it and recheck just to make sure |
attilapiros
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will continue reviewing this but here are my comments so far.
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/status/AppStatusListener.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
Outdated
Show resolved
Hide resolved
core/src/main/resources/org/apache/spark/ui/static/executorspage.js
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala
Outdated
Show resolved
Hide resolved
|
I finished the first round. In addition to the shared review comments I miss a unit test from the |
|
Oh sorry and one more idea (this can be done separate PR too): on the Executors and Stage page where the resource profile ID is given we could add a tool tip with the resource profile content if you like. |
|
thanks for the review, will update the PR hopefully end of the week early next week as I got pulled into something. |
|
thanks, I'll add some tests in JsonProtocolSuite. I think the tooltip can be done separately so I'll file a jira for that |
|
Test build #122275 has finished for PR 28094 at commit
|
|
Test build #122280 has finished for PR 28094 at commit
|
|
test passes locally, not sure why its failing here. I realize I accidentally left a test I don't need so removed it |
|
Test build #122291 has finished for PR 28094 at commit
|
|
It fails as Resource Profile IDs are autogenerated and they come from a global state: private var _id = ResourceProfile.getNextProfileIdWhen you execute this suite only then it is really a 0 as in the given JSON string but on the Jenkins some earlier test increased it. So just a |
|
Test build #122316 has finished for PR 28094 at commit
|
attilapiros
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some tiny things otherwise LGTM.
pom.xml
Outdated
| <groupId>net.alchim31.maven</groupId> | ||
| <artifactId>scala-maven-plugin</artifactId> | ||
| <version>4.3.0</version> | ||
| <version>3.4.4</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be a mistake I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes thanks for catching.
| val execStr = new StringBuilder() | ||
| execStr ++= s"\t${execReq.resourceName}: [amount: ${execReq.amount}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this lines can be merged into a bit more efficient one:
val execStr = new StringBuilder(s"\t${execReq.resourceName}: [amount: ${execReq.amount}")|
Test build #122409 has finished for PR 28094 at commit
|
|
anyone else have comments @HyukjinKwon @mridulm |
| val attributes: Map[String, String], | ||
| val resources: Map[String, ResourceInformation]) | ||
| val resources: Map[String, ResourceInformation], | ||
| val resourceProfileId: Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While deserializing existing history files, which wont have resource profile related events, will it default to using ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID ?
Or is it backwardly incompatible ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the LiveEntity creates the ExecutorSummary and it has a default of the ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID and executorInfoFromJson will default it to ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID when that field isn't found as well.
So it will just default to that for older history files that don't have this field.
I had tested this with some 2.x history files originally and it was working that way as expected. But I should retest again after the changes from review comments but I don't think anything has broken that.
Definitely something we want to make sure its backwards compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tom ! My only concern was around what happens if the resource event is not fired (which it wont for older files) and how this works w.r.t existing jsons which dont have rp field populated.
Once you have validated this, rest looked good to me - but I am not really an expert on the UI side of things, so someone more familiar should probably take a look too.
|
thanks for looking @mridulm, I'll upmerge this and retest for backwards compatibility again. |
|
Upmerged and retested with 2.4, 3.0, and 3.1 history files. Everything is working with the older files from 2.4 and 3.0 and it defaults to use the DEFAULT_PROFILE when its missing from the history file. |
|
Test build #122813 has finished for PR 28094 at commit
|
|
Merging to master |
|
Hey sorry for late response. I just found out during checking the queued PRs. I just skimmed and seems fine. |
What changes were proposed in this pull request?
This adds UI updates to support stage level scheduling and ResourceProfiles. 3 main things have been added. ResourceProfile id added to the Stage page, the Executors page now has an optional selectable column to show the ResourceProfile Id of each executor, and the Environment page now has a section with the ResourceProfile ids. Along with this the rest api for environment page was updated to include the Resource profile information.
I debating on splitting the resource profile information into its own page but I wasn't sure it called for a completely separate page. Open to peoples thoughts on this.
Screen shots:




Why are the changes needed?
For user to be able to know what resource profile was used with which stage and executors. The resource profile information is also available so user debugging can see exactly what resources were requested with that profile.
Does this PR introduce any user-facing change?
Yes, UI updates.
How was this patch tested?
Unit tests and tested on yarn both active applications and with the history server.