[SPARK-31837][CORE] Shift to the new highest locality level if there is when recomputeLocality #28656
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1107,10 +1107,19 @@ private[spark] class TaskSetManager( | |
| def recomputeLocality(): Unit = { | ||
| // A zombie TaskSetManager may reach here while executorLost happens | ||
| if (isZombie) return | ||
| val previousLocalityIndex = currentLocalityIndex | ||
| val previousLocalityLevel = myLocalityLevels(currentLocalityIndex) | ||
| val previousMyLocalityLevels = myLocalityLevels | ||
| myLocalityLevels = computeValidLocalityLevels() | ||
| localityWaits = myLocalityLevels.map(getLocalityWait) | ||
| currentLocalityIndex = getLocalityIndex(previousLocalityLevel) | ||
| if (currentLocalityIndex > previousLocalityIndex) { | ||
| // SPARK-31837: If the new level is more local, shift to the new most local locality | ||
| // level in terms of better data locality. For example, say the previous locality | ||
| // levels are [PROCESS, NODE, ANY] and current level is ANY. After recompute, the | ||
| // locality levels are [PROCESS, NODE, RACK, ANY]. Then, we'll shift to RACK level. | ||
| currentLocalityIndex = getLocalityIndex(myLocalityLevels.diff(previousMyLocalityLevels).head) | ||
|
Member
Author
Hi all, there's a defect in the previous implementation (always reset
Contributor
Yes, this is one of the cases I was referring to. Ideally you would never run into this case, because a host is on a rack, so you would always have a rack level. Unfortunately, Spark defaults the rack to None, so you can. I was going to improve on it in the JIRA I filed. We can certainly handle some of it here if you want.
Member
Author
What do you mean by "handle some here"? I read your JIRA and didn't find a specific solution that could be added to this PR. Could you please elaborate?
Contributor
I just mean there are a bunch of corner cases, and I don't think resetting this on every executor added is ideal. I did not list all of them. On YARN the rack actually defaults to default-rack rather than None, so YARN won't have this issue, because every node has a rack. I agree that the code you have here is an improvement that handles the rack case. |
||
| } | ||
| } | ||
|
|
||
| def executorAdded(): Unit = { | ||
|
|
||
I think we should change this to happen only in executorAdded. This is also called on executor lost and decommission, where it doesn't make sense to go to a "lower" level. Note that we may want to stay away from saying "higher" in the comment below: in the code's values, lower is actually more strict, meaning PROCESS is the lowest value. So perhaps don't say higher or lower, but say more local or less local. Perhaps pass in a parameter from executorAdded.
I think it's impossible to go to a "lower" (more local) level in the case of lost and decommission. Lost and decommission remove executors, so the set of locality levels can only shrink compared to the previous locality levels. That also means lost and decommission will not add new, more local levels.
Yea, good point!
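To make the discussion above concrete, here is a minimal standalone sketch of the shift logic in the diff. It is not the Spark code itself: `LocalityShiftSketch`, `Locality`, `localityIndex`, and `recompute` are hypothetical names, and the enumeration is only assumed to mirror the ordering described in the review (a lower value means more local). It covers both the example from the code comment and the executor-lost case, where the branch is never taken.

```scala
object LocalityShiftSketch {
  // Hypothetical stand-in for Spark's locality levels; lower id = more local.
  object Locality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
  }
  import Locality._

  // Index of the first valid level that is at least as loose as `level`.
  // Assumes `levels` is sorted from most to least local and ends with ANY.
  def localityIndex(levels: Seq[Locality.Value], level: Locality.Value): Int =
    levels.indexWhere(_ >= level)

  // Simplified model of the recompute step: returns the new current index.
  def recompute(
      prevLevels: Seq[Locality.Value],
      prevIndex: Int,
      newLevels: Seq[Locality.Value]): Int = {
    val prevLevel = prevLevels(prevIndex)
    val idx = localityIndex(newLevels, prevLevel)
    if (idx > prevIndex) {
      // A larger index for the same level means new levels were inserted
      // before it; shift to the most local of the newly added levels.
      localityIndex(newLevels, newLevels.diff(prevLevels).head)
    } else {
      idx
    }
  }

  def main(args: Array[String]): Unit = {
    // Example from the code comment: currently at ANY, RACK_LOCAL appears.
    val before = Seq(PROCESS_LOCAL, NODE_LOCAL, ANY)
    val after = Seq(PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY)
    println(after(recompute(before, 2, after)))

    // Executor lost: levels only shrink, so the shift branch is not taken.
    val shrunk = Seq(NODE_LOCAL, ANY)
    println(shrunk(recompute(after, 3, shrunk)))
  }
}
```

In the first call the sketch moves from ANY to RACK_LOCAL; in the second it stays at ANY, which matches the argument that lost and decommission cannot introduce a more local level.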