-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-54088][CORE] When TaskSchedulerImpl searches for an executor to schedule an unschedulable task, it should exclude executors that are pending to remove #52793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…o schedule an unschedulable task, it should exclude executors that are pending to remove
| // in turn is used to decide when we can attain data locality on a given host | ||
| protected val hostToExecutors = new HashMap[String, HashSet[String]] | ||
|
|
||
| protected val availableHostToExecutors = new HashMap[String, HashSet[String]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maintain this for quick check
| // unschedulable tasks else we will abort immediately. | ||
| executorIdToRunningTaskIds.find(x => !isExecutorBusy(x._1)) match { | ||
| executorIdToRunningTaskIds.find( | ||
| x => isExecutorAvailable(x._1) && !isExecutorBusy(x._1)) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just extend busy to include those we're planning on removing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IF we can directly use CoarseGrainedSchedulerBackend's executorPendingToRemove should be more better, but seems can't?
| .filter { id => !executorsPendingToRemove.contains(id) } | ||
| .filter { id => force || !scheduler.isExecutorBusy(id) } | ||
| executorsToKill.foreach { id => executorsPendingToRemove(id) = !countFailures } | ||
| executorsToKill.foreach { id => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about decommissioned executors ?
|
|
||
| if (!launchedAnyTask) { | ||
| taskSet.getCompletelyExcludedTaskIfAny(hostToExecutors).foreach { taskIndex => | ||
| taskSet.getCompletelyExcludedTaskIfAny(availableHostToExecutors).foreach { taskIndex => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
availableHostToExecutors or hostToAvailableExecutors
|
|
||
| if (!launchedAnyTask) { | ||
| taskSet.getCompletelyExcludedTaskIfAny(hostToExecutors).foreach { taskIndex => | ||
| taskSet.getCompletelyExcludedTaskIfAny(availableHostToExecutors).foreach { taskIndex => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be simpler to just have a hash set updated for killed executor and filter out from executorIdToHost
What changes were proposed in this pull request?
Call kill executor
executor 18 failed to be killed
causing TaskSchedulerImpl can't mark TaskSet as UnschedulableTaskSet then call a new container then causing task stuck
Why are the changes needed?
Avoiding task stuck
Does this PR introduce any user-facing change?
No
How was this patch tested?
MT
Was this patch authored or co-authored using generative AI tooling?
NO