[SPARK-11460][Core] Increase locality level when waiting too long from task set creation time #9433
Conversation
Test build #44916 has finished for PR 9433 at commit
ping @rxin
The reason the current delay scheduling algorithm doesn't work this way is that it was designed for a scenario where a bunch of jobs share the same workers, so job A can only run a small number of tasks at a time. In that scenario it doesn't make sense to look at the time since job A was submitted, because a lot of time may elapse simply because the fairness policy dictates that job A can't run many tasks at once. If job A has the opportunity to run a task on a non-local worker, it probably doesn't want to "waste" one of the few slots it's allowed to use on the non-local executor, and would instead prefer to wait spark.locality.wait to see if a local executor becomes available (for more on this, see the delay scheduling paper: http://elmeleegy.com/khaled/papers/delay_scheduling.pdf).

This patch addresses a different scenario, where the amount of concurrency the job can use is limited mostly by the locality wait rather than by a fairness policy. I think the ideal way to fix this problem is with the following policy: if the task set has been using fewer slots than it could be using (where "# slots it could be using" is all of the slots in the cluster if the job is running alone, or the job's fair share if it's not) for some period of time, increase the locality level. The current delay scheduling policy used by Spark is essentially a very simplified version of this ideal policy, where the way it determines whether the job is using as many slots as it could is simply to check whether a task has launched recently. This patch adds another heuristic to get closer to the ideal policy.

I'm hesitant to merge this patch for a few reasons. On the other hand, the use case it addresses seems likely to be fairly common, since many folks run Spark jobs in a context where only one Spark job runs on a set of workers at a time. @mateiz what are your thoughts on this?

As an aside, can you clarify the description of this PR to say explicitly that you're introducing a new set of locality timers, based on the time since the task set was submitted?
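As a minimal Scala sketch of the "ideal policy" described above, a check like the following could decide when to relax locality. All of the names here (`slotsInUse`, `entitledSlots`, `underUsedSinceMs`) are illustrative assumptions, not Spark APIs:

```scala
// Hypothetical sketch of the "ideal" policy: relax locality only when the
// task set has been using fewer slots than it is entitled to for a while.
// None of these names exist in Spark; they are illustrative assumptions.
object IdealPolicySketch {
  def shouldRelaxLocality(
      slotsInUse: Int,        // slots the task set currently occupies
      entitledSlots: Int,     // whole cluster if running alone, else fair share
      underUsedSinceMs: Long, // how long the task set has been below entitlement
      thresholdMs: Long): Boolean = {
    slotsInUse < entitledSlots && underUsedSinceMs > thresholdMs
  }
}
```

In these terms, the existing launch-time heuristic approximates `slotsInUse < entitledSlots` by checking whether any task launched recently, and the submission-time heuristic proposed in this PR adds a second, coarser approximation of the same condition.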
Conflicts: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
@kayousterhout Sorry for replying late. On your first concern, that job A might not want to waste a slot on a non-local executor and is more likely to wait spark.locality.wait for a local one: as in
Test build #60484 has finished for PR 9433 at commit
@kayousterhout About the issue that this may make the scheduling policy harder to reason about and confuse others: I think we can disable this by default (by using Long.MaxValue as the default waiting time), so that only people who know how to tune the configuration parameters will use it.
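A sketch of how that opt-in default could look, assuming a hypothetical configuration key `spark.locality.wait.submit` (not an actual Spark setting):

```scala
import org.apache.spark.SparkConf

// With Long.MaxValue as the default, the submission-based timer never fires,
// so the new heuristic is effectively disabled unless a user opts in.
// "spark.locality.wait.submit" is an illustrative key, not a real config.
val conf = new SparkConf()
val submitWaitMs = conf.getLong("spark.locality.wait.submit", Long.MaxValue)

// Opting in: relax the allowed locality level 30 seconds after submission.
conf.set("spark.locality.wait.submit", "30000")
```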
ping @kayousterhout
@kayousterhout Any ideas on this?
It seems there's no interest in this. Closing it now.
What changes were proposed in this pull request?
Currently we increase the allowed locality level of tasks when the time elapsed since the last task launch at the current locality level exceeds a threshold (spark.locality.wait, which is based on last launch time).

However, as the JIRA describes, it is possible that the time since the last launch never exceeds that threshold, yet we have still been waiting too long since the task set was created. In that case we should also increase the locality level so that more tasks can be executed.

This patch introduces a new set of locality timers, based on the time since the task set was submitted. Once the time elapsed since submission exceeds the timer for the current locality level, the allowed locality level of the tasks is increased.
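As a rough, self-contained Scala sketch of how the two heuristics could combine (the class and field names below are illustrative, not the real TaskSetManager members):

```scala
// Locality levels from most to least local, mirroring Spark's TaskLocality.
object Locality extends Enumeration {
  val PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, ANY = Value
}

// Illustrative sketch, not the actual TaskSetManager implementation.
class LocalitySketch(
    levels: Array[Locality.Value],
    launchWaits: Array[Long], // existing per-level waits (spark.locality.wait.*)
    submitWaits: Array[Long], // proposed per-level waits since submission
    submitTime: Long) {

  private var index = 0
  private var lastLaunchTime = submitTime

  def recordTaskLaunch(time: Long): Unit = { lastLaunchTime = time }

  def allowedLocality(now: Long): Locality.Value = {
    // Existing heuristic: move to a less local level when no task has
    // launched at the current level for longer than its configured wait.
    while (index < levels.length - 1 &&
        now - lastLaunchTime > launchWaits(index)) {
      lastLaunchTime += launchWaits(index)
      index += 1
    }
    // Proposed heuristic: also move on when too long has passed since the
    // task set was submitted, even if a task launched recently.
    while (index < levels.length - 1 &&
        now - submitTime > submitWaits(index)) {
      index += 1
    }
    levels(index)
  }
}
```

With `submitWaits` set to `Long.MaxValue` for every level, the second loop never fires and the behavior matches the existing policy, which corresponds to the disabled-by-default option discussed above.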
How was this patch tested?
Unit tests in TaskSetManagerSuite.