[SPARK-17637][Scheduler]Packed scheduling for Spark tasks across executors #15541
Test build #67157 has finished for PR 15541 at commit
@rxin @gatorsmile Can you please take a look and kindly provide your comments.
How about this?

val assignerName = conf.get(config.SPARK_SCHEDULER_TASK_ASSIGNER.key, "roundrobin")
val className = assignerMap.getOrElse(assignerName.toLowerCase, roundrobin)
Put a log info or warn when the given assignerName is not correct, instead of silently turning to the default one.
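A minimal sketch of that suggestion, assuming a hypothetical assignerMap keyed by lowercase names (the class names and resolveAssigner helper below are illustrative, not the PR's actual constants):

```scala
// Hypothetical sketch: warn on an unknown assigner name instead of silently
// falling back. The map contents and resolveAssigner are illustrative.
val assignerMap: Map[String, String] = Map(
  "roundrobin" -> "org.apache.spark.scheduler.RoundRobinAssigner",
  "packed" -> "org.apache.spark.scheduler.PackedAssigner",
  "balanced" -> "org.apache.spark.scheduler.BalancedAssigner")

def resolveAssigner(assignerName: String): String =
  assignerMap.getOrElse(assignerName.toLowerCase, {
    // In Spark this would be logWarning; println stands in for the sketch.
    println(s"Unknown task assigner '$assignerName', falling back to roundrobin")
    assignerMap("roundrobin")
  })
```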
one line?
If I just read this function, my first question is how we can ensure this will not go out of bounds. We need to leave a comment to explain this, or add a safety check to avoid any bug we could introduce in the future.
Will change to make it similar to the Iterator.next() method, and add comments similar to those on Iterator.next().
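The Iterator.next()-style contract being discussed can be sketched as follows; OfferIterator and its element type are hypothetical stand-ins for the PR's internal structures:

```scala
// Hypothetical sketch mirroring Iterator.next(): callers must check hasNext
// first, and next() throws NoSuchElementException past the end, so an
// out-of-bounds access can never go unnoticed.
class OfferIterator[T](offers: IndexedSeq[T]) {
  private var i = 0
  def hasNext: Boolean = i < offers.length
  def next(): T = {
    if (!hasNext) throw new NoSuchElementException("no more offers")
    val offer = offers(i)
    i += 1
    offer
  }
}
```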
Two spaces between private and val.
You can remove line 109 after the following change:

assigner.withCpuPerTask(CPUS_PER_TASK = conf.getInt("spark.task.cpus", 1))
If the current offer is not assigned, why do we need to step to the next offer if coresAvailable is still enough?
There are two cases:
a) There is no (or insufficient) locality information - in which case, what you describe will hold: all subsequent requests will also not result in assignment.
b) If there are other executors for which sufficient locality affinity holds, then a 'later' executor in the iteration order can satisfy the locality preference.
The assignment is decided by TaskSetManager eventually - the Assigner simply specifies the order in which iteration proceeds.
If the current offer is rejected, it is not valid for the current taskset (probably due to a locality restriction). Each scheduling algorithm has to respect the locality restriction and, in the meantime, provide the next available offer to the taskset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you will put offers into the PriorityQueue, is it still necessary to do shuffling?
These are the comments from @mridulm in the last PR. I think it is reasonable, but I don't have a concrete answer in mind:
"Would be good to shuffle workOffset's for this class too. Practically, this ensures that the initial heap will be randomized when cores are the same."
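The randomization being discussed could look like this minimal sketch, assuming a hypothetical Offer type; shuffling before building the heap breaks ties among executors with equal free cores:

```scala
import scala.collection.mutable.PriorityQueue
import scala.util.Random

// Hypothetical Offer type standing in for the PR's worker offers.
case class Offer(executorId: String, freeCores: Int)

// Shuffle before building the max-heap so that executors with equal free
// cores end up in a random initial order rather than a fixed one.
def buildHeap(offers: Seq[Offer], rng: Random): PriorityQueue[Offer] = {
  val shuffled = rng.shuffle(offers)
  PriorityQueue(shuffled: _*)(Ordering.by[Offer, Int](_.freeCores))
}
```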
It sounds correct. However, I don't think it has a real effect. If the cores are the same, it means no task was assigned in the previous run, so it doesn't matter whether we begin with a different order of offers.
After talking to several other people, they don't feel the shuffle is strongly needed. @mridulm If you don't mind, I will remove it in my next patch.
@zhzhan Can you elaborate on what the concerns with shuffling are?
There were various reasons why we started shuffling offers.
@mridulm I am not sure how much the shuffle can impact the scheduling, and thus I don't have a strong opinion on this.
@viirya Even if the cores are the same, it does not mean that "no task was assigned in the previous run". Shuffling does take effect here. For example, the previous round may be (5, 4, 3); after one core is allocated, the current round would be (4, 4, 3).
@zhzhan Let us preserve the behavior - for any application using this assigner, all tasksets will be executed based on the ordering of offers (both with and without good locality info).
The impact can be fairly non-trivial - which is why shuffling was initially added.
Use val and call clear in init below?
As pointed above, call clear?
As this assigner will try to pack as many tasks as possible into the same worker, the concern would be increasing memory pressure on the worker. Have you experienced such an issue in your practical usage?
Your concern is valid. Each scheduling algorithm has its pros and cons, and which one is chosen depends on the user's requirements. We mainly want to use this to save reserved resources combined with dynamic allocation. In our pipeline, we didn't observe the problem. If it happens, we need to investigate the memory allocation part to see whether it has a problem.
Do we need reset()? It looks like we only need init(). As we call init before each assignment, it should be enough to reset the state back to its initial values.
My concern is that if we do not use reset, the assigner has to keep its internal resources until next time, but it is not a big overhead.
Looks like you don't have a big object that poses a serious concern here.
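As a hedged illustration of the init-subsumes-reset lifecycle discussed above (the names are invented, not the PR's API):

```scala
// Invented sketch of the lifecycle under discussion: init() clears whatever
// the previous round left behind before loading the new offers, so a
// separate reset() call is unnecessary.
class AssignerState {
  val pending = scala.collection.mutable.ArrayBuffer.empty[Int]
  def init(offers: Seq[Int]): Unit = {
    pending.clear() // releases last round's state, subsuming reset()
    pending ++= offers
  }
}
```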
"this worker"? this won't represent a worker here.
nit: requested => requests
nit: Tracking => Tracks
nit: different the locality restrictions => different locality restrictions
It might be better to put these into separate points rather than a paragraph. Also, I am not sure what the protocol is about putting details like method names in the doc. As things stand, it will serve people trying to read the code well, but as the codebase evolves, things might get out of sync if this comment is not updated.
docs/configuration.md
Outdated
nit: create a list for each policy and explain inline instead of saying former, latter below.
nit: extra space in private val
nit: space after while
Shouldn't you add it to the heap anyway, regardless of what assigned is set to?
If it is rejected, it is not valid for this round of assignment for this specific task set anymore, because it means it is not valid for any task in the task set.
Same as before: shouldn't you add it to the heap anyway, regardless of what assigned is set to?
Similar to the reason above: if the offer is rejected, we have to move forward.
docs/configuration.md
Outdated
nit: packed and balanced are provided.
Test build #67224 has finished for PR 15541 at commit
Sounds like the shuffling reason for BalancedAssigner can be applied here too? If shuffling, this ensures that the initially sorted offers will be randomized when cores are the same, right?
@viirya Agree, good point.
Test build #67290 has finished for PR 15541 at commit

@rxin Can you please take a look, and let me know if you have any concerns?

Accidentally, I deleted all my comments. You might need to check the emails to find all my comments. :)
Is this class private to the scheduler?

@gatorsmile I didn't see your new comments
perform -> performs
invoke -> invokes
@return is not aligned with the line above.
Whether -> whether
It -> In
I remember that in your original PR, there was a resource release method. Do you still need it?
Based on the review comments, we do not need it anymore. The resource will be released in the init method.
docs/configuration.md
Outdated
Missing space between . and The
Test build #67395 has finished for PR 15541 at commit

Test build #67428 has finished for PR 15541 at commit

@rxin Would you like to take a look and let me know if you have any concerns? Thanks.

retest this please

Sure, will take a look in the next couple of days to get this into 2.1 if possible.
what's the difference between this and packed? Wouldn't they look similar? Why would anyone use one over another?
@rxin These two do the opposite thing. The packed assigner tries to schedule tasks onto as few workers as possible, so that workers with no tasks running can be released. The balanced assigner tries to schedule tasks onto the workers with the least workload.
If users want optimal resource reservation, they may want the packed assigner. If users observe memory pressure, they may want to try the balanced assigner.
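The contrast between the two policies can be sketched with two orderings over a hypothetical executor type; neither name matches the PR's actual classes:

```scala
// Hypothetical executor bookkeeping; the type and orderings are illustrative,
// not the PR's actual classes.
case class Exec(id: String, freeCores: Int)

// "packed" prefers the executor with the fewest free cores (fill it up so
// idle executors can be released); "balanced" prefers the most free cores.
val packedOrder: Ordering[Exec] = Ordering.by[Exec, Int](_.freeCores)
val balancedOrder: Ordering[Exec] = packedOrder.reverse

def pick(execs: Seq[Exec], ord: Ordering[Exec]): Exec = execs.min(ord)
```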
Sorry, I made a mistake -- I meant to ask the difference between balanced and round robin. Aren't the two similar?
These two assigners may behave similarly in practice. The difference is that the balanced assigner tries to distribute the workload more aggressively.
Test build #67863 has finished for PR 15541 at commit
i'd actually fail Spark if it cannot be constructed -- otherwise it is easier to make mistakes.
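A fail-fast construction sketch along these lines, using only standard Java reflection (the instantiateAssigner helper name is hypothetical):

```scala
// Hypothetical fail-fast construction: any bad class name surfaces as an
// exception at startup instead of being masked by a silent default.
def instantiateAssigner(className: String): AnyRef =
  Class.forName(className).getDeclaredConstructor().newInstance()
```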
To be honest, I find the current API pretty weird (it is a stateful object that has to be reset every time). I suspect you designed this API by just abstracting out the logic you wanted to change from the existing implementation, but that doesn't necessarily lead to intuitive APIs. It's been a while since I last checked the scheduler code, so it'd take me a while to page back in.
docs/configuration.md
Outdated
Nit: I suggest double quote the keywords "roundrobin", "packed", and "balanced" in this paragraph. E.g. the "balanced" task assigner sounds better to me than the balanced task assigner.
The comments on the resourceOffers method should be updated. They still say "We fill each node with tasks in a round-robin manner so that tasks are balanced across the cluster."
@rxin Thanks for the feedback regarding the TaskAssigner API. The current API is designed based on the current logic of TaskSchedulerImpl, where the scheduler takes many rounds to assign the tasks for each task set. I have not figured out a better way yet. Any suggestions are welcome.
Test build #67924 has finished for PR 15541 at commit

Test build #69015 has finished for PR 15541 at commit

Test build #72506 has finished for PR 15541 at commit

Test build #97708 has started for PR 15541 at commit

Test build #97724 has started for PR 15541 at commit

Test build #97780 has started for PR 15541 at commit

Build finished. Test FAILed.
What changes were proposed in this pull request?
Restructure the code and implement two new task assigners.
PackedAssigner: tries to allocate tasks to the executors with the fewest available cores, so that Spark can release reserved executors when dynamic allocation is enabled.
BalancedAssigner: tries to allocate tasks to the executors with the most available cores, in order to balance the workload across all executors.
By default, the original round-robin assigner is used.
We tested a pipeline, and the new PackedAssigner saves around 45% of reserved CPU and memory with dynamic allocation enabled.
How was this patch tested?
Both unit tests in TaskSchedulerImplSuite and manual tests in a production pipeline.