[SPARK-21492][SQL] Memory leak in SortMergeJoin #18694
Conversation
Test build #79807 has started for PR 18694 at commit
def destruct(): Unit = {
  while (leftIter.advanceNext()) {}
  while(rightIter.advanceNext()) {}
Nit: add a space between while and (.
retest this please
What is the perf impact and how large is it?
def destruct(): Unit = {
  while (streamedIter.advanceNext()) {}
  while (bufferedIter.advanceNext()) {}
Can you explain more about where the memory leak is? Why do we have to exhaust the iterators?
I think if we don't advance to the next rows, the iterators should not load those rows.** Exhausting the iterators actually spends unnecessary time pulling and processing those rows.
** Sort consumes its whole input iterator first in order to sort.
It does introduce extra overhead. The other way is to introduce a new interface for RowIterator to destruct itself, which would be more elegant but needs more changes to the core data structures. A memory leak is worse than this extra overhead, because it causes more spills and other issues.
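To make the trade-off concrete, here is a minimal Scala sketch of the two options being weighed. The drain loop mirrors the quoted diff hunks; the CleanableRowIterator trait and its cleanup() method are hypothetical illustrations, not existing Spark API:

```scala
import org.apache.spark.sql.execution.RowIterator

// Option 1 (what this PR does): drain the iterator so the sorter behind it
// reaches its end and frees the page it is pinning. Simple, but it pays the
// cost of pulling and processing rows that nobody needs.
def destructByDraining(iter: RowIterator): Unit = {
  while (iter.advanceNext()) {}
}

// Option 2 (hypothetical): extend the iterator contract with an explicit
// cleanup hook so the backing sorter can release its memory without draining.
// This trait is an illustration only; it does not exist in Spark.
trait CleanableRowIterator {
  def cleanup(): Unit
}
```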
Details are explained below.
Can you show some experiments that indicate there is a memory leak?
Test build #79813 has finished for PR 18694 at commit
The memory leak happens in the following scenario. For example, in an inner join, once the left side is exhausted we stop advancing the right side. Because the right side has not reached its end, the memory it holds is not released and cannot be used by any other operator, for example UnsafeShuffleWriter, causing more spills. When the iterator is exhausted, the following line releases all the memory. Although the readingIterator is constructed here and it can spill, it keeps the currently used page in memory until the caller invokes loadNext again; otherwise the current page might still be in use at that time (note that the last record in loadNext is cloned in order to release the current page, see https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java#L167).
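A short Scala sketch of the scenario described above, assuming both sides are RowIterator instances backed by sorter output; the method and variable names are illustrative, not the actual SortMergeJoin code:

```scala
import org.apache.spark.sql.execution.RowIterator

// Illustrative only: not the real SortMergeJoin scan loop.
def innerJoinScan(leftIter: RowIterator, rightIter: RowIterator): Unit = {
  while (leftIter.advanceNext()) {
    // ... advance rightIter as needed and emit matching rows ...
  }
  // The left side is exhausted here and the join produces no more output.
  // If rightIter has not reached its end, the sorter feeding it still pins
  // its current in-memory page until the next loadNext() call, so that
  // memory cannot be used by other operators (e.g. UnsafeShuffleWriter).
  // Draining the iterator forces the sorter to hit its end and free the page:
  while (rightIter.advanceNext()) {}
}
```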
I'd doubt this is actually a memory leak. Btw, as the comment indicates, you still can't avoid holding the memory in cases like an operator followed by a limit.
Thanks for the comments. Memory leak here means the data structure holds memory unnecessarily and other operators cannot use it; the cleanup hook only runs after the task is done. The diff solves the leak for SortMergeJoin only and does not apply to the limit case. Limit is another special case and needs to be taken care of separately. Actually this leak happens a lot more often than the limit case, and is a serious issue.
This only makes sense if the downstream operators consume the entire iterator of SortMergeJoin first and then perform their work. If the downstream operators are pipelined with SortMergeJoin, once the iterator from SortMergeJoin is exhausted, even if one side of the join is not exhausted, the whole pipeline is finished and the cleanup will run, right? Compared with the overhead this change brings in, how much does it improve performance?
If it is assumed that the pipeline is as simple as one stage with only one operator that needs to spill, you are right. But if the pipeline is more complex, for example with multiple operators that need to spill, this leak can cause serious issues.
Test build #79818 has finished for PR 18694 at commit
Do you observe significant performance improvement with this change?
Currently the patch helps scenarios such as Join(A, Join(B, C)). It is critical for us because we have some internal usage in which each stage may consist of tens of sort operators. We found that each operator takes memory without releasing its current page, causing a lot of spills. Such a memory leak becomes critical (ShuffledHashJoin has similar issues, and we did not hit issues caused by Limit). To me, the leak itself is a bug. If it is agreed that we should fix this type of leak, we can find a more elegant way, such as a new close() interface, to avoid the overhead.
Since registering cleanup to a task cleanup hook may not work, as discussed at #18543, I am a little bit supportive of explicitly freeing memory where possible.
IMO the cleanup hook is a workaround for a limitation of the iterator model: although the parent knows when to release the resources of its child, there is no way to notify the child via the iterator. Maybe we can add a close() method to the iterator.
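A rough sketch of that idea, assuming a hypothetical close() is added to the iterator contract; none of the names below exist in Spark, they only illustrate the suggestion:

```scala
// Hypothetical sketch only.
trait CloseableRowIterator {
  def advanceNext(): Boolean
  // Release sorter pages/buffers held by this iterator without draining it.
  def close(): Unit
}

// Parent side: once the streamed side is exhausted, the join produces no more
// output, so the buffered side can be told to release its memory immediately
// instead of waiting for the task-completion cleanup hook.
def onStreamedSideExhausted(bufferedIter: CloseableRowIterator): Unit = {
  bufferedIter.close()
}
```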
Closing the PR; will work on adding a close interface for the iterator used in Spark SQL to remove the extra overhead.
@zhzhan @cloud-fan, is there a JIRA or PR for the iterator close?
What changes were proposed in this pull request?
Currently the memory leak error message is degraded to a warning, but the leak does happen and impacts the performance of running jobs. This diff fixes the memory leak caused in SortMergeJoin.
The diff exhausts the iterators, even when it is not otherwise required, in order to make sure each iterator is destructed.
How was this patch tested?
Relies on existing unit tests. Also tested in a production job, where the memory leak is fixed by the diff.