[SPARK-22713][CORE] ExternalAppendOnlyMap leaks when spilled during iteration#21369
eyalfa wants to merge 22 commits into apache:master from eyalfa:SPARK-22713__ExternalAppendOnlyMap_effective_spill
Conversation
…removing the reference to the initial iterator.
Test build #90829 has finished for PR 21369 at commit
@lianhuiwang, @davies, @hvanhovell can you please have a look?
Test build #90830 has finished for PR 21369 at commit
Test build #90831 has finished for PR 21369 at commit
retest this please
Test build #90832 has finished for PR 21369 at commit
val nextUpstream = spillMemoryIteratorToDisk(upstream)
assert(!upstream.hasNext)
hasSpilled = true
upstream = nextUpstream
Does the change mean we should reassign upstream (which eliminates the reference to currentMap) immediately after spilling? Otherwise we may hit an OOM (e.g. if readNext() is never called after a spill). Is this the real cause of the JIRA issue?
Basically yes. According to my understanding of the code, this should have happened on the subsequent hasNext/next call. However, according to the analysis in the JIRA, the iterator kept holding this reference. My guess: at that point the entire program started suffering lengthy GC pauses that made it behave as if it were deadlocked, effectively leaving the ref in place (just a guess).
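To make the discussion concrete, here is a minimal sketch of the idea being described (illustrative stand-ins only, not Spark's actual SpillableIterator; "spilling to disk" is faked by materializing a list):

```scala
// Minimal sketch of the fix (simplified stand-ins, NOT Spark's actual code).
// The key point: `upstream` is reassigned inside spill() itself, so the
// reference to the in-memory data is dropped immediately, not on the next
// hasNext()/next() call.
class SpillableIterator[A](private var upstream: Iterator[A]) extends Iterator[A] {
  private var hasSpilled = false

  def spill(): Unit = if (!hasSpilled) {
    // stand-in for spillMemoryIteratorToDisk(upstream)
    val nextUpstream = upstream.toList.iterator
    hasSpilled = true
    upstream = nextUpstream // the old in-memory iterator is now unreachable
  }

  override def hasNext: Boolean = upstream.hasNext
  override def next(): A = upstream.next()
}

val it = new SpillableIterator(Iterator(1, 2, 3))
assert(it.next() == 1)          // consume one element in memory
it.spill()                      // spill mid-iteration
assert(it.toList == List(2, 3)) // iteration continues from the "spilled" data
```

In the real code the replacement iterator reads back from disk; the sketch only shows where the reassignment happens.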
Thanks for fixing this issue. I think the potential solution is to change the upstream reference, but I have not tested whether this change is sufficient and safe.
@JerryLead, I'd appreciate it if you could test this.
One more thing that bugs me: there's another case where the iterator no longer needs the upstream iterator/underlying map but still holds a reference to it.
When the iterator reaches EOF, its hasNext method returns false, which causes the wrapping CompletionIterator to call the cleanup function, which simply nulls the underlying map member variable in ExternalAppendOnlyMap. The iterator member is not nulled out, so we end up with the CompletionIterator holding two paths to the upstream iterator, which leads to the underlying map: it still holds a reference to the iterator itself, and it also holds a reference to the cleanup closure, which refers to the ExternalAppendOnlyMap, which still refers to the current iterator, which refers to upstream...
This can be solved in one of two ways:
Simple way: when creating the completion iterator, provide a closure referring to the iterator, not the ExternalAppendOnlyMap.
Thorough way: modify CompletionIterator to null out its references after cleaning up.
Having said that, I'm not sure how long a completed iterator may be 'sitting' before being discarded, so I'm not sure this is worth fixing, especially via the thorough approach.
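The 'thorough way' could look roughly like this (an illustrative sketch, not Spark's actual CompletionIterator):

```scala
// Hedged sketch of the "thorough way" (illustrative, NOT Spark's actual
// CompletionIterator): once the wrapped iterator is exhausted, run the
// cleanup function and then drop BOTH the sub-iterator reference and the
// cleanup closure, so neither path keeps the underlying map reachable.
// (The constructor params are used only in the field initializers, so
// scalac does not retain them as hidden fields.)
class CompletionIterator[A](sub: Iterator[A], completion: () => Unit) extends Iterator[A] {
  private var current: Iterator[A] = sub
  private var onComplete: () => Unit = completion

  override def hasNext: Boolean = {
    val more = current.hasNext
    if (!more && (onComplete ne null)) {
      onComplete()
      current = Iterator.empty // release the sub-iterator
      onComplete = null        // release the cleanup closure
    }
    more
  }

  override def next(): A = current.next()
}

var cleanedUp = false
val it = new CompletionIterator(Iterator(1, 2), () => cleanedUp = true)
assert(it.toList == List(1, 2))
assert(cleanedUp) // cleanup ran once, and both references were dropped
```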
destroy re-assigns upstream; once destroy is called, we should be fine?
@cloud-fan, the assumption here is that there are two references to the underlying map: the upstream iterator and the external map itself.
The destroy method first removes the ref via the upstream iterator, then delegates to the method that clears the ref via the external map member (currentMap, I think), so unless we've missed another ref we should be fine.
As I wrote above, I think there's a potentially more fundamental issue with CompletionIterator, which keeps holding references via its sub and completionFunction members; these might keep some objects from being collected and could be eliminated upon exhaustion of the iterator. There might be some more 'candidates' like LazyIterator and InterruptibleIterator; I think this deserves some more investigation.
@cloud-fan, do you think this is worth doing? I'm referring to the CompletionIterator delaying GC of the sub iterator and cleanup function (usually a closure referring to a larger collection).
If so, I'd open a separate JIRA+PR for this.
cc @JerryLead
well, I took the time to figure out how the iterator is eventually being used,
…showing that an exhausted iterator still refers to the underlying map.
…SpillableIterator.toCompletionIterator and use that instead of 'manually' creating the completion iterator. Also introduced SpillableIterator.destroy, which removes the reference to the upstream iterator and calls freeCurrentMap().
Test build #90854 has finished for PR 21369 at commit
retest this please
Test build #90861 has finished for PR 21369 at commit
advancedxy
left a comment
The change generally looks good to me except some style issues.
And we should make sure this change indeed fixes the memory leak, so it would be appreciated if @JerryLead could verify it.
private val sortedMap = CompletionIterator[(K, C), Iterator[(K, C)]](destructiveIterator(
  currentMap.destructiveSortedIterator(keyComparator)), freeCurrentMap())
private val sortedMap = destructiveIterator(
  currentMap.destructiveSortedIterator(keyComparator))
These two lines can be merged into one line?
unfortunately no, scala-style enforces a max of 100 chars per line
}
}

def destroy() : Unit = {
Should be private, like freeCurrentMap
upstream = Iterator.empty
}

def toCompletionIterator: CompletionIterator[(K, C), SpillableIterator] = {
I'd prefer private for this method
/**
 * A comparator which sorts arbitrary keys based on their hash codes.
 * A comparator which sorts arbitrary keys bas on their hash codes.
import org.apache.spark.util.CompletionIterator

class ExternalAppendOnlyMapSuite extends SparkFunSuite with LocalSparkContext {
class ExternalAppendOnlyMapSuite extends SparkFunSuite with LocalSparkContext{
Note the space. LocalSparkContext {
Thank you all for fixing this issue. I'm sorry, but I'm currently writing a research paper about Spark GC and going to submit it next month. Since there is a lot of paper work to do, the verification may be performed next month.
Test build #90943 has finished for PR 21369 at commit
@advancedxy, using jvisualvm + a heap dump I could see that the second introduced test case ("drop all references to the underlying map once the iterator is exhausted") eliminated all references to the underlying map:
def destructiveIterator(inMemoryIterator: Iterator[(K, C)]): Iterator[(K, C)] = {
readingIterator = new SpillableIterator(inMemoryIterator)
readingIterator
readingIterator.toCompletionIterator
This changes the original behavior of destructiveIterator. I'd prefer to do it like this:
CompletionIterator[(K, C), Iterator[(K, C)]](
destructiveIterator(currentMap.iterator), readingIterator.destroy)
which keeps compatibility with the current code and does not introduce an unnecessary function.
What behavior does it change? Your suggested code does exactly the same but is less streamlined and relies on an intermediate value (fortunately it's already a member variable).
destructiveIterator should just return a destructive iterator (especially for the map buffer), as its function name implies; it is no business of CompletionIterator's. And developers should be free to define the completion function for the returned destructive iterator, in case we want a different one somewhere else in the future.
Your suggested code does exactly the same but is less streamlined
I don't think this little change has a huge influence on how streamlined it is.
and relies on an intermediate value (fortunately it's already a member variable)
The current fix leads to this, not me. And even if this variable were not a member variable, we could define a temp local variable. It's not a big deal.
private def destroy() : Unit = {
freeCurrentMap()
upstream = Iterator.empty
Safer: the class remains usable if for some reason hasNext is called again, and this costs absolutely nothing.
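A tiny illustration of that point (simplified, not the actual Spark class): with upstream replaced by Iterator.empty, a destroyed iterator just reports empty instead of throwing:

```scala
// Simplified illustration (NOT Spark's actual code): after destroy(),
// hasNext simply returns false instead of throwing a NullPointerException,
// because upstream points at the empty iterator rather than null.
class DestroyableIterator[A](private var upstream: Iterator[A]) extends Iterator[A] {
  def destroy(): Unit = {
    // the real code also calls freeCurrentMap() here
    upstream = Iterator.empty
  }
  override def hasNext: Boolean = upstream.hasNext
  override def next(): A = upstream.next()
}

val it = new DestroyableIterator(Iterator(1, 2, 3))
it.destroy()
assert(!it.hasNext) // safe: no NPE, just an empty iterator
```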
sc.stop()
}

test("spill during iteration") {
I understand what this test wants to do. But it seems code without this PR could also pass it if everything goes normally. I know it's a little hard to reflect the change in a unit test, so I'd prefer to leave some comments in the source code above explaining the potential memory leak.
This test was written BEFORE the actual fix, and it did fail up until the fix was in place. I agree it's a bit clumsy, and potential future changes may break the original intention of the test. I've referenced a potential testing approach (currently limited to Scala's source code) which couldn't be (easily) applied to this code base, so I made a best effort to test this.
I agree this needs better documentation; I'll start by referencing the issue in the test's name and will also add comments to the code.
Thanks. cc @cloud-fan for final review
assert(keys == (0 until 100))
}

test("drop all references to the underlying map once the iterator is exhausted") {
let's also put the jira number in the test name.
upstream = Iterator.empty
}

private[ExternalAppendOnlyMap]
it's pretty reasonable to have this method public.
hmm... the class itself is private (slightly relaxed to package private to ease testing), so I'm not sure what the benefit of making the method public would be.
In any case, I think that once we see a use case for making this method public we'd probably have to further change the iterator/external map classes.
It's weird to see a class-private method. I'd suggest just removing private[ExternalAppendOnlyMap]; spill is only called in ExternalAppendOnlyMap and it's public.
assert(map.numSpills == 0, "map was not supposed to spill")

val it = map.iterator
assert( it.isInstanceOf[CompletionIterator[_, _]])
nit: no space after assert(
assert(map.currentMap == null)
assert(underlyingIt.upstream ne underlyingMapIterator)
assert(underlyingIt.upstream.getClass != underlyingMapIteratorClass)
assert(underlyingIt.upstream.getClass.getEnclosingClass != classOf[AppendOnlyMap[_, _]])
we want to prove we are no longer holding the reference, why do we check type here?
the underlying map's iterator is an anonymous class; this is the best I could come up with to check whether the upstream iterator holds a ref to the underlying map.
@cloud-fan, do you have a better idea? (I'm not 100% happy with this one)
can we simply check assert(underlyingIt.upstream eq Iterator.empty)?
hmm, we can in line 508 but not in this test.
In this test we look at the iterator immediately after a spill; at this point upstream is supposed to have been replaced by a DiskMapIterator. I guess we can check for this directly (after relaxing its visibility to package private).
In line 508, we can simply compare with Iterator.empty.
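The two checks being discussed could be sketched like this (the class and its members are simplified stand-ins for the real SpillableIterator, and the "disk" iterator is faked):

```scala
// Illustrative stand-in for the real SpillableIterator (not the suite code):
// just enough structure to show the two assertions discussed above.
class SpillableIteratorStub[A](var upstream: Iterator[A]) extends Iterator[A] {
  def spill(): Unit = { upstream = upstream.toList.iterator } // fake "disk" iterator
  def destroy(): Unit = { upstream = Iterator.empty }
  override def hasNext: Boolean = upstream.hasNext
  override def next(): A = upstream.next()
}

val original = Iterator(1, 2, 3)
val it = new SpillableIteratorStub(original)

it.spill()
// Immediately after a spill, upstream must have been replaced (in Spark,
// by a DiskMapIterator), so identity with the original iterator is gone.
assert(it.upstream ne original)

it.destroy()
// After destroy (the exhausted case), upstream can simply be compared
// with the Iterator.empty singleton.
assert(it.upstream eq Iterator.empty)
```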
cc @JoshRosen
Test build #91065 has finished for PR 21369 at commit
…write the first test using WeakReference.
…ive_spill__weak_ref_test
…iled getting the weak ref testing based approach to work
retest this please
Test build #94559 has finished for PR 21369 at commit
|
…e weak ref to assert the map is no longer reachable.
…x tests to effectively test for non-reachability of the internal map.
…o SPARK-22713__ExternalAppendOnlyMap_effective_spill__weak_ref_test
…k_ref_test' into SPARK-22713__ExternalAppendOnlyMap_effective_spill
retest this please
@hvanhovell, thanks for picking this up 😎
Test build #94626 has finished for PR 21369 at commit
Test build #94627 has finished for PR 21369 at commit
Test build #94629 has finished for PR 21369 at commit
/**
 * Exposed for testing
 */
@volatile private[collection] var readingIterator: SpillableIterator = null
This is not exposed in the test.
/**
 * Exposed for testing
 */
private[collection] class SpillableIterator(var upstream: Iterator[(K, C)])
// https://github.com/scala/scala/blob/2.13.x/test/junit/scala/tools/testing/AssertUtil.scala
// (lines 69-89)
// assert(map.currentMap == null)
eventually{
nit: add a space eventually {
}
}

private def destroy() : Unit = {
}

def toCompletionIterator: CompletionIterator[(K, C), SpillableIterator] = {
CompletionIterator[(K, C), SpillableIterator](this, this.destroy )
// (lines 69-89)
assert(map.currentMap == null)

eventually{
LGTM except some code style issues. Thanks for improving the test!
…ssues commented by @cloud-fan.
Test build #94670 has finished for PR 21369 at commit
retest this please
Test build #94678 has finished for PR 21369 at commit
thanks, merging to master!
…teration This PR solves [SPARK-22713](https://issues.apache.org/jira/browse/SPARK-22713), which describes a memory leak that occurs when an ExternalAppendOnlyMap is spilled during iteration (as opposed to insertion). ExternalAppendOnlyMap's iterator supports spilling, but it kept a reference to the internal map (via an internal iterator) after spilling. It seems the original code was supposed to 'get rid' of this reference on the next iteration, but according to the elaborate investigation described in the JIRA this didn't happen. The fix is simply replacing the internal iterator immediately after spilling. I've introduced a new test in ExternalAppendOnlyMapSuite; this test asserts that neither the external map itself nor its iterator hold any reference to the internal map after a spill. This approach required relaxing the access of some member variables and nested classes of ExternalAppendOnlyMap; these members are now package private and annotated with VisibleForTesting. Closes apache#21369 from eyalfa/SPARK-22713__ExternalAppendOnlyMap_effective_spill. Authored-by: Eyal Farago <eyal@nrgene.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry-picked from commit 2e3abdf) Ref: LIHADOOP-40170 RB=1413724 BUG=LIHADOOP-40170 G=superfriends-reviewers R=fli,mshen,yezhou,edlu A=yezhou
What changes were proposed in this pull request?
This PR solves SPARK-22713, which describes a memory leak that occurs when an ExternalAppendOnlyMap is spilled during iteration (as opposed to insertion).
ExternalAppendOnlyMap's iterator supports spilling, but it kept a reference to the internal map (via an internal iterator) after spilling. It seems the original code was supposed to 'get rid' of this reference on the next iteration, but according to the elaborate investigation described in the JIRA this didn't happen.
The fix is simply replacing the internal iterator immediately after spilling.
How was this patch tested?
I've introduced a new test in the ExternalAppendOnlyMapSuite test suite; this test asserts that neither the external map itself nor its iterator hold any reference to the internal map after a spill.
This approach required relaxing the access of some member variables and nested classes of ExternalAppendOnlyMap; these members are now package private and annotated with @VisibleForTesting.
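For illustration, the WeakReference-based reachability test mentioned above can be sketched as follows (a simplified standalone version; the real test targets ExternalAppendOnlyMap's internal AppendOnlyMap and uses ScalaTest's eventually):

```scala
import java.lang.ref.WeakReference

// Simplified sketch of a WeakReference-based reachability test (NOT the
// actual suite code): hold only a weak reference to the "map", drop the
// strong reference as the fixed iterator does, and check that GC can
// reclaim it, proving it is no longer strongly reachable.
var map = new Array[Byte](1 << 20) // stand-in for the in-memory map
val ref = new WeakReference(map)
map = null // drop the only strong reference

// Poor man's "eventually": retry GC until the weak ref is cleared.
var cleared = false
var attempts = 0
while (!cleared && attempts < 50) {
  System.gc()
  cleared = ref.get == null
  attempts += 1
}
assert(cleared, "map should no longer be strongly reachable")
```

If the fix were absent and something still held a strong reference to the map, the weak reference would never clear and the assertion would fail.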