-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11617] [network] Fix leak in TransportFrameDecoder. #9619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The code was using the wrong API to add data to the internal composite buffer, causing buffers to leak in certain situations. Use the right API and enhance the tests to catch memory leaks.
|
Jenkins, retest this please. |
|
Test build #45603 has finished for PR 9619 at commit
|
|
Test build #45608 has finished for PR 9619 at commit
|
This makes the frame decoder behave more like netty's ByteToMessageDecoder, at the expense of copying some data in a few cases.
|
Test build #45779 has finished for PR 9619 at commit
|
|
Test build #45909 has finished for PR 9619 at commit
|
|
retest this please |
|
Test build #45999 has finished for PR 9619 at commit
|
This test actually fails with java.lang.IndexOutOfBoundsException if the fix in this patch set is disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vanzin could you give a real case? Or this is just for correctness, even if downstream in Spark doesn't use retain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually Spark does use retain() when fetching shuffle blocks, and for some reason that causes problems. I think the real problem is somewhere in netty code, but this is the workaround the netty code itself uses (see ByteToMessageDecoder).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. Just saw retain() in ChunkFetchSuccess.
|
LGTM pending tests. |
|
Test build #46015 has finished for PR 9619 at commit
|
|
retest this please |
This is a bug that will be fixed in #9707 |
|
Test build #46019 has finished for PR 9619 at commit
|
|
Test build #46024 has finished for PR 9619 at commit
|
|
Merging this (master / 1.6). |
The code was using the wrong API to add data to the internal composite buffer, causing buffers to leak in certain situations. Use the right API and enhance the tests to catch memory leaks. Also, avoid reusing the composite buffers when downstream handlers keep references to them; this seems to cause a few different issues even though the ref counting code seems to be correct, so instead pay the cost of copying a few bytes when that situation happens. Author: Marcelo Vanzin <[email protected]> Closes #9619 from vanzin/SPARK-11617. (cherry picked from commit 540bf58) Signed-off-by: Marcelo Vanzin <[email protected]>
The code was using the wrong API to add data to the internal composite
buffer, causing buffers to leak in certain situations. Use the right
API and enhance the tests to catch memory leaks.
Also, avoid reusing the composite buffers when downstream handlers keep
references to them; this seems to cause a few different issues even though
the ref counting code seems to be correct, so instead pay the cost of copying
a few bytes when that situation happens.