Skip to content

Conversation

@JoshRosen
Copy link
Contributor

ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang. See SPARK-6209 for more details, including a bug reproduction.

This patch fixes this issue by ensuring proper cleanup of these resources. It also adds logging for unexpected error cases.

(See #4944 for the corresponding PR for 1.3/1.4).

…g to load classes (master branch PR)

ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang.  See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.

This patch fixes this issue by ensuring proper cleanup of these resources.  It also adds logging for unexpected error cases.

This PR is an extended version of apache#4935 and adds a regression test.

Author: Josh Rosen <[email protected]>

Closes apache#4944 from JoshRosen/executorclassloader-leak-master-branch and squashes the following commits:

e0e3c25 [Josh Rosen] Wrap try block around getReponseCode; re-enable keep-alive by closing error stream
961c284 [Josh Rosen] Roll back changes that were added to get the regression test to fail
7ee2261 [Josh Rosen] Add a failing regression test
e2d70a3 [Josh Rosen] Properly clean up after errors in ExecutorClassLoader

(cherry picked from commit 7215aa7)
Signed-off-by: Andrew Or <[email protected]>

Conflicts:
	repl/pom.xml
	repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala
@SparkQA
Copy link

SparkQA commented Mar 24, 2015

Test build #29114 has started for PR 5174 at commit 16e38fe.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 25, 2015

Test build #29114 has finished for PR 5174 at commit 16e38fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29114/
Test PASSed.

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Apr 2, 2015

Test build #29616 has started for PR 5174 at commit 16e38fe.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Apr 2, 2015

Test build #29616 has finished for PR 5174 at commit 16e38fe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29616/
Test PASSed.

@JoshRosen
Copy link
Contributor Author

I'm going to merge this now (this is just a backport of the 1.3.x / master branch version of this patch and the relevant code is now pretty well-covered by tests, so this should be safe to include).

asfgit pushed a commit that referenced this pull request Apr 5, 2015
…g to load classes (branch-1.2)

ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang.  See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.

This patch fixes this issue by ensuring proper cleanup of these resources.  It also adds logging for unexpected error cases.

(See #4944 for the corresponding PR for 1.3/1.4).

Author: Josh Rosen <[email protected]>

Closes #5174 from JoshRosen/executorclassloaderleak-branch-1.2 and squashes the following commits:

16e38fe [Josh Rosen] [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (master branch PR)
@JoshRosen JoshRosen closed this Apr 5, 2015
@JoshRosen JoshRosen deleted the executorclassloaderleak-branch-1.2 branch April 5, 2015 21:01
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Apr 15, 2015
…g to load classes (branch-1.2)

ExecutorClassLoader does not ensure proper cleanup of network connections that it opens. If it fails to load a class, it may leak partially-consumed InputStreams that are connected to the REPL's HTTP class server, causing that server to exhaust its thread pool, which can cause the entire job to hang.  See [SPARK-6209](https://issues.apache.org/jira/browse/SPARK-6209) for more details, including a bug reproduction.

This patch fixes this issue by ensuring proper cleanup of these resources.  It also adds logging for unexpected error cases.

(See apache#4944 for the corresponding PR for 1.3/1.4).

Author: Josh Rosen <[email protected]>

Closes apache#5174 from JoshRosen/executorclassloaderleak-branch-1.2 and squashes the following commits:

16e38fe [Josh Rosen] [SPARK-6209] Clean up connections in ExecutorClassLoader after failing to load classes (master branch PR)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants