Commit 2f68631

viirya authored and cloud-fan committed

[SPARK-20848][SQL] Shutdown the pool after reading parquet files
## What changes were proposed in this pull request?

From JIRA: on each call to spark.read.parquet, a new ForkJoinPool is created. One of the threads in the pool is kept in the WAITING state and never stopped, which leads to unbounded growth in the number of threads. We should shut down the pool after reading parquet files.

## How was this patch tested?

Added a test to ParquetFileFormatSuite.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: Liang-Chi Hsieh <[email protected]>

Closes #18073 from viirya/SPARK-20848.

(cherry picked from commit f72ad30)

Signed-off-by: Wenchen Fan <[email protected]>
1 parent 13adc0f commit 2f68631

File tree

1 file changed: +4 −1 lines changed


sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala

Lines changed: 4 additions & 1 deletion

```diff
@@ -496,7 +496,8 @@ object ParquetFileFormat extends Logging {
       partFiles: Seq[FileStatus],
       ignoreCorruptFiles: Boolean): Seq[Footer] = {
     val parFiles = partFiles.par
-    parFiles.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(8))
+    val pool = new ForkJoinPool(8)
+    parFiles.tasksupport = new ForkJoinTaskSupport(pool)
     parFiles.flatMap { currentFile =>
       try {
         // Skips row group information since we only need the schema.
@@ -512,6 +513,8 @@ object ParquetFileFormat extends Logging {
         } else {
           throw new IOException(s"Could not read footer for file: $currentFile", e)
         }
+      } finally {
+        pool.shutdown()
       }
     }.seq
   }
```
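The pattern in the patch can be sketched as a standalone helper: keep a handle to the `ForkJoinPool` so it can be shut down in a `finally` block, rather than leaking an anonymous pool on every call. This is a minimal sketch, not Spark code — the name `readInParallel` and the `parallelism` parameter are illustrative, and on Scala 2.13+ `.par` requires the separate scala-parallel-collections module:

```scala
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport
// Scala 2.13+: `.par` comes from the scala-parallel-collections module
import scala.collection.parallel.CollectionConverters._

// Run `read` over `items` on a dedicated pool, then tear the pool down
// even if one of the reads throws. Without the finally, each call would
// leave idle worker threads behind — the leak described in SPARK-20848.
def readInParallel[A, B](items: Seq[A], parallelism: Int = 8)(read: A => B): Seq[B] = {
  val pool = new ForkJoinPool(parallelism)
  try {
    val parItems = items.par
    parItems.tasksupport = new ForkJoinTaskSupport(pool)
    parItems.map(read).seq
  } finally {
    pool.shutdown() // queued tasks finish, then the workers terminate
  }
}
```

Calling such a helper repeatedly no longer accumulates WAITING ForkJoinPool worker threads, since each pool is explicitly shut down once its batch of files has been read.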
