[SPARK-22793][SQL] Memory leak in Spark Thrift Server

zuotingbing · gatorsmile · commit be9a804f2ef7 · 2018-01-06T18:07:45.000+08:00
# What changes were proposed in this pull request?

Steps to reproduce:

1. Start HiveThriftServer2.
2. Connect to the Thrift server through beeline.
3. Close beeline.
4. Repeat steps 2 and 3 many times.

After doing this, we found many directories under `hive.exec.local.scratchdir` and `hive.exec.scratchdir` that are never removed. Each scratch directory is added to the FileSystem `deleteOnExit` cache when it is created, so that cache keeps growing until the JVM terminates.

In addition, running `jmap -histo:live [PID]` against the HiveThriftServer2 process shows that the `org.apache.spark.sql.hive.client.HiveClientImpl` and `org.apache.hadoop.hive.ql.session.SessionState` objects keep increasing even after all beeline connections are closed, which may cause the memory leak.

# How was this patch tested?

Manual tests.

This PR is a follow-up to apache#19989.

Author: zuotingbing <zuo.tingbing9@zte.com.cn>

Closes apache#20029 from zuotingbing/SPARK-22793.
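The growth pattern described above can be illustrated with a minimal, self-contained sketch. It does not use Spark, Hive, or Hadoop: `ExitRegistry`, `SketchClient`, and `LeakSketch` are hypothetical stand-ins, and the assumption that every new client session registers one scratch path is taken from the description rather than from the real implementation.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a process-wide cache that is only cleared at JVM exit
// (the role the description assigns to the FileSystem deleteOnExit cache).
final class ExitRegistry {
  private val paths = mutable.Set.empty[String]

  // Registers a fresh, unique scratch path and returns it.
  def registerNew(): String = {
    val path = s"/tmp/scratch-${paths.size}"
    paths += path
    path
  }

  def size: Int = paths.size
}

// Hypothetical stand-in for a client whose construction allocates per-session state.
final class SketchClient(registry: ExitRegistry) {
  val scratchDir: String = registry.registerNew()

  // Assumption: a new session means a new client with its own scratch path.
  def newSession(): SketchClient = new SketchClient(registry)
}

object LeakSketch {
  // Simulates `connections` connect/close cycles and reports how many
  // entries the registry holds afterwards.
  def registrySizeAfter(connections: Int, reuseClient: Boolean): Int = {
    val registry = new ExitRegistry
    val shared = new SketchClient(registry)
    (1 to connections).foreach { _ =>
      // reuseClient = false mirrors calling newSession() per connection;
      // reuseClient = true mirrors handing out the shared client.
      val client = if (reuseClient) shared else shared.newSession()
      require(client.scratchDir.nonEmpty)
    }
    registry.size
  }
}
```

In this sketch the registry grows by one entry per connection when a new session is created each time, and stays at a single entry when the shared client is reused, which is the shape of the change in the diff below.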
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala
@@ -42,7 +42,7 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session
    * Create a Hive aware resource loader.
    */
   override protected lazy val resourceLoader: HiveSessionResourceLoader = {
-    val client: HiveClient = externalCatalog.client.newSession()
+    val client: HiveClient = externalCatalog.client
     new HiveSessionResourceLoader(session, client)
   }
 
