
Commit 1b50e0e

dongjoon-hyun authored and gatorsmile committed
[SPARK-20256][SQL] SessionState should be created more lazily

## What changes were proposed in this pull request?

`SessionState` is designed to be created lazily. However, in reality, it is created immediately in `SparkSession.Builder.getOrCreate` ([here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L943)).

This PR aims to recover the lazy behavior by keeping the options in `initialSessionOptions`. The benefit is that users can start `spark-shell` and use RDD operations without any problems.

**BEFORE**
```scala
$ bin/spark-shell
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'
...
Caused by: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.security.AccessControlException: Permission denied: user=spark, access=READ, inode="/apps/hive/warehouse":hive:hdfs:drwx------
```

As reported in SPARK-20256, this happens when the user does not have access to the warehouse directory.

**AFTER**
```scala
$ bin/spark-shell
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.range(0, 10, 1).count()
res0: Long = 10
```

## How was this patch tested?

Manual.

This closes #18512.

Author: Dongjoon Hyun <[email protected]>

Closes #18501 from dongjoon-hyun/SPARK-20256.
1 parent a3c29fc · commit 1b50e0e
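
To make the mechanism concrete, here is a minimal, self-contained sketch of the staged-options pattern this commit adopts: build-time options go into a plain map, and a `lazy val` applies them the first time the state is actually needed. The `DemoSession`/`DemoSessionState` classes and the `stageOption` helper below are invented for illustration only; they are not Spark's API (in the real patch, the builder writes into `initialSessionOptions` directly).

```scala
import scala.collection.mutable

// Illustrative stand-ins only; these are not Spark's real classes.
final class DemoSessionState {
  private val settings = mutable.HashMap.empty[String, String]

  def setConfString(key: String, value: String): Unit = settings(key) = value

  override def toString: String = settings.mkString("DemoSessionState(", ", ", ")")
}

final class DemoSession {
  // Options captured at build time; they are only applied when the state is first created.
  private val initialSessionOptions = mutable.HashMap.empty[String, String]

  def stageOption(key: String, value: String): Unit = initialSessionOptions.put(key, value)

  // Nothing expensive (e.g. a metastore connection) happens until the first access.
  lazy val sessionState: DemoSessionState = {
    val state = new DemoSessionState
    initialSessionOptions.foreach { case (k, v) => state.setConfString(k, v) }
    state
  }
}

object LazySessionStateDemo extends App {
  val session = new DemoSession
  session.stageOption("spark.sql.shuffle.partitions", "8")

  // The state is only built here, on first access, with the staged options applied once.
  println(session.sessionState)
}
```

The point of the pattern is that constructing the session (or, in Spark, calling `getOrCreate`) no longer forces the expensive state to exist; only the first real use of `sessionState` does.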


sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala

Lines changed: 10 additions & 2 deletions
@@ -117,6 +117,12 @@ class SparkSession private(
     existingSharedState.getOrElse(new SharedState(sparkContext))
   }
 
+  /**
+   * Initial options for session. This options are applied once when sessionState is created.
+   */
+  @transient
+  private[sql] val initialSessionOptions = new scala.collection.mutable.HashMap[String, String]
+
   /**
    * State isolated across sessions, including SQL configurations, temporary tables, registered
    * functions, and everything else that accepts a [[org.apache.spark.sql.internal.SQLConf]].
@@ -132,9 +138,11 @@ class SparkSession private(
     parentSessionState
       .map(_.clone(this))
       .getOrElse {
-        SparkSession.instantiateSessionState(
+        val state = SparkSession.instantiateSessionState(
           SparkSession.sessionStateClassName(sparkContext.conf),
           self)
+        initialSessionOptions.foreach { case (k, v) => state.conf.setConfString(k, v) }
+        state
       }
   }
 
@@ -940,7 +948,7 @@ object SparkSession {
         }
 
         session = new SparkSession(sparkContext, None, None, extensions)
-        options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
+        options.foreach { case (k, v) => session.initialSessionOptions.put(k, v) }
         defaultSession.set(session)
 
         // Register a successfully instantiated context to the singleton. This should be at the
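
Assuming a `spark-shell` built with this patch, the behavioral split the diff produces might look like the following; `sc` and `spark` are the objects the shell predefines, and the comments describe the expected effect rather than captured output.

```scala
sc.range(0, 10, 1).count()   // RDD-only path: sessionState is never touched,
                             // so no warehouse or metastore access is attempted.

spark.range(10).count()      // First Dataset operation: sessionState is created
                             // lazily and the staged initialSessionOptions are
                             // applied to its conf at that point.
```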
