@@ -772,15 +772,13 @@ class ALSCleanerSuite extends SparkFunSuite {
       FileUtils.listFiles(localDir, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE).asScala.toSet
     try {
       conf.set("spark.local.dir", localDir.getAbsolutePath)
-      val sc = new SparkContext("local[2]", "test", conf)
+      val sc = new SparkContext("local[2]", "ALSCleanerSuite", conf)
       try {
         sc.setCheckpointDir(checkpointDir.getAbsolutePath)
         // Generate test data
         val (training, _) = ALSSuite.genImplicitTestData(sc, 20, 5, 1, 0.2, 0)
         // Implicitly test the cleaning of parents during ALS training
         val spark = SparkSession.builder
-          .master("local[2]")
-          .appName("ALSCleanerSuite")
           .sparkContext(sc)
           .getOrCreate()
         import spark.implicits._
@@ -43,8 +43,6 @@ private[ml] object TreeTests extends SparkFunSuite {
       categoricalFeatures: Map[Int, Int],
       numClasses: Int): DataFrame = {
     val spark = SparkSession.builder()
-      .master("local[2]")
Contributor Author:
Here we build the SparkSession with an existing SparkContext, so setting the master is useless.

Contributor:
It would be great if we could check whether similar changes can be made elsewhere.
.appName("TreeTests")
.sparkContext(data.sparkContext)
.getOrCreate()
import spark.implicits._
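A minimal sketch of the point above, not part of the patch: when the builder is handed an existing SparkContext, the session simply runs on that context's master, so any master set on the builder would be irrelevant. The object and app names are hypothetical, and the sketch assumes it lives under the org.apache.spark package, since Builder.sparkContext is private[spark] (the test suites touched here already satisfy that).

package org.apache.spark.ml

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object ExistingContextSketch {
  def main(args: Array[String]): Unit = {
    // The master is fixed here, when the SparkContext is created.
    val sc = new SparkContext("local[2]", "sketch")
    val spark = SparkSession.builder()
      .sparkContext(sc) // the builder adopts this existing context as-is
      .getOrCreate()
    // The session runs on the supplied context's master, not on anything
    // that could be set via builder.master(...):
    assert(spark.sparkContext.master == "local[2]")
    spark.stop()
  }
}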
25 changes: 9 additions & 16 deletions sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -757,6 +757,8 @@ object SparkSession {
 
     private[this] var userSuppliedContext: Option[SparkContext] = None
 
+    // The `SparkConf` inside the given `SparkContext` may get changed if you specify some options
+    // for this builder.
     private[spark] def sparkContext(sparkContext: SparkContext): Builder = synchronized {
       userSuppliedContext = Option(sparkContext)
       this
@@ -854,7 +856,7 @@ object SparkSession {
      *
      * @since 2.2.0
      */
-    def withExtensions(f: SparkSessionExtensions => Unit): Builder = {
+    def withExtensions(f: SparkSessionExtensions => Unit): Builder = synchronized {
       f(extensions)
       this
     }
@@ -899,22 +901,14 @@ object SparkSession {
 
       // No active nor global default session. Create a new one.
       val sparkContext = userSuppliedContext.getOrElse {
-        // set app name if not given
-        val randomAppName = java.util.UUID.randomUUID().toString
         val sparkConf = new SparkConf()
-        options.foreach { case (k, v) => sparkConf.set(k, v) }
-        if (!sparkConf.contains("spark.app.name")) {
-          sparkConf.setAppName(randomAppName)
-        }
-        val sc = SparkContext.getOrCreate(sparkConf)
-        // maybe this is an existing SparkContext, update its SparkConf which maybe used
-        // by SparkSession
-        options.foreach { case (k, v) => sc.conf.set(k, v) }
Contributor Author:
Previously we set the given options on the newly created SparkConf, then created the SparkContext, then set the options on sc.conf again, in case SparkContext.getOrCreate returned an existing context. We can instead create the SparkConf, use it to create the SparkContext, and then set the options on sc.conf, so the conf only needs to be set once.
-        if (!sc.conf.contains("spark.app.name")) {
-          sc.conf.setAppName(randomAppName)
-        }
-        sc
+        options.get("spark.master").foreach(sparkConf.setMaster)
+        // set a random app name if not given.
+        sparkConf.setAppName(options.getOrElse("spark.app.name",
+          java.util.UUID.randomUUID().toString))
+        SparkContext.getOrCreate(sparkConf)
       }
+      options.foreach { case (k, v) => sparkContext.conf.set(k, v) }
Contributor Author:
In order to remove https://github.com/apache/spark/pull/18172/files#diff-d91c284798f1c98bf03a31855e26d71cL938, here I change the behavior to also set the options on sparkContext.conf even when the SparkContext is supplied by the user. This change is safe, as SparkSession.Builder.sparkContext is private, and I checked all the callers to confirm that it's OK to do so.
Member:
Maybe we can add a comment to SparkSession.Builder.sparkContext saying that we will modify its conf?
Contributor Author:
Yea, good idea.
       // Initialize extensions if the user has defined a configurator class.
       val extensionConfOption = sparkContext.conf.get(StaticSQLConf.SPARK_SESSION_EXTENSIONS)
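A minimal sketch, with hypothetical names, of the behavior change discussed in the thread above: after this patch, an option passed to the builder also lands in the conf of a user-supplied SparkContext. As before, it assumes a location under org.apache.spark, since Builder.sparkContext is private[spark].

package org.apache.spark.ml

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object BuilderConfSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[2]", "sketch")
    val spark = SparkSession.builder()
      .sparkContext(sc)                    // user-supplied context
      .config("spark.ui.enabled", "false") // builder option
      .getOrCreate()
    // With this change the option is written into the supplied context's
    // conf as well (it does not retroactively affect components already
    // initialized from the conf, such as the UI itself):
    assert(sc.getConf.get("spark.ui.enabled") == "false")
    spark.stop()
  }
}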
@@ -935,7 +929,6 @@ object SparkSession {
       }
 
       session = new SparkSession(sparkContext, None, None, extensions)
-      options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
Contributor Author:
When building the SessionState, we merge the spark conf into the SQL conf, which means we don't need to set the options on session.sessionState.conf here, as long as the spark conf already contains the options.
       defaultSession.set(session)
 
       // Register a successfully instantiated context to the singleton. This should be at the
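A minimal sketch, with hypothetical names, of why the removed line is redundant after this change: the builder options end up in the SparkContext's conf, and SessionState merges that conf into the session's SQL conf when it is first built, so the option is visible through the session without any extra setConfString call.

import org.apache.spark.sql.SparkSession

object SessionStateMergeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("sketch")
      .config("spark.sql.shuffle.partitions", "7") // ends up in the spark conf
      .getOrCreate()
    // SessionState merged the spark conf into the SQL conf on creation:
    assert(spark.conf.get("spark.sql.shuffle.partitions") == "7")
    spark.stop()
  }
}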