Closed
Changes from 2 commits
@@ -42,6 +42,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
val warehousePath = {
val configFile = Utils.getContextOrSparkClassLoader.getResource("hive-site.xml")
if (configFile != null) {
logInfo(s"load config from hive-site.xml $configFile")
Contributor:
nit: loading hive config file: $configFile

sparkContext.hadoopConfiguration.addResource(configFile)
}
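The hive-site.xml discovery step in the hunk above can be sketched with the standard library alone. This is a hypothetical illustration, not Spark's code: it places a hive-site.xml in a temp directory, puts that directory on a class loader, and looks the file up the same way `Utils.getContextOrSparkClassLoader.getResource("hive-site.xml")` would.

```scala
import java.io.File
import java.net.URLClassLoader
import java.nio.file.Files

object HiveSiteLookupSketch {
  def main(args: Array[String]): Unit = {
    // Stand-in for a conf directory on Spark's classpath: a temp dir
    // containing a minimal hive-site.xml.
    val dir = Files.createTempDirectory("conf-sketch").toFile
    val f = new File(dir, "hive-site.xml")
    Files.write(f.toPath, "<configuration/>".getBytes("UTF-8"))

    // Look the file up via a class loader, mirroring the patched code's
    // getResource("hive-site.xml") call.
    val loader = new URLClassLoader(Array(dir.toURI.toURL), null)
    val configFile = loader.getResource("hive-site.xml")
    if (configFile != null) println("found hive-site.xml")
    else println("hive-site.xml not found")
  }
}
```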

@@ -61,6 +62,12 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
// When neither spark.sql.warehouse.dir nor hive.metastore.warehouse.dir is set,
// we will set hive.metastore.warehouse.dir to the default value of spark.sql.warehouse.dir.
val sparkWarehouseDir = sparkContext.conf.get(WAREHOUSE_PATH)
logInfo(s"${WAREHOUSE_PATH.key} is set, Setting " +
Member:
Please remove "${WAREHOUSE_PATH.key} is set,

s"hive.metastore.warehouse.dir ($hiveWarehouseDir) to the value of " +
s"${WAREHOUSE_PATH.key} ('$sparkWarehouseDir').")
if (hiveWarehouseDir != null) {
sparkContext.hadoopConfiguration.set("hive.metastore.warehouse.dir", sparkWarehouseDir)
Contributor:
we can always set it.

Contributor Author (@windpiger, Feb 23, 2017):
yes, if we always set it, then the sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir) below could perhaps be removed; let me test it more, thanks~

}
sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir)
Contributor:
do we still need to do this?

Contributor Author:
I removed hive.metastore.warehouse.dir from sparkConf in the commit above, but some tests failed, so I reverted this logic; let me dig into it more.

Contributor Author (@windpiger, Feb 23, 2017):
I looked into the code and ran some tests.
The hive.metastore.warehouse.dir entry in sparkConf still takes effect in Spark; it is not useless. The reason is:

  1. When we run Spark with Hive enabled, it creates a SharedState.
  2. When creating the SharedState, it creates a HiveExternalCatalog:
     https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L85
  3. When creating the HiveExternalCatalog, it creates a HiveClientImpl via HiveUtils:
     https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L65
  4. When creating the HiveClientImpl, it calls SessionState.start(state).
  5. SessionState.start(state) then creates the default database using the hive.metastore.warehouse.dir from the hiveConf built inside HiveClientImpl: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L189
  6. That hiveConf is built from hadoopConf and sparkConf, and sparkConf overwrites the value of any key it shares with hadoopConf. So the default database is actually created using hive.metastore.warehouse.dir from sparkConf; if we do not overwrite that value in SharedState, the database location is not the warehouse path we expect. So sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir) should be retained here.

We can also see that the default database is not created in SharedState: the condition here evaluates to false, so the create-database logic is not hit; the database has already been created when we initialized the HiveClientImpl:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L96
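The precedence described in the steps above, where sparkConf entries overwrite same-key entries from hadoopConf when the effective Hive configuration is assembled, can be sketched with plain Scala maps. `HiveConfPrecedenceSketch` and both paths below are hypothetical names for illustration, not Spark's implementation.

```scala
object HiveConfPrecedenceSketch {
  // Stand-in for HiveClientImpl assembling its HiveConf: hadoopConf entries
  // go in first, then sparkConf entries, so sparkConf wins on duplicate keys.
  def effectiveHiveConf(hadoopConf: Map[String, String],
                        sparkConf: Map[String, String]): Map[String, String] =
    hadoopConf ++ sparkConf

  def main(args: Array[String]): Unit = {
    val hadoopConf = Map("hive.metastore.warehouse.dir" -> "/user/hive/warehouse")
    val sparkConf  = Map("hive.metastore.warehouse.dir" -> "/spark-warehouse")
    val hiveConf = effectiveHiveConf(hadoopConf, sparkConf)
    // The default database would be created under the sparkConf value,
    // which is why SharedState must overwrite the key there as well.
    println(hiveConf("hive.metastore.warehouse.dir"))
  }
}
```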

Contributor:
how about sparkContext.conf.remove("hive.metastore.warehouse.dir")? it's good to have a single source of truth.

Contributor Author:
yes, it is ok to remove it, and it will be clearer that hive.metastore.warehouse.dir and WAREHOUSE_PATH each end up in their right places.
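The single-source-of-truth idea discussed above can be sketched with mutable maps standing in for the two configurations (the object name and the paths are hypothetical): propagate the warehouse path to the Hadoop side, then drop the duplicate key from the Spark side so only spark.sql.warehouse.dir remains there.

```scala
import scala.collection.mutable

object SingleSourceOfTruthSketch {
  def main(args: Array[String]): Unit = {
    val sparkConf = mutable.Map(
      "spark.sql.warehouse.dir"      -> "/spark-warehouse",
      "hive.metastore.warehouse.dir" -> "/stale/value")
    val hadoopConf = mutable.Map.empty[String, String]

    // Propagate the Spark warehouse path to the Hadoop side...
    hadoopConf("hive.metastore.warehouse.dir") = sparkConf("spark.sql.warehouse.dir")

    // ...then remove the duplicate key, leaving spark.sql.warehouse.dir as
    // the single source of truth on the Spark side.
    sparkConf.remove("hive.metastore.warehouse.dir")
    assert(!sparkConf.contains("hive.metastore.warehouse.dir"))
    println(hadoopConf("hive.metastore.warehouse.dir"))
  }
}
```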

sparkWarehouseDir
}