
Conversation

@windpiger
Contributor

What changes were proposed in this pull request?

In SPARK-15959 we brought back `hive.metastore.warehouse.dir`. However, in that logic, when the value of `spark.sql.warehouse.dir` is used to overwrite `hive.metastore.warehouse.dir`, it is set in `sparkContext.conf`, which does not overwrite the value in hadoopConf. I think it should be put in `sparkContext.hadoopConfiguration` so that the original value in hadoopConf is actually overwritten.

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L64

How was this patch tested?

N/A
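A minimal sketch of the fix being proposed, using plain mutable Maps as stand-ins for `sparkContext.conf` and `sparkContext.hadoopConfiguration` (the paths are made up for illustration; this is not the actual SharedState code):

```scala
import scala.collection.mutable

// Stand-ins for sparkContext.conf and sparkContext.hadoopConfiguration.
val sparkConf  = mutable.Map("spark.sql.warehouse.dir" -> "/spark/warehouse")
val hadoopConf = mutable.Map("hive.metastore.warehouse.dir" -> "/hive/stale-warehouse")

val sparkWarehouseDir = sparkConf("spark.sql.warehouse.dir")

// Before this PR the value was written only to sparkConf, leaving the
// stale entry in hadoopConf untouched. The proposed change writes it
// into hadoopConf so the original value is actually overwritten.
hadoopConf("hive.metastore.warehouse.dir") = sparkWarehouseDir

println(hadoopConf("hive.metastore.warehouse.dir"))  // /spark/warehouse
```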

@SparkQA

SparkQA commented Feb 20, 2017

Test build #73144 has finished for PR 16996 at commit 92c1452.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 20, 2017

Test build #73152 has started for PR 16996 at commit ac0a1c6.

@windpiger
Contributor Author

retest this please

@SparkQA

SparkQA commented Feb 20, 2017

Test build #73155 has finished for PR 16996 at commit ac0a1c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

cc @cloud-fan

// When neither spark.sql.warehouse.dir nor hive.metastore.warehouse.dir is set,
// we will set hive.metastore.warehouse.dir to the default value of spark.sql.warehouse.dir.
val sparkWarehouseDir = sparkContext.conf.get(WAREHOUSE_PATH)
logInfo(s"${WAREHOUSE_PATH.key} is set, Setting " +
Member

Please remove "${WAREHOUSE_PATH.key} is set,

@gatorsmile
Member

In your PR title, verwrite -> overwrite

@gatorsmile
Member

Why do we need to overwrite its original value? I thought we would not use that value any more.

@windpiger windpiger changed the title [SPARK-19664][SQL]put hive.metastore.warehouse.dir in hadoopconf to verwrite its original value [SPARK-19664][SQL]put hive.metastore.warehouse.dir in hadoopconf to overwrite its original value Feb 21, 2017
@windpiger
Contributor Author

This is a minor improvement: it is better to keep the same value in both sparkConf and hadoopConf, and it also adds some log info to help users debug.

@gatorsmile
Member

cc @yhuai who did the original change. I am not sure whether we need to overwrite the original value of hadoopConf, although the change does not hurt anything IMO.

@SparkQA

SparkQA commented Feb 21, 2017

Test build #73198 has finished for PR 16996 at commit 91b9fd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Contributor Author

@yhuai could you help review this? Thanks!

val warehousePath = {
val configFile = Utils.getContextOrSparkClassLoader.getResource("hive-site.xml")
if (configFile != null) {
logInfo(s"load config from hive-site.xml $configFile")
Contributor

nit: loading hive config file: $configFile

// When neither spark.sql.warehouse.dir nor hive.metastore.warehouse.dir is set,
// we will set hive.metastore.warehouse.dir to the default value of spark.sql.warehouse.dir.
val sparkWarehouseDir = sparkContext.conf.get(WAREHOUSE_PATH)
logInfo(s"Setting hive.metastore.warehouse.dir ($hiveWarehouseDir) to the value of " +
Contributor

nit: hive.metastore.warehouse.dir is not set, setting it to the value of ${WAREHOUSE_PATH.key}

Contributor Author

When we hit this condition because we provided a `WAREHOUSE_PATH`, `hive.metastore.warehouse.dir` may also have been set.

Member

($hiveWarehouseDir) -> ('$hiveWarehouseDir')

if (hiveWarehouseDir != null) {
sparkContext.hadoopConfiguration.set("hive.metastore.warehouse.dir", sparkWarehouseDir)
}
sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir)
Contributor

do we still need to do this?

Contributor Author

I removed `hive.metastore.warehouse.dir` from sparkConf in the commit above, but some tests failed, so I reverted this logic. Let me dig into it more.

Contributor Author
@windpiger windpiger Feb 23, 2017

I looked into the code and ran some tests. The `hive.metastore.warehouse.dir` in sparkConf still takes effect in Spark; it is not unused. The reason is:

  1. When we run Spark with Hive enabled, it creates a SharedState.
  2. When the SharedState is created, it creates a HiveExternalCatalog:
     https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L85
  3. When the HiveExternalCatalog is created, it creates a HiveClientImpl via HiveUtils:
     https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L65
  4. When the HiveClientImpl is created, it calls SessionState.start(state).
  5. SessionState.start(state) then creates the default database using the `hive.metastore.warehouse.dir` from the hiveConf built in HiveClientImpl: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L189
  6. That hiveConf is built in HiveClientImpl from hadoopConf and sparkConf, and sparkConf overwrites values with the same key in hadoopConf. So `hive.metastore.warehouse.dir` in sparkConf is actually what is used to create the default database; if we do not overwrite its value in sparkConf in SharedState, the database location is not the warehouse path we expect. So `sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir)` should be retained here.

We can also see that the default database is not created in SharedState: the condition here is false, so the create-database logic is not hit; the database has already been created when we initialized the HiveClientImpl.
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L96
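The precedence described above (sparkConf overwriting hadoopConf values for the same key when the hiveConf is assembled) can be illustrated with a tiny sketch; plain immutable Maps stand in for the real conf objects, and `++` keeps the right-hand side's value on duplicate keys:

```scala
// Plain Maps stand in for hadoopConf and sparkConf; `++` lets the
// right-hand map win on duplicate keys, mirroring how sparkConf entries
// overwrite hadoopConf entries when the hiveConf is built.
val hadoopConf = Map("hive.metastore.warehouse.dir" -> "/hadoop/warehouse")
val sparkConf  = Map("hive.metastore.warehouse.dir" -> "/spark/warehouse")
val hiveConf   = hadoopConf ++ sparkConf

println(hiveConf("hive.metastore.warehouse.dir"))  // /spark/warehouse
```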

logInfo(s"Setting hive.metastore.warehouse.dir ($hiveWarehouseDir) to the value of " +
s"${WAREHOUSE_PATH.key} ('$sparkWarehouseDir').")
if (hiveWarehouseDir != null) {
sparkContext.hadoopConfiguration.set("hive.metastore.warehouse.dir", sparkWarehouseDir)
Contributor

we can always set it.

Contributor Author
@windpiger windpiger Feb 23, 2017

Yes, maybe if we always set it, the `sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir)` could be removed. Let me test it more, thanks~

@SparkQA

SparkQA commented Feb 23, 2017

Test build #73340 has finished for PR 16996 at commit 7429cd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 23, 2017

Test build #73343 has finished for PR 16996 at commit 1d18636.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

logInfo(s"Setting hive.metastore.warehouse.dir ($hiveWarehouseDir) to the value of " +
s"${WAREHOUSE_PATH.key} ('$sparkWarehouseDir').")
sparkContext.hadoopConfiguration.set("hive.metastore.warehouse.dir", sparkWarehouseDir)
sparkContext.conf.set("hive.metastore.warehouse.dir", sparkWarehouseDir)
Contributor

how about sparkContext.conf.remove("hive.metastore.warehouse.dir")? it's good to have a single source of truth.

Contributor Author

Yes, it is OK to remove it, and it makes things clearer: `hive.metastore.warehouse.dir` and `WAREHOUSE_PATH` are then each in their right place.
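The "single source of truth" outcome discussed here can be sketched like this (again with plain mutable Maps standing in for the real conf objects, not the actual SharedState code):

```scala
import scala.collection.mutable

// Stand-ins for sparkContext.conf and sparkContext.hadoopConfiguration.
val sparkConf = mutable.Map(
  "spark.sql.warehouse.dir"      -> "/spark/warehouse",
  "hive.metastore.warehouse.dir" -> "/stale/value")
val hadoopConf = mutable.Map.empty[String, String]

// Keep the warehouse dir only in hadoopConf and drop the sparkConf copy,
// so there is a single source of truth for the warehouse location.
hadoopConf("hive.metastore.warehouse.dir") = sparkConf("spark.sql.warehouse.dir")
sparkConf.remove("hive.metastore.warehouse.dir")

println(sparkConf.contains("hive.metastore.warehouse.dir"))  // false
println(hadoopConf("hive.metastore.warehouse.dir"))          // /spark/warehouse
```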

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented Feb 24, 2017

Test build #73389 has finished for PR 16996 at commit 86deb62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 8f33731 Feb 24, 2017
Yunni pushed a commit to Yunni/spark that referenced this pull request Feb 27, 2017
…overwrite its original value

## What changes were proposed in this pull request?

In [SPARK-15959](https://issues.apache.org/jira/browse/SPARK-15959) we brought back `hive.metastore.warehouse.dir`. However, in that logic, when the value of `spark.sql.warehouse.dir` is used to overwrite `hive.metastore.warehouse.dir`, it is set in `sparkContext.conf`, which does not overwrite the value in hadoopConf. I think it should be put in `sparkContext.hadoopConfiguration` so that the original value in hadoopConf is actually overwritten.

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L64

## How was this patch tested?
N/A

Author: windpiger <[email protected]>

Closes apache#16996 from windpiger/hivemetawarehouseConf.