[SPARK-21428][SQL][FOLLOWUP]CliSessionState should point to the actual metastore not a dummy one #19068

yaooqinn · 2017-08-28T09:33:40Z

What changes were proposed in this pull request?

When running bin/spark-sql, we reuse cliSessionState, but the Hive configurations generated for it point to a dummy metastore instead of the real one. Also, since the warehouse directory is determined later in SharedState, HiveClient should respect that configuration change in this case too.

How was this patch tested?

Existing unit tests.

cc @cloud-fan @jiangxb1987

cloud-fan · 2017-08-28T10:48:19Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

when is useInMemoryDerby false?

@cloud-fan Updated. I noticed that a hiveConf initialized the executionHive way does not load hive-site.xml, so I changed it to the metadata Hive way.

jiangxb1987 · 2017-08-28T17:18:33Z

...e-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala

This function does not only construct the warehouse path; it also creates many other configs. Why is it safe to just ignore them?

Before SPARK-21428, cliSessionState was discarded; now we reuse it. This function builds the configs for the execution Hive client, but cliSessionState now has to talk to the real metastore.

I don't think you have answered the question here. For example, the original code updates the HiveConf.ConfVars.METASTORECONNECTURLKEY property here, but you no longer touch it after the change. I'm not confident about making that behavior change.

Doesn't the original code change this key to connect to a dummy metastore instead of the real one?

METASTORECONNECTURLKEY points to Derby by default; we may set this key in hive-site.xml to talk to the real metastore. The Hive client will use the state initialized here, so I don't think we should initialize the Hive conf the old way.
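To make the METASTORECONNECTURLKEY point above concrete, here is a minimal sketch that models the two ways of building the CLI session configuration as plain Scala maps. It is not the code from this PR or from HiveUtils: the object CliConfSketch, both method names, and the example paths and URLs are hypothetical. Only the property names javax.jdo.option.ConnectionURL (the key behind HiveConf.ConfVars.METASTORECONNECTURLKEY) and hive.metastore.warehouse.dir are real Hive settings.

```scala
object CliConfSketch {
  // Real Hive property names; every value used with them here is made up.
  val ConnectUrlKey = "javax.jdo.option.ConnectionURL" // HiveConf.ConfVars.METASTORECONNECTURLKEY
  val WarehouseKey = "hive.metastore.warehouse.dir"

  /** Hypothetical model of the old behaviour: override whatever metastore
    * hive-site.xml configured with a throwaway local Derby database. */
  def executionStyleConf(loaded: Map[String, String], tmpDir: String): Map[String, String] =
    loaded ++ Map(
      ConnectUrlKey -> s"jdbc:derby:;databaseName=$tmpDir/metastore;create=true",
      WarehouseKey -> s"$tmpDir/warehouse")

  /** Hypothetical model of the fixed behaviour: keep the metastore settings that
    * were loaded and only apply the warehouse directory resolved by SharedState. */
  def metastoreStyleConf(loaded: Map[String, String], sharedStateWarehouse: String): Map[String, String] =
    loaded + (WarehouseKey -> sharedStateWarehouse)
}
```

With a loaded map that points javax.jdo.option.ConnectionURL at a shared metastore, executionStyleConf swaps it for a Derby database under the temporary directory, while metastoreStyleConf keeps it and only overrides the warehouse directory. Under this model, a database created in one spark-sql session is recorded in a throwaway metastore, which is consistent with the "database not found" scenario reported later in the thread.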

@jiangxb1987 Sorry for not @-mentioning you; please take another look.

dilipbiswal · 2017-09-09T07:28:40Z

Hello,

I am hitting this issue. It actually looks like a regression, since a script that used to work no longer does. Here is my scenario:

1) spark-sql
    create database testdb;
2) exit spark-sql
3) spark-sql
    use testdb;  => I get a database not found error.

I am testing this on my laptop, where Hive is not installed. In this case we end up pointing to the default Hive warehouse dir /user/hive/warehouse, which does not exist on my laptop.
Could this be reviewed and merged if the changes look good? Thanks in advance.

yaooqinn · 2017-09-11T01:55:36Z

@dilipbiswal This is because the CliSessionState instance initialized in SparkSQLCLIDriver points to a dummy metastore and is later reused by the Hive metastore client.

gatorsmile · 2017-09-11T05:03:52Z

cc @cloud-fan @jiangxb1987

yaooqinn · 2017-09-12T07:19:35Z

cc @cloud-fan @jiangxb1987 @gatorsmile PR title and description updated.

cloud-fan · 2017-09-13T05:41:19Z

OK to test

cloud-fan · 2017-09-13T05:41:41Z

@dilipbiswal does your script work after this PR?

cloud-fan · 2017-09-13T05:48:27Z

but the Hive configurations generated here just points to a dummy meta store

Why did this work well before?

dilipbiswal · 2017-09-13T05:59:48Z

@cloud-fan Yes, I have tried my script against this PR and it works fine. I am not familiar with the changes and don't know whether they have any side effects. One thing I haven't had time to figure out, in my script:

1) spark-sql
    create database testdb;
2) exit spark-sql
3) spark-sql
    use testdb;  => I get a database not found error.

How did the create database succeed, i.e. why didn't I get any error? If it did succeed, where did it create the database? Perhaps you know the answer. :-)

dilipbiswal · 2017-09-13T06:31:37Z

@cloud-fan It didn't trigger the tests?

yaooqinn · 2017-09-13T07:03:01Z

@cloud-fan The cliSessionState was meant to be reused, but it was discarded because the isolated Hive client classloader couldn't get it through SessionState.get(), so the Hive client generated a new SessionState instance every time HiveClient.newSession() was called. In my previous PR, I made it reusable, but I didn't notice that it just points to a dummy metastore. This has to be fixed.

cloud-fan · 2017-09-13T07:31:14Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala

can you add some comments to explain this logic?

cloud-fan · 2017-09-13T07:33:47Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

shouldn't the default value of extraConfig be Map.empty?

This doesn't change the original logic; see https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L283.

cloud-fan · 2017-09-13T07:36:20Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

can you add some comments for this method? Especially what's the difference between this one and hiveClientConfigurations.

hiveClientConfigurations is used to convert time values in the form of "1s"/"10min" to long values; we could give it a more appropriate name.
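As a rough illustration of the conversion described above, here is a minimal sketch that turns values like "1s" or "10min" into plain second counts. It is a simplified stand-in, not Hive's own time parser and not the body of hiveClientConfigurations: the object TimeConfSketch, its suffix table, and the choice of seconds as the target unit are assumptions. As noted later in the review, the real method handles more than time configs; this sketch covers only the time part.

```scala
import java.util.concurrent.TimeUnit

object TimeConfSketch {
  // Hypothetical, simplified suffix table. "ms" is listed before "s"
  // because a value such as "500ms" also ends with "s".
  private val suffixes: Seq[(String, TimeUnit)] = Seq(
    "ms" -> TimeUnit.MILLISECONDS,
    "min" -> TimeUnit.MINUTES,
    "s" -> TimeUnit.SECONDS,
    "h" -> TimeUnit.HOURS,
    "d" -> TimeUnit.DAYS)

  /** Converts "1s", "10min", "500ms", ... to whole seconds (truncating);
    * a bare number is taken to already be in seconds. */
  def toSeconds(value: String): Long = {
    val v = value.trim.toLowerCase
    suffixes.find { case (suffix, _) => v.endsWith(suffix) } match {
      case Some((suffix, unit)) => unit.toSeconds(v.dropRight(suffix.length).trim.toLong)
      case None => v.toLong
    }
  }

  /** Rewrites only the given time-valued keys to second counts, leaving other entries untouched. */
  def normalize(conf: Map[String, String], timeKeys: Set[String]): Map[String, String] =
    conf.map { case (k, v) =>
      if (timeKeys.contains(k)) k -> toSeconds(v).toString else k -> v
    }
}
```

For example, TimeConfSketch.normalize(Map("some.timeout" -> "10min"), Set("some.timeout")) yields Map("some.timeout" -> "600"), where some.timeout is a made-up key.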

yaooqinn · 2017-09-13T09:43:47Z

Jenkins is unreachable. cc @cloud-fan

cloud-fan · 2017-09-13T13:03:58Z

ok to test

cloud-fan · 2017-09-13T13:15:45Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

It's not only time configs. I think we'd better call it config, following IsolatedClientLoader.config.

cloud-fan · 2017-09-13T13:30:51Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

How about we remove these default values and explicitly specify them in https://github.com/apache/spark/pull/19068/files#diff-f7aac41bf732c1ba1edbac436d331a55R84?

SparkQA · 2017-09-13T14:11:38Z

Test build #81718 has finished for PR 19068 at commit 267a1b2.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-09-13T15:09:20Z

Test build #81721 has finished for PR 19068 at commit 9682eab.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-09-14T07:44:26Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala

Rename extraConfig to extraConfigs, and please update the description: these are not only time configurations.

cloud-fan · 2017-09-14T07:46:52Z

can you run ./build/sbt "hive/test" locally to make sure all hive tests pass?


SparkQA · 2017-09-19T03:29:34Z

Test build #81906 has finished for PR 19068 at commit b21fc72.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

yaooqinn · 2017-09-19T04:58:32Z

@cloud-fan I hit a linkage error before, and now I have simplified the logic. Could you trigger Jenkins before reverting?

SparkQA · 2017-09-19T04:59:06Z

Test build #81909 has finished for PR 19068 at commit f2618b9.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-09-19T05:05:51Z

retest this please

SparkQA · 2017-09-19T07:04:45Z

Test build #81910 has finished for PR 19068 at commit c5c1c26.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

yaooqinn · 2017-09-19T07:28:29Z

retest this please

SparkQA · 2017-09-19T10:00:00Z

Test build #81916 has finished for PR 19068 at commit c5c1c26.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2017-09-19T11:35:58Z

LGTM, merging to master!

jinxing64 · 2017-09-20T08:00:47Z

@yaooqinn
This change works well for me, thanks for the fix!
After this change, the Hive client for execution (which points to a dummy local metastore) will never be used when running SQL in spark-sql; instead, the Hive client points to the real metastore. Right?
So why was the Hive client in SparkSQLCLIDriver designed to point to a dummy local metastore before? Does this change break some design?

cloud-fan reviewed Aug 28, 2017


jiangxb1987 reviewed Aug 28, 2017


yaooqinn mentioned this pull request Sep 5, 2017

[SPARK-21916][SQL] Set isolationOn=true when create hive client for metadata. #19127

Closed

yaooqinn changed the title from "[SPARK-21428][SQL][FOLLOWUP] Reused state should respect warehouse dir determined in SharedState" to "[SPARK-21428][SQL][FOLLOWUP]CliSessionState should point to the actual metastore not a dummy one" on Sep 12, 2017

cloud-fan reviewed Sep 13, 2017


cloud-fan reviewed Sep 14, 2017


yaooqinn added 4 commits September 18, 2017 11:38

d847e67 SPARK-21428-FOLLOWUP: CliSessionState should respect warehouse dir determined in SharedState
dc99612 hiveclient doesn't repect hive-site.xml in spark-sql
eef8f58 extract dir from hadoop conf
779b0f8 comments added

cloud-fan mentioned this pull request Sep 19, 2017

Revert "[SPARK-21428] Turn IsolatedClientLoader off while using builtin Hive jars for reusing CliSessionState #19273

Closed

f2618b9 fix ut
c5c1c26 code style

asfgit closed this in 581200a Sep 19, 2017
