Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Aug 28, 2017

What changes were proposed in this pull request?

While running bin/spark-sql, we will reuse cliSessionState, but the Hive configurations generated here just points to a dummy meta store which actually should be the real one. And the warehouse is determined later in SharedState, HiveClient should respect this config changing in this case too.

How was this patch tested?

existing ut

cc @cloud-fan @jiangxb1987

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is useInMemoryDerby false?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan updated. i notice that hiveConf initialized by executionHive way does not load hive-site.xml, so i changed it to meta Hive way

Copy link
Contributor

@jiangxb1987 jiangxb1987 Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function not only constructs the warehouse path, it also creates many other configs, why is it safe to just ignore em?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before #SPARK-21428, cliSessionState will be left behind; now we will reuse it. this func is used to organize execution hive configs, but now cliSessionState has to talk with metastore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you have answered the question here - for example, the original code will update the HiveConf.ConfVars.METASTORECONNECTURLKEY property here, but you don't touch that after the change. I'm not confident to make that behavior change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this key to connect a dummy metastore instead of the real one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

METASTORECONNECTURLKEY connects derby by default, we may set this key in hive-site.xml to talk with metastore. hiveclient will use the state init here, I don't think we should beinit hive conf as the old way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiangxb1987 sorry for not @ you, please take a look again

@dilipbiswal
Copy link
Contributor

dilipbiswal commented Sep 9, 2017

Hello,

I am hitting this issue. Actually this seems like a regression as my script which was working before is no longer working. Here is my scenario.

1) spark-sql
    create database testdb;
2) exit spark-sql
3) spark-sql
    use testdb;  => I get a database not found error.

I am testing this on my laptop where there is no hive installed. In this case we end up pointing to
the default hive warehouse dir /user/hive/warehouse which is not present in my laptop.
Can we please have this reviewed and merged if the changes are good. Thanks in advance.

@yaooqinn
Copy link
Member Author

@dilipbiswal This is because the CliSessionState instance initialized in SparkCLISQLDriver pointing to a dummy metastore and reused later in the hive metastore client

@gatorsmile
Copy link
Member

cc @cloud-fan @jiangxb1987

@yaooqinn yaooqinn changed the title [SPARK-21428][SQL][FOLLOWUP] Reused state should respect warehouse dir determined in SharedState [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point to the actual metastore not a dummy one Sep 12, 2017
@yaooqinn
Copy link
Member Author

cc @cloud-fan @jiangxb1987 @gatorsmile pr title and descriptions updated

@cloud-fan
Copy link
Contributor

OK to test

@cloud-fan
Copy link
Contributor

@dilipbiswal does your script work after this PR?

@cloud-fan
Copy link
Contributor

but the Hive configurations generated here just points to a dummy meta store

Why did this works well before?

@dilipbiswal
Copy link
Contributor

dilipbiswal commented Sep 13, 2017

@cloud-fan Yeah.. I have tried my script against this PR and it works fine. I am not familiar with the changes and don't know if it can have any side effects. One thing that haven't had the time to find out is in my script ..

1) spark-sql
    create database testdb;
2) exit spark-sql
3) spark-sql
    use testdb;  => I get a database not found error.

How did the create database succeed i.e i didn't get any error ? If it did, then where did it create the database at ? Perhaps you know the answer ? :-)

@dilipbiswal
Copy link
Contributor

@cloud-fan it didn't trigger the test ?

@yaooqinn
Copy link
Member Author

@cloud-fan The cliSessionState is meant to be reused but discarded for isolated hive client classloader couldn't get it through SessionState.get(), so hive client will generated a SessionState instance everytime while calling HiveClient.newSession(). In my foregoing pr, I made it reused but i didn't notice that it is just pointing to a dummy metatstore. This has to be fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments to explain this logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, soon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the default value of extraConfig be Map.empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some comments for this method? Especially what's the difference between this one and hiveClientConfigurations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hiveClientConfigurations is used to change those time values in the form of "1s"/ "10min" to long values, we may give it a more proper name

@yaooqinn
Copy link
Member Author

jenkins unreachable cc @cloud-fan

@cloud-fan
Copy link
Contributor

ok to test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not only time configs, I think we'd better call it config, following IsolatedClientLoader.config

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we remove these default values and explicitly specify them in https://github.com/apache/spark/pull/19068/files#diff-f7aac41bf732c1ba1edbac436d331a55R84?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@SparkQA
Copy link

SparkQA commented Sep 13, 2017

Test build #81718 has finished for PR 19068 at commit 267a1b2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 13, 2017

Test build #81721 has finished for PR 19068 at commit 9682eab.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extraConfig -> extraConfigs, and please update the description, it's not time configurations.

@cloud-fan
Copy link
Contributor

can you run ./build/sbt "hive/test" locally to make sure all hive tests pass?

@SparkQA
Copy link

SparkQA commented Sep 19, 2017

Test build #81906 has finished for PR 19068 at commit b21fc72.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

@cloud-fan i met linkage err before, and now i simplify the logic, could you trigger jenkins before reverting

@SparkQA
Copy link

SparkQA commented Sep 19, 2017

Test build #81909 has finished for PR 19068 at commit f2618b9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Sep 19, 2017

Test build #81910 has finished for PR 19068 at commit c5c1c26.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Sep 19, 2017

Test build #81916 has finished for PR 19068 at commit c5c1c26.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, merging to master!

@asfgit asfgit closed this in 581200a Sep 19, 2017
@jinxing64
Copy link

@yaooqinn
This change works well for me, thanks for fix !
After this change, hive client for execution(points to a dummy local metastore) will never be used when running sql inspark-sql, hive client points a true metastore. Right ?
So why it is designed to have the hive client in SparkSQLCLIDriver points to a dummy local metastore before ? Is this change breaking some design?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants