[SPARK-21428][SQL][FOLLOWUP]CliSessionState should point to the actual metastore not a dummy one #19068
when is useInMemoryDerby false?
@cloud-fan updated. I noticed that the hiveConf initialized the executionHive way does not load hive-site.xml, so I changed it to the meta Hive way.
This function not only constructs the warehouse path, it also creates many other configs. Why is it safe to just ignore them?
Before SPARK-21428, cliSessionState was left behind; now we reuse it. This function is used to assemble the execution Hive configs, but cliSessionState now has to talk to the metastore.
I don't think you have answered the question here. For example, the original code updates the HiveConf.ConfVars.METASTORECONNECTURLKEY property here, but you don't touch it after the change. I'm not confident about making that behavior change.
Change this key to connect to a dummy metastore instead of the real one?
METASTORECONNECTURLKEY connects to Derby by default; we may set this key in hive-site.xml to talk to the metastore. The Hive client will use the state initialized here, so I don't think we should initialize the Hive conf the old way.
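To make the concern about METASTORECONNECTURLKEY concrete, here is a minimal sketch of layered configuration resolution using plain Scala maps. It is not Spark's or Hive's actual code: the object name `MetastoreUrlSketch`, the `resolve` helper, the MySQL URL, and the temporary Derby path are all hypothetical. Only the property name `javax.jdo.option.ConnectionURL` and Hive's embedded-Derby default value are taken from Hive itself.

```scala
object MetastoreUrlSketch {
  // Property name behind HiveConf.ConfVars.METASTORECONNECTURLKEY.
  val ConnectUrlKey = "javax.jdo.option.ConnectionURL"

  // Hive's built-in default: an embedded Derby database in the working directory.
  val hiveDefaults: Map[String, String] =
    Map(ConnectUrlKey -> "jdbc:derby:;databaseName=metastore_db;create=true")

  // Hypothetical hive-site.xml content pointing at a real metastore database.
  val hiveSite: Map[String, String] =
    Map(ConnectUrlKey -> "jdbc:mysql://metastore-host:3306/hive")

  // Hypothetical programmatic override in the style of a temporary Derby metastore.
  val tempDerby: Map[String, String] =
    Map(ConnectUrlKey -> "jdbc:derby:;databaseName=/tmp/spark-temp/metastore;create=true")

  // Later layers win: defaults, then hive-site.xml, then programmatic overrides.
  def resolve(
      site: Map[String, String],
      overrides: Map[String, String]): Map[String, String] =
    hiveDefaults ++ site ++ overrides
}
```

With no overrides, `resolve(hiveSite, Map.empty)` keeps the URL from hive-site.xml; passing `tempDerby` as the override replaces it with the temporary Derby URL, which is the "dummy metastore" situation discussed in this thread.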
@jiangxb1987 sorry for not @-mentioning you; please take another look.
Hello, I am hitting this issue. This seems like a regression, as my script, which worked before, no longer works. Here is my scenario: I am testing this on my laptop, where Hive is not installed. In this case we end up pointing to
@dilipbiswal This is because the CliSessionState instance initialized in SparkSQLCLIDriver points to a dummy metastore and is reused later by the Hive metastore client.
cc @cloud-fan @jiangxb1987 @gatorsmile PR title and description updated.
OK to test
@dilipbiswal does your script work after this PR?
Why did this work well before?
@cloud-fan Yeah, I have tried my script against this PR and it works fine. I am not familiar with the changes and don't know whether they can have any side effects. One thing I haven't had time to find out: in my script, how did the create database succeed, i.e. why didn't I get any error? And if it did succeed, where was the database created? Perhaps you know the answer? :-)
@cloud-fan it didn't trigger the test?
@cloud-fan The cliSessionState is meant to be reused, but it was discarded because the isolated Hive client classloader couldn't get hold of it.
can you add some comments to explain this logic?
ok, soon
shouldn't the default value of extraConfig be Map.empty?
This doesn't change the original logic, check https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L283.
can you add some comments for this method? Especially what's the difference between this one and hiveClientConfigurations.
ok
hiveClientConfigurations is used to convert time values in forms such as "1s" / "10min" to long values; we may give it a more fitting name.
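As an illustration of the kind of conversion described here, the following is a minimal, self-contained sketch that turns values like "1s" or "10min" into plain long values. It is not the code in HiveUtils: the object name `TimeConfSketch`, the `toLong` helper, and the small set of supported suffixes are assumptions made for this example, and real Hive time settings accept more spellings.

```scala
import java.util.concurrent.TimeUnit

object TimeConfSketch {
  // Suffixes this sketch understands (an assumption, not Hive's full list).
  private val units: Map[String, TimeUnit] = Map(
    "ms" -> TimeUnit.MILLISECONDS,
    "s" -> TimeUnit.SECONDS,
    "min" -> TimeUnit.MINUTES,
    "h" -> TimeUnit.HOURS,
    "d" -> TimeUnit.DAYS
  )

  /** Converts a value such as "1s" or "10min" into a number expressed in `target` units. */
  def toLong(value: String, target: TimeUnit): Long = {
    val trimmed = value.trim.toLowerCase
    // Split the leading digits from the unit suffix.
    val (digits, suffix) = trimmed.span(_.isDigit)
    require(digits.nonEmpty, s"no numeric part in '$value'")
    val unit = units.getOrElse(
      suffix.trim,
      throw new IllegalArgumentException(s"unknown time unit in '$value'"))
    target.convert(digits.toLong, unit)
  }
}
```

For example, `TimeConfSketch.toLong("10min", TimeUnit.SECONDS)` yields 600. A bare number without a suffix is rejected in this sketch, since the unit would be ambiguous.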
Jenkins unreachable, cc @cloud-fan
ok to test
It's not only time configs; I think we'd better call it config, following IsolatedClientLoader.config.
OK
How about we remove these default values and explicitly specify them in https://github.com/apache/spark/pull/19068/files#diff-f7aac41bf732c1ba1edbac436d331a55R84?
OK
Test build #81718 has finished for PR 19068 at commit
Test build #81721 has finished for PR 19068 at commit
extraConfig -> extraConfigs, and please update the description; these are not only time configurations.
can you run
Test build #81906 has finished for PR 19068 at commit
@cloud-fan I hit a linkage error before, and I have now simplified the logic. Could you trigger Jenkins before reverting?
Test build #81909 has finished for PR 19068 at commit
retest this please
Test build #81910 has finished for PR 19068 at commit
retest this please
Test build #81916 has finished for PR 19068 at commit
LGTM, merging to master!
@yaooqinn
What changes were proposed in this pull request?
When running bin/spark-sql, we reuse cliSessionState, but the Hive configurations generated here point to a dummy metastore when they should point to the real one. In addition, the warehouse is determined later in SharedState, and HiveClient should respect that config change in this case too.
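The warehouse part of this description can be sketched as follows, using plain maps instead of real Spark or Hive configuration objects. This is a simplified illustration under stated assumptions, not the code in SharedState: it assumes a precedence rule in which an explicitly set `spark.sql.warehouse.dir` wins over `hive.metastore.warehouse.dir`, and the names `WarehouseSketch`, `resolveWarehouse`, and `clientConf` are hypothetical.

```scala
object WarehouseSketch {
  val HiveWarehouseKey = "hive.metastore.warehouse.dir"
  val SparkWarehouseKey = "spark.sql.warehouse.dir"

  /** Picks the effective warehouse path under the assumed precedence rule. */
  def resolveWarehouse(
      hadoopConf: Map[String, String],
      sparkConf: Map[String, String],
      sparkDefault: String): String =
    (hadoopConf.get(HiveWarehouseKey), sparkConf.get(SparkWarehouseKey)) match {
      // Only the Hive setting is present: use it.
      case (Some(hiveDir), None) => hiveDir
      // An explicit Spark setting wins, whether or not Hive has one.
      case (_, Some(sparkDir)) => sparkDir
      // Neither is set: fall back to the supplied default.
      case (None, None) => sparkDefault
    }

  /** Settings a client created earlier would need so that it sees the resolved path. */
  def clientConf(initial: Map[String, String], warehouse: String): Map[String, String] =
    initial + (HiveWarehouseKey -> warehouse)
}
```

The point of `clientConf` is that a client whose configuration was built before the warehouse was resolved still carries the stale value unless the resolved path is written back into it.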
How was this patch tested?
Existing unit tests.
cc @cloud-fan @jiangxb1987