[SPARK-13446] [SQL] Support reading data from Hive 2.0.1 metastore #17061
Conversation
Test build #73458 has finished for PR 17061 at commit
```scala
if (version == "2.0") {
  hadoopConf.set("datanucleus.schema.autoCreateAll", "true")
}
client = buildClient(version, hadoopConf, HiveUtils.hiveClientConfigurations(hadoopConf))
```
Previously, we did not add the configuration generated by `HiveUtils.hiveClientConfigurations`. Here, we add it to see whether it causes any test failures.
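As a rough sketch of what adding those configurations amounts to (`mergeClientConfs` and the plain `Map` below are hypothetical stand-ins for the real Hadoop `Configuration` plumbing, not Spark's actual code):

```scala
// Hypothetical sketch: the extra settings produced by a helper like
// HiveUtils.hiveClientConfigurations are layered on top of the base
// Hadoop configuration before the Hive client is built.
def mergeClientConfs(
    hadoopConf: Map[String, String],
    extraConfs: Map[String, String]): Map[String, String] = {
  // ++ keeps the right-hand side's value for duplicate keys,
  // so the Hive-client-specific overrides take precedence.
  hadoopConf ++ extraConfs
}

val merged = mergeClientConfs(
  Map("fs.defaultFS" -> "hdfs://nn:8020",
      "hive.metastore.uris" -> "thrift://a:9083"),
  Map("hive.metastore.uris" -> "thrift://b:9083"))
```

The point of the review comment is exactly this layering: the merged map, not the raw Hadoop conf, is what reaches `buildClient`.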
Test build #73469 has finished for PR 17061 at commit

Test build #73470 has finished for PR 17061 at commit

LGTM.

Thank you! @vanzin

retest this please
```scala
case object v2_0 extends HiveVersion("2.0.1",
  exclusions = Seq("eigenbase:eigenbase-properties",
    "net.hydromatic:quidem"))
```
Are the exclusions still useful?
This was originally added because VersionsSuite would fail without it, since it uses maven/ivy to download the different Hive libraries (because they used to depend on snapshot releases of libraries, or libraries that don't exist anymore on maven central). Perhaps Hive has updated its dependencies enough that some of this can be cleaned up...
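The exclusion mechanism being discussed can be pictured with a small sketch (the `Dep` case class and `applyExclusions` are illustrative names, not Spark's actual ivy resolution code): each exclusion is a `group:artifact` string, and any transitive dependency matching one of them is dropped before download.

```scala
// Hypothetical model of a resolved dependency.
final case class Dep(group: String, artifact: String)

// Drop every dependency whose "group:artifact" pair appears in the
// exclusion list, mirroring how VersionsSuite's maven/ivy download
// skips problematic transitive Hive dependencies.
def applyExclusions(deps: Seq[Dep], exclusions: Seq[String]): Seq[Dep] = {
  val excluded: Set[(String, String)] = exclusions.map { e =>
    val Array(g, a) = e.split(":", 2)
    (g, a)
  }.toSet
  deps.filterNot(d => excluded.contains((d.group, d.artifact)))
}

val resolved = applyExclusions(
  Seq(Dep("org.apache.hive", "hive-metastore"),
      Dep("eigenbase", "eigenbase-properties"),
      Dep("net.hydromatic", "quidem")),
  Seq("eigenbase:eigenbase-properties", "net.hydromatic:quidem"))
```

If Hive's published poms no longer pull in the unavailable artifacts, the exclusion list shrinks to empty and this filter becomes a no-op, which is the cleanup being suggested.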
we should also update
```scala
hadoopConf.set("test", "success")
if (version == "2.0") {
  hadoopConf.set("datanucleus.schema.autoCreateAll", "true")
}
client = buildClient(version, hadoopConf)
```
why this?
Hive changed the default from `true` to `false` in 2.0:

**datanucleus.autoCreateSchema**
- Default Value: true
- Added In: Hive 0.7.0
- Removed In: Hive 2.0.0 with HIVE-6113, replaced by datanucleus.schema.autoCreateAll

Creates the necessary schema on startup if one does not exist. Set this to false after creating it once. In Hive 0.12.0 and later releases, datanucleus.autoCreateSchema is disabled if hive.metastore.schema.verification is true.

**datanucleus.schema.autoCreateAll**
- Default Value: false
- Added In: Hive 2.0.0 with HIVE-6113, replaces datanucleus.autoCreateSchema (with a different default value)

Creates the necessary schema on startup if one does not exist. Reset this to false after creating it once. datanucleus.schema.autoCreateAll is disabled if hive.metastore.schema.verification is true.
Without changing the flag, we get the following error:

```
14:59:39.253 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates
Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3365)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2877)
```
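A minimal sketch of the fix this thread settles on, using a mutable `Map` as a hypothetical stand-in for a real Hadoop `Configuration` (names are illustrative, not the test suite's exact code):

```scala
import scala.collection.mutable

// Hive 2.0 flipped the schema auto-creation default (HIVE-6113):
// datanucleus.autoCreateSchema (default true) was replaced by
// datanucleus.schema.autoCreateAll (default false). Without turning it
// back on, the embedded metastore has no DBS table and startup fails
// with the MissingTableException shown above.
def prepareConf(version: String, hadoopConf: mutable.Map[String, String]): Unit = {
  if (version == "2.0") {
    hadoopConf.put("datanucleus.schema.autoCreateAll", "true")
  }
}

val conf20 = mutable.Map[String, String]()
prepareConf("2.0", conf20)

val conf12 = mutable.Map[String, String]()
prepareConf("1.2", conf12)  // pre-2.0 versions keep the old default
```

Gating the override on the version string keeps the older metastores on their original `datanucleus.autoCreateSchema` behavior.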
See the JIRA: https://issues.apache.org/jira/browse/HIVE-6113
Let's add a comment so that future readers know this.
Sure. Will do.
Test build #73860 has finished for PR 17061 at commit

LGTM, pending tests

Test build #73872 has finished for PR 17061 at commit

thanks, merging to master!
What changes were proposed in this pull request?
This PR is to make Spark work with Hive 2.0's metastore. Compared with Hive 1.2, Hive 2.0's metastore has an API update due to the removal of `HOLD_DDLTIME` in https://issues.apache.org/jira/browse/HIVE-12224. Based on the Hive JIRA description, `HOLD_DDLTIME` should be removed from our internal API too (#17063 was submitted for it). In the next PR, we will support the 2.1.0 metastore, whose APIs were changed due to https://issues.apache.org/jira/browse/HIVE-12730. However, before that, we need a code cleanup for stats collection and setting.
How was this patch tested?
Added test cases to `VersionsSuite.scala`.