[SPARK-13446] [SQL] Support reading data from Hive 2.0.1 metastore #17061
Conversation
Test build #73458 has finished for PR 17061 at commit
```scala
if (version == "2.0") {
  hadoopConf.set("datanucleus.schema.autoCreateAll", "true")
}
client = buildClient(version, hadoopConf, HiveUtils.hiveClientConfigurations(hadoopConf))
```
Previously, we did not add the configuration generated by `HiveUtils.hiveClientConfigurations`. Here, we add it to see whether it causes any test failures.
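As a rough sketch of what adding those configurations amounts to (`mergeClientConfs` and the plain `Map` below are hypothetical stand-ins for the real Hadoop `Configuration` plumbing, not Spark's actual code):

```scala
// Hypothetical sketch: the extra settings produced by a helper like
// HiveUtils.hiveClientConfigurations are layered on top of the base
// Hadoop configuration before the Hive client is built.
def mergeClientConfs(
    hadoopConf: Map[String, String],
    extraConfs: Map[String, String]): Map[String, String] = {
  // ++ keeps the right-hand side's value for duplicate keys,
  // so the Hive-client-specific overrides take precedence.
  hadoopConf ++ extraConfs
}

val merged = mergeClientConfs(
  Map("fs.defaultFS" -> "hdfs://nn:8020",
      "hive.metastore.uris" -> "thrift://a:9083"),
  Map("hive.metastore.uris" -> "thrift://b:9083"))
```

The point of the review comment is exactly this layering: the merged map, not the raw Hadoop conf, is what reaches `buildClient`.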
Test build #73469 has finished for PR 17061 at commit

Test build #73470 has finished for PR 17061 at commit

LGTM.

Thank you! @vanzin

retest this please
```scala
case object v2_0 extends HiveVersion("2.0.1",
  exclusions = Seq("eigenbase:eigenbase-properties",
    "net.hydromatic:quidem"))
```
Are the exclusions still useful?
This was originally added because VersionsSuite would fail without it, since it uses maven/ivy to download the different Hive libraries (because they used to depend on snapshot releases of libraries, or libraries that don't exist anymore on maven central). Perhaps Hive has updated its dependencies enough that some of this can be cleaned up...
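The exclusion mechanism being discussed can be pictured with a small sketch (the `Dep` case class and `applyExclusions` are illustrative names, not Spark's actual ivy resolution code): each exclusion is a `group:artifact` string, and any transitive dependency matching one of them is dropped before download.

```scala
// Hypothetical model of a resolved dependency.
final case class Dep(group: String, artifact: String)

// Drop every dependency whose "group:artifact" pair appears in the
// exclusion list, mirroring how VersionsSuite's maven/ivy download
// skips problematic transitive Hive dependencies.
def applyExclusions(deps: Seq[Dep], exclusions: Seq[String]): Seq[Dep] = {
  val excluded: Set[(String, String)] = exclusions.map { e =>
    val Array(g, a) = e.split(":", 2)
    (g, a)
  }.toSet
  deps.filterNot(d => excluded.contains((d.group, d.artifact)))
}

val resolved = applyExclusions(
  Seq(Dep("org.apache.hive", "hive-metastore"),
      Dep("eigenbase", "eigenbase-properties"),
      Dep("net.hydromatic", "quidem")),
  Seq("eigenbase:eigenbase-properties", "net.hydromatic:quidem"))
```

If Hive's published poms no longer pull in the unavailable artifacts, the exclusion list shrinks to empty and this filter becomes a no-op, which is the cleanup being suggested.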
we should also update
```scala
hadoopConf.set("test", "success")
if (version == "2.0") {
  hadoopConf.set("datanucleus.schema.autoCreateAll", "true")
}
client = buildClient(version, hadoopConf)
```
why this?
Hive changed the default from `true` to `false` in 2.0:

**datanucleus.autoCreateSchema**
- Default Value: true
- Added In: Hive 0.7.0
- Removed In: Hive 2.0.0 with HIVE-6113, replaced by datanucleus.schema.autoCreateAll

Creates the necessary schema on startup if one does not exist. Set this to false after creating it once. In Hive 0.12.0 and later releases, datanucleus.autoCreateSchema is disabled if hive.metastore.schema.verification is true.

**datanucleus.schema.autoCreateAll**
- Default Value: false
- Added In: Hive 2.0.0 with HIVE-6113, replaces datanucleus.autoCreateSchema (with a different default value)

Creates the necessary schema on startup if one does not exist. Reset this to false after creating it once. datanucleus.schema.autoCreateAll is disabled if hive.metastore.schema.verification is true.
Without changing the flag, we get the following error:

```
14:59:39.253 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates
Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3365)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2877)
```
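A minimal sketch of the fix this thread settles on, using a mutable `Map` as a hypothetical stand-in for a real Hadoop `Configuration` (names are illustrative, not the test suite's exact code):

```scala
import scala.collection.mutable

// Hive 2.0 flipped the schema auto-creation default (HIVE-6113):
// datanucleus.autoCreateSchema (default true) was replaced by
// datanucleus.schema.autoCreateAll (default false). Without turning it
// back on, the embedded metastore has no DBS table and startup fails
// with the MissingTableException shown above.
def prepareConf(version: String, hadoopConf: mutable.Map[String, String]): Unit = {
  if (version == "2.0") {
    hadoopConf.put("datanucleus.schema.autoCreateAll", "true")
  }
}

val conf20 = mutable.Map[String, String]()
prepareConf("2.0", conf20)

val conf12 = mutable.Map[String, String]()
prepareConf("1.2", conf12)  // pre-2.0 versions keep the old default
```

Gating the override on the version string keeps the older metastores on their original `datanucleus.autoCreateSchema` behavior.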
See the JIRA: https://issues.apache.org/jira/browse/HIVE-6113
Let's add a comment so that future readers know this.
Sure. Will do.
Test build #73860 has finished for PR 17061 at commit

LGTM, pending tests

Test build #73872 has finished for PR 17061 at commit

thanks, merging to master!
What changes were proposed in this pull request?
This PR is to make Spark work with Hive 2.0's metastore. Compared with Hive 1.2, Hive 2.0's metastore has an API update due to the removal of `HOLD_DDLTIME` in https://issues.apache.org/jira/browse/HIVE-12224. Based on the Hive JIRA description, `HOLD_DDLTIME` should be removed from our internal API too (#17063 was submitted for it). In the next PR, we will support the 2.1.0 metastore, whose APIs were changed due to https://issues.apache.org/jira/browse/HIVE-12730. However, before that, we need a code cleanup for stats collection and setting.
How was this patch tested?
Added test cases to `VersionsSuite.scala`.