@gatorsmile
Member

@gatorsmile gatorsmile commented Feb 25, 2017

What changes were proposed in this pull request?

This PR makes Spark work with Hive 2.0 metastores. Compared with Hive 1.2, the Hive 2.0 metastore has an API change caused by the removal of HOLD_DDLTIME in https://issues.apache.org/jira/browse/HIVE-12224. Based on the following Hive JIRA description, HOLD_DDLTIME should be removed from our internal API too (#17063 was submitted for it):

This arcane feature was introduced long ago via HIVE-1394. It was broken as soon as it landed (HIVE-1442) and is thus useless. The fact that no one has fixed it since suggests that it's not really used by anyone. Better to remove it so no one hits the bug of HIVE-1442.
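To illustrate what removing HOLD_DDLTIME from the internal API means, here is a minimal sketch of a per-version shim whose callers no longer pass a HOLD_DDLTIME flag. MetastoreShim, Shim12, Shim20, shimFor and the private hiveLoadPartition stand-ins are hypothetical and greatly simplified. They only mimic the fact that the Hive 1.2 load call takes a holdDDLTime argument and the Hive 2.0 call does not. Passing a constant false on the 1.2 path is an assumption of this sketch.

```scala
object ShimSketch {
  // Version-independent entry point used by callers: no holdDDLTime parameter.
  sealed trait MetastoreShim {
    def loadPartition(path: String, replace: Boolean): String
  }

  // Hypothetical shim for a Hive 1.2-style API that still takes holdDDLTime.
  object Shim12 extends MetastoreShim {
    // Stand-in for the Hive 1.2 call; returns a description instead of doing any I/O.
    private def hiveLoadPartition(path: String, replace: Boolean, holdDDLTime: Boolean): String =
      s"v1.2 load $path replace=$replace holdDDLTime=$holdDDLTime"

    // Assumption: the shim always passes false now that callers cannot request the flag.
    def loadPartition(path: String, replace: Boolean): String =
      hiveLoadPartition(path, replace, holdDDLTime = false)
  }

  // Hypothetical shim for a Hive 2.0-style API, where HOLD_DDLTIME no longer exists.
  object Shim20 extends MetastoreShim {
    private def hiveLoadPartition(path: String, replace: Boolean): String =
      s"v2.0 load $path replace=$replace"

    def loadPartition(path: String, replace: Boolean): String =
      hiveLoadPartition(path, replace)
  }

  // Picks the shim for a "major.minor" version string.
  def shimFor(version: String): MetastoreShim = version match {
    case "1.2" => Shim12
    case "2.0" => Shim20
    case other => throw new UnsupportedOperationException(s"Unsupported Hive version: $other")
  }
}
```

For example, ShimSketch.shimFor("2.0").loadPartition("/tmp/part", replace = true) yields "v2.0 load /tmp/part replace=true". The point of the design is that version differences stay inside the shims, so the rest of the client never sees the removed flag.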

In the next PR, we will support the Hive 2.1.0 metastore, whose APIs changed because of https://issues.apache.org/jira/browse/HIVE-12730. Before that, however, the code for collecting and setting statistics needs to be cleaned up.

How was this patch tested?

Added test cases to VersionsSuite.scala

@gatorsmile gatorsmile changed the title [SPARK-13446] [SQL] Support reading data from Hive 2.0.0 metastore [WIP] [SPARK-13446] [SQL] Support reading data from Hive 2.0.1 metastore [WIP] Feb 25, 2017
@SparkQA

SparkQA commented Feb 25, 2017

Test build #73458 has finished for PR 17061 at commit 60af17f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

if (version == "2.0") {
  hadoopConf.set("datanucleus.schema.autoCreateAll", "true")
}
client = buildClient(version, hadoopConf, HiveUtils.hiveClientConfigurations(hadoopConf))
Member Author

Previously, we did not pass the configurations generated by HiveUtils.hiveClientConfigurations to the client. Here, we pass them to check whether they cause any test failures.
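A minimal sketch of how the inputs in the snippet above could be combined, assuming all of them can be viewed as string key/value maps. Plain Scala Maps stand in for Hadoop's Configuration and for the result of HiveUtils.hiveClientConfigurations. buildClientConf is a hypothetical helper, not part of VersionsSuite, and the rule that extra client configurations override base settings is an assumption of this sketch.

```scala
object ClientConfSketch {
  // Hypothetical helper: merges base settings, a version-specific override, and extra
  // client configurations into one map. On duplicate keys the right-hand map wins,
  // because ++ keeps the value from its right operand.
  def buildClientConf(
      version: String,
      baseConf: Map[String, String],
      extraConf: Map[String, String]): Map[String, String] = {
    // Mirrors the diff: Hive 2.0 metastores need schema auto-creation turned on explicitly.
    val versionConf =
      if (version == "2.0") Map("datanucleus.schema.autoCreateAll" -> "true")
      else Map.empty[String, String]
    baseConf ++ versionConf ++ extraConf
  }
}
```

With buildClientConf("2.0", Map("test" -> "success"), Map.empty), the result keeps test -> success and adds datanucleus.schema.autoCreateAll -> true. Keeping the merge order explicit makes it easy to see which source a test failure's setting came from.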

@SparkQA

SparkQA commented Feb 25, 2017

Test build #73469 has finished for PR 17061 at commit febb392.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 25, 2017

Test build #73470 has finished for PR 17061 at commit 9ea5850.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile gatorsmile changed the title [SPARK-13446] [SQL] Support reading data from Hive 2.0.1 metastore [WIP] [SPARK-13446] [SQL] Support reading data from Hive 2.0.1 metastore Feb 26, 2017
@gatorsmile
Member Author

cc @cloud-fan @yhuai @sameeragarwal

@vanzin
Contributor

vanzin commented Mar 3, 2017

LGTM.

@gatorsmile
Member Author

Thank you! @vanzin

@gatorsmile
Member Author

retest this please

"net.hydromatic:quidem"))

case object v2_0 extends HiveVersion("2.0.1",
exclusions = Seq("eigenbase:eigenbase-properties",
Contributor

Are the exclusions still useful?

Contributor

@vanzin vanzin Mar 3, 2017

This was originally added because VersionsSuite would fail without it: the suite uses Maven/Ivy to download the different Hive libraries, and those libraries used to depend on snapshot releases of other libraries, or on libraries that no longer exist on Maven Central. Perhaps Hive has updated its dependencies enough that some of this can now be cleaned up...
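As a rough illustration of what the exclusions do, the sketch below models a Hive version as a full version string plus a list of Maven group:artifact exclusions, and filters a dependency list against it. HiveVersionSketch, Dep, parseExclusions and resolve are hypothetical. Real Ivy resolution is more involved (an excluded artifact is never traversed, so its own transitive dependencies are not pulled in through it either), and the v2_0 entry lists only the first exclusion shown in the diff because the rest of that list is elided there.

```scala
object HiveVersionSketch {
  // A Hive version with its full version string and Maven "group:artifact" exclusions.
  sealed abstract class HiveVersion(val fullVersion: String, val exclusions: Seq[String] = Nil)

  // Only the first exclusion from the diff is listed; the remainder is elided there.
  case object v2_0 extends HiveVersion("2.0.1",
    exclusions = Seq("eigenbase:eigenbase-properties"))

  final case class Dep(group: String, artifact: String)

  // Splits each "group:artifact" string into a (group, artifact) pair.
  def parseExclusions(v: HiveVersion): Set[(String, String)] =
    v.exclusions.map { e =>
      e.split(':') match {
        case Array(group, artifact) => (group, artifact)
        case _ => throw new IllegalArgumentException(s"Expected group:artifact, got: $e")
      }
    }.toSet

  // Simplified stand-in for resolution: drops any dependency that matches an exclusion.
  def resolve(v: HiveVersion, deps: Seq[Dep]): Seq[Dep] = {
    val excluded = parseExclusions(v)
    deps.filterNot(d => excluded((d.group, d.artifact)))
  }
}
```

Answering "are the exclusions still useful?" then amounts to removing an entry and checking whether the downloaded dependency set still resolves.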

@cloud-fan
Contributor

We should also update InsertIntoHiveTable. It would be great if we could improve it so that compilation fails when a new Hive version is added.
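One way to get the compile-time check suggested above is to branch on a sealed version hierarchy with a pattern match instead of checking set membership at runtime. A minimal sketch with hypothetical names (V12, V20, supportsHoldDDLTime), using HOLD_DDLTIME support from this PR as the version-dependent behavior. scalac checks matches on a sealed type for exhaustiveness, so adding a new case object without a new branch produces a "match may not be exhaustive" warning. That warning fails the build only when fatal warnings are enabled (for example, -Xfatal-warnings in Scala 2).

```scala
object ExhaustiveMatchSketch {
  // Sealed: every subtype must be declared in this file, so the compiler knows all cases.
  sealed trait HiveVersion
  case object V12 extends HiveVersion
  case object V20 extends HiveVersion

  // Version-dependent behavior from this PR: HOLD_DDLTIME exists in Hive 1.2 but was
  // removed in Hive 2.0 (HIVE-12224). Adding `case object V21 extends HiveVersion`
  // without a matching branch makes scalac warn that this match may not be exhaustive.
  def supportsHoldDDLTime(v: HiveVersion): Boolean = v match {
    case V12 => true
    case V20 => false
  }
}
```

The trade-off is that every version-dependent decision must be written as a match over all versions, rather than as a set of "versions that need X".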

hadoopConf.set("test", "success")
client = buildClient(version, hadoopConf)
if (version == "2.0") {
  hadoopConf.set("datanucleus.schema.autoCreateAll", "true")
Contributor

why this?

Member Author

Hive changed the default from true to false in 2.0, when the property was also renamed:

datanucleus.autoCreateSchema
Default Value: true
Added In: Hive 0.7.0
Removed In: Hive 2.0.0 with HIVE-6113, replaced by datanucleus.schema.autoCreateAll
Creates necessary schema on a startup if one does not exist. Set this to false, after creating it once.
In Hive 0.12.0 and later releases, datanucleus.autoCreateSchema is disabled if hive.metastore.schema.verification is true.
datanucleus.schema.autoCreateAll
Default Value: false
Added In: Hive 2.0.0 with HIVE-6113, replaces datanucleus.autoCreateSchema (with different default value)
Creates necessary schema on a startup if one does not exist. Reset this to false, after creating it once.
datanucleus.schema.autoCreateAll is disabled if hive.metastore.schema.verification is true.
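The version-dependent behavior in the reference above can be summarized in a small sketch. AutoCreateSchemaSketch and its methods are hypothetical helpers, not Hive or Spark APIs. They assume Hive 0.12.0 or later (so the schema-verification rule applies) and treat an unset hive.metastore.schema.verification as false, which may not match the actual default of every Hive release.

```scala
object AutoCreateSchemaSketch {
  // Major version from strings like "1.2.1" or "2.0.1".
  private def majorVersion(hiveVersion: String): Int =
    hiveVersion.takeWhile(_ != '.').toInt

  // Property name and documented default, following the Hive docs quoted above.
  def autoCreateProperty(hiveVersion: String): (String, Boolean) =
    if (majorVersion(hiveVersion) >= 2) ("datanucleus.schema.autoCreateAll", false)
    else ("datanucleus.autoCreateSchema", true)

  // Whether schema auto-creation would be in effect for the given explicit settings.
  def autoCreateEnabled(hiveVersion: String, conf: Map[String, String]): Boolean = {
    val (key, defaultValue) = autoCreateProperty(hiveVersion)
    val requested = conf.get(key).map(_.toBoolean).getOrElse(defaultValue)
    // Assumption: an unset hive.metastore.schema.verification is treated as false.
    val verification = conf.get("hive.metastore.schema.verification").exists(_.toBoolean)
    // Per the docs, schema verification disables auto-creation (Hive 0.12.0 and later).
    requested && !verification
  }
}
```

Under these assumptions, autoCreateEnabled("2.0.1", Map.empty) is false while autoCreateEnabled("1.2.1", Map.empty) is true, which is why the 2.0 test sets the flag explicitly.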

Without setting the flag, we get the following error:

14:59:39.253 WARN DataNucleus.Query: Query for candidates of org.apache.hadoop.hive.metastore.model.MDatabase and subclasses resulted in no possible candidates
Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
org.datanucleus.store.rdbms.exceptions.MissingTableException: Required table missing : "DBS" in Catalog "" Schema "". DataNucleus requires this table to perform its persistence operations. Either your MetaData is incorrect, or you need to enable "datanucleus.schema.autoCreateTables"
	at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:606)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3365)
	at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2877)

Contributor

Let's add a comment so that future readers know this.

Member Author

Sure. Will do.

@SparkQA

SparkQA commented Mar 3, 2017

Test build #73860 has finished for PR 17061 at commit 9ea5850.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM, pending tests

@SparkQA

SparkQA commented Mar 4, 2017

Test build #73872 has finished for PR 17061 at commit b713a81.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in f5fdbe0 Mar 4, 2017