@liancheng
Contributor

This PR tries to remove the Metastore Parquet conversion hack by moving the conversion logic into HiveMetastoreCatalog: when we find that a Hive table is backed by Parquet, the MetastoreRelation is converted to a ParquetRelation, and the converted ParquetRelation inherits the original Hive schema. As a side effect, spark.sql.parquet.binaryAsString can be removed, since Metastore field types are now preserved.
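
For readers unfamiliar with the idea, the conversion described above can be sketched roughly as follows. This is a toy Python model, not the Spark code touched by this PR: `HiveTable`, `is_parquet_backed`, and `lookup_relation` are hypothetical names, the `MetastoreRelation` and `ParquetRelation` classes here are simplified stand-ins for the Spark classes of the same name, and detecting Parquet by looking at the SerDe class name is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# A schema is modelled as a list of (column name, Hive type name) pairs.
Schema = List[Tuple[str, str]]


@dataclass
class HiveTable:
    """Hypothetical, simplified view of a table entry in the Hive metastore."""
    name: str
    serde: str
    location: str
    schema: Schema


@dataclass
class MetastoreRelation:
    """Stand-in for a relation that is read through Hive."""
    table: HiveTable


@dataclass
class ParquetRelation:
    """Stand-in for a relation that is read by the native Parquet support."""
    path: str
    schema: Schema


def is_parquet_backed(table: HiveTable) -> bool:
    # Assumption: a table counts as Parquet-backed when its SerDe class
    # name mentions Parquet. The real check may differ.
    return "parquet" in table.serde.lower()


def lookup_relation(table: HiveTable) -> Union[MetastoreRelation, ParquetRelation]:
    """Pick the relation type at catalog lookup time."""
    if is_parquet_backed(table):
        # The converted relation inherits the Hive schema instead of
        # re-deriving column types from the Parquet files.
        return ParquetRelation(path=table.location, schema=list(table.schema))
    return MetastoreRelation(table)
```

Because the schema is copied from the metastore entry, a column declared as `string` in Hive stays a string after conversion, which is why the description suggests the binary-as-string setting would no longer be needed for Metastore tables.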

TODO

  • Add tests
  • Fix partitioning test failures
  • Code cleanup
  • Partition pruning

@SparkQA

SparkQA commented Nov 25, 2014

Test build #23813 has started for PR 3441 at commit 681cf87.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 25, 2014

Test build #23813 has finished for PR 3441 at commit 681cf87.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23813/
Test FAILed.

@SparkQA

SparkQA commented Dec 1, 2014

Test build #23987 has started for PR 3441 at commit 630330a.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 1, 2014

Test build #23987 has finished for PR 3441 at commit 630330a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23987/
Test PASSed.

@SparkQA

SparkQA commented Dec 1, 2014

Test build #23992 has started for PR 3441 at commit 27963d1.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 1, 2014

Test build #23992 has finished for PR 3441 at commit 27963d1.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23992/
Test FAILed.

@SparkQA

SparkQA commented Dec 1, 2014

Test build #23993 has started for PR 3441 at commit 8d5a820.

  • This patch does not merge cleanly.

@SparkQA

SparkQA commented Dec 1, 2014

Test build #23995 has started for PR 3441 at commit f066fc0.

  • This patch merges cleanly.

@liancheng changed the title from "[SPARK-3575][SQL][WIP] Removes the Metastore Parquet conversion hack" to "[SPARK-3575][SQL] Removes the Metastore Parquet conversion hack" on Dec 1, 2014
@liancheng changed the title from "[SPARK-3575][SQL] Removes the Metastore Parquet conversion hack" to "[SPARK-3575][SQL] Removes the Metastore Parquet table conversion hack" on Dec 1, 2014
@SparkQA

SparkQA commented Dec 1, 2014

Test build #23995 has finished for PR 3441 at commit f066fc0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23995/
Test PASSed.

@SparkQA

SparkQA commented Dec 1, 2014

Test build #23993 has finished for PR 3441 at commit 8d5a820.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23993/
Test PASSed.

Contributor

Why are you removing this configuration option? Don't we still need it to read data that was written by old versions of Spark SQL / Impala / Hive?

Contributor

In particular, I'm thinking of using parquetFile to read data from Impala, where there is no metastore schema.

Contributor Author

Oh yes... Reverted this change.
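
The reviewer's point can be illustrated with a small toy model. Writers such as Impala, Hive, and older Spark SQL versions do not distinguish binary data from strings in the Parquet schema, so when a file is read directly and no metastore schema is available, a reader has to decide how to treat such columns. The sketch below is plain Python rather than Spark code: `infer_types`, `resolve_schema`, and the `binary_as_string` parameter are hypothetical names that only mimic the role of spark.sql.parquet.binaryAsString, and the physical type names are simplified.

```python
from typing import List, Optional, Tuple

# (column name, type name) pairs; a deliberately simplified schema model.
Fields = List[Tuple[str, str]]

# Toy mapping from Parquet physical types to SQL type names.
_PHYSICAL_TO_SQL = {
    "INT32": "int",
    "INT64": "bigint",
    "DOUBLE": "double",
    "BOOLEAN": "boolean",
}


def infer_types(parquet_fields: Fields, binary_as_string: bool = False) -> Fields:
    """Derive SQL types from the file alone, with no metastore to consult."""
    result = []
    for name, physical in parquet_fields:
        if physical == "BINARY":
            # The file cannot tell us whether the bytes are text, so the
            # flag decides between the two interpretations.
            result.append((name, "string" if binary_as_string else "binary"))
        else:
            result.append((name, _PHYSICAL_TO_SQL[physical]))
    return result


def resolve_schema(
    parquet_fields: Fields,
    metastore_schema: Optional[Fields] = None,
    binary_as_string: bool = False,
) -> Fields:
    """Prefer the metastore schema; fall back to inference from the file."""
    if metastore_schema is not None:
        return list(metastore_schema)
    return infer_types(parquet_fields, binary_as_string)
```

In this model the flag is irrelevant whenever a metastore schema exists, but it still changes the result for files read without one, which matches the case raised in the review.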

@SparkQA

SparkQA commented Dec 2, 2014

Test build #24019 has started for PR 3441 at commit f6a587f.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 2, 2014

Test build #24019 has finished for PR 3441 at commit f6a587f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24019/
Test PASSed.

asfgit pushed a commit that referenced this pull request Dec 3, 2014
…ough Hive

This is a very small fix that catches one specific exception and returns an empty table.  #3441 will address this in a more principled way.

Author: Michael Armbrust <[email protected]>

Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits:

2781d9f [Michael Armbrust] Handle empty lists for newParquet
04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive

(cherry picked from commit 513ef82)
Signed-off-by: Michael Armbrust <[email protected]>
asfgit pushed a commit that referenced this pull request Dec 3, 2014
…ough Hive

This is a very small fix that catches one specific exception and returns an empty table.  #3441 will address this in a more principled way.

Author: Michael Armbrust <[email protected]>

Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits:

2781d9f [Michael Armbrust] Handle empty lists for newParquet
04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive
Contributor

Indent two spaces?

@liancheng changed the title from "[SPARK-3575][SQL] Removes the Metastore Parquet table conversion hack" to "[SPARK-3575][SQL][WIP] Removes the Metastore Parquet table conversion hack" on Dec 11, 2014
@liancheng
Contributor Author

Closing this, as I'm working on another branch that removes the old Parquet implementation and will also cover the topic of this PR.

@liancheng closed this on Jan 28, 2015
@liancheng deleted the fix-parquet-hack branch on January 28, 2015 at 21:19