[SPARK-3575][SQL][WIP] Removes the Metastore Parquet table conversion hack #3441

liancheng · 2014-11-25T03:46:20Z

This PR attempts to remove the Metastore Parquet conversion hack by moving the conversion logic into HiveMetastoreCatalog: when a Hive table is found to be backed by Parquet, its MetastoreRelation is converted to a ParquetRelation that inherits the original Hive schema. As a side effect, spark.sql.parquet.binaryAsString can be removed, since Metastore field types are now preserved.
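The conversion described above can be sketched as a toy model. This is not the code in this PR: the Python classes below only borrow the names MetastoreRelation and ParquetRelation, while `lookup_relation`, the field names, the table names, the paths, and the SerDe substring check are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# A schema is modelled as an ordered list of (column name, type name) pairs.
Schema = List[Tuple[str, str]]


@dataclass
class MetastoreRelation:
    """Toy stand-in for a table as described by the Hive metastore."""
    table: str
    serde: str
    location: str
    schema: Schema


@dataclass
class ParquetRelation:
    """Toy stand-in for a relation read through the native Parquet path."""
    path: str
    schema: Schema


def lookup_relation(rel: MetastoreRelation) -> Union[MetastoreRelation, ParquetRelation]:
    """Hypothetical catalog hook: convert Parquet-backed tables, keep the rest.

    The converted relation inherits the Hive schema, so the metastore
    field types are kept rather than re-derived from the files.
    """
    # Assumption: a Parquet-backed table is detected by its SerDe class name.
    if "parquet" in rel.serde.lower():
        return ParquetRelation(path=rel.location, schema=list(rel.schema))
    return rel


parquet_table = MetastoreRelation(
    table="events",
    serde="parquet.hive.serde.ParquetHiveSerDe",
    location="/warehouse/events",
    schema=[("id", "int"), ("msg", "string")],
)
text_table = MetastoreRelation(
    table="raw",
    serde="org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    location="/warehouse/raw",
    schema=[("line", "string")],
)

converted = lookup_relation(parquet_table)
untouched = lookup_relation(text_table)
```

In this sketch the string column keeps its metastore type after conversion, which is the property that motivates dropping the binary-as-string setting for metastore tables.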

TODO

Add tests
Fix partitioning test failures
Code cleanup
Partition pruning

SparkQA · 2014-11-25T03:50:22Z

Test build #23813 has started for PR 3441 at commit 681cf87.

This patch merges cleanly.

SparkQA · 2014-11-25T04:26:21Z

Test build #23813 has finished for PR 3441 at commit 681cf87.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-11-25T04:26:24Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23813/
Test FAILed.

SparkQA · 2014-12-01T10:42:49Z

Test build #23987 has started for PR 3441 at commit 630330a.

This patch merges cleanly.

SparkQA · 2014-12-01T11:46:33Z

Test build #23987 has finished for PR 3441 at commit 630330a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-01T11:46:36Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23987/
Test PASSed.

SparkQA · 2014-12-01T17:15:35Z

Test build #23992 has started for PR 3441 at commit 27963d1.

This patch merges cleanly.

SparkQA · 2014-12-01T17:19:05Z

Test build #23992 has finished for PR 3441 at commit 27963d1.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-01T17:19:06Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23992/
Test FAILed.

SparkQA · 2014-12-01T17:32:52Z

Test build #23993 has started for PR 3441 at commit 8d5a820.

This patch does not merge cleanly.

SparkQA · 2014-12-01T17:40:37Z

Test build #23995 has started for PR 3441 at commit f066fc0.

This patch merges cleanly.

SparkQA · 2014-12-01T18:47:39Z

Test build #23995 has finished for PR 3441 at commit f066fc0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-01T18:47:43Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23995/
Test PASSed.

SparkQA · 2014-12-01T19:22:33Z

Test build #23993 has finished for PR 3441 at commit 8d5a820.

This patch passes all tests.
This patch does not merge cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-01T19:22:37Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23993/
Test PASSed.

marmbrus · 2014-12-01T20:17:46Z

sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala

Why are you removing this configuration option? Don't we still need it to read data that was written by old versions of Spark SQL / Impala / Hive?

marmbrus · Dec 2, 2014

In particular, I'm thinking of using parquetFile to read data from Impala, where there is no metastore schema.

liancheng · Dec 2, 2014

Oh yes... Reverted this change.
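The concern in this thread can be illustrated with a toy model of schema resolution. It is a sketch under stated assumptions, not Spark's implementation: `resolve_schema`, its parameters, and the type names are hypothetical. It only shows why a binary-as-string switch still matters when files are read directly and no metastore schema is available to supply the column types.

```python
from typing import Dict, List, Optional, Tuple


def resolve_schema(
    file_schema: List[Tuple[str, str]],
    metastore_schema: Optional[Dict[str, str]] = None,
    binary_as_string: bool = False,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: pick a type for each column found in a Parquet file."""
    resolved = []
    for name, file_type in file_schema:
        if metastore_schema is not None and name in metastore_schema:
            # A metastore schema exists, so its declared type wins.
            resolved.append((name, metastore_schema[name]))
        elif file_type == "binary" and binary_as_string:
            # No metastore type: fall back to the flag for un-annotated binary.
            resolved.append((name, "string"))
        else:
            resolved.append((name, file_type))
    return resolved


# Columns as an older writer might have stored them: strings as plain binary.
file_schema = [("id", "int64"), ("name", "binary")]

raw = resolve_schema(file_schema)
as_string = resolve_schema(file_schema, binary_as_string=True)
with_metastore = resolve_schema(
    file_schema, metastore_schema={"id": "bigint", "name": "string"}
)
```

With a metastore schema the flag is redundant in this model, but for a direct file read it is the only thing that turns `name` back into a string.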

SparkQA · 2014-12-02T03:15:09Z

Test build #24019 has started for PR 3441 at commit f6a587f.

This patch merges cleanly.

SparkQA · 2014-12-02T04:16:42Z

Test build #24019 has finished for PR 3441 at commit f6a587f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-12-02T04:16:45Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24019/
Test PASSed.

…ough Hive

This is a very small fix that catches one specific exception and returns an empty table. #3441 will address this in a more principled way.

Author: Michael Armbrust <[email protected]>

Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits:

2781d9f [Michael Armbrust] Handle empty lists for newParquet
04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive

(cherry picked from commit 513ef82)
Signed-off-by: Michael Armbrust <[email protected]>

marmbrus · 2014-12-11T00:57:53Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala

Indent two spaces?

liancheng · 2015-01-28T21:18:57Z

Closing this as I'm working on another branch that removes the old Parquet implementation, and will cover the topic of this PR.

liancheng added 4 commits December 2, 2014 01:30

ab2d921 Removes the Parquet hacks
fe31d51 Fixes Parquet partitioning
a578f1b Fixed ParquetMetastoreSuite
f6a587f Code cleanup

liancheng force-pushed the fix-parquet-hack branch from 8d5a820 to f066fc0 December 1, 2014 17:36

liancheng changed the title ~~[SPARK-3575][SQL][WIP] Removes the Metastore Parquet conversion hack~~ [SPARK-3575][SQL] Removes the Metastore Parquet conversion hack Dec 1, 2014

liancheng changed the title ~~[SPARK-3575][SQL] Removes the Metastore Parquet conversion hack~~ [SPARK-3575][SQL] Removes the Metastore Parquet table conversion hack Dec 1, 2014

marmbrus reviewed Dec 1, 2014

This was referenced Dec 1, 2014

[SPARK-4552][SQL] Query for empty parquet table in spark sql hive get IllegalArgumentException #3413

Closed

[SPARK-4553][SQL] Query for parquet table with string fields in spark sql hive get binary result #3414

Closed

liancheng force-pushed the fix-parquet-hack branch from f066fc0 to f6a587f December 2, 2014 03:10

marmbrus mentioned this pull request Dec 3, 2014

[SPARK-4552][SQL] Avoid exception when reading empty parquet data through Hive #3586

Closed

marmbrus reviewed Dec 11, 2014

liancheng changed the title ~~[SPARK-3575][SQL] Removes the Metastore Parquet table conversion hack~~ [SPARK-3575][SQL][WIP] Removes the Metastore Parquet table conversion hack Dec 11, 2014

liancheng closed this Jan 28, 2015

liancheng deleted the fix-parquet-hack branch January 28, 2015 21:19
