[SPARK-3575][SQL][WIP] Removes the Metastore Parquet table conversion hack #3441
Test build #23813 has started for PR 3441 at commit
Test build #23813 has finished for PR 3441 at commit
Test FAILed.
Test build #23987 has started for PR 3441 at commit
Test build #23987 has finished for PR 3441 at commit
Test PASSed.
Test build #23992 has started for PR 3441 at commit
Test build #23992 has finished for PR 3441 at commit
Test FAILed.
Test build #23993 has started for PR 3441 at commit
8d5a820 to f066fc0
Test build #23995 has started for PR 3441 at commit
Test build #23995 has finished for PR 3441 at commit
Test PASSed.
Test build #23993 has finished for PR 3441 at commit
Test PASSed.
Why are you removing this configuration option? Don't we still need it to read data that was written by old versions of Spark SQL / Impala / Hive?
In particular, I'm thinking of using parquetFile to read data from Impala, where there is no metastore schema.
Oh yes... Reverted this change.
f066fc0 to f6a587f
Test build #24019 has started for PR 3441 at commit
Test build #24019 has finished for PR 3441 at commit
Test PASSed.
…ough Hive

This is a very small fix that catches one specific exception and returns an empty table. #3441 will address this in a more principled way.

Author: Michael Armbrust

Closes #3586 from marmbrus/fixEmptyParquet and squashes the following commits:

2781d9f [Michael Armbrust] Handle empty lists for newParquet
04dd376 [Michael Armbrust] Avoid exception when reading empty parquet data through Hive

(cherry picked from commit 513ef82)
Signed-off-by: Michael Armbrust
Indent two spaces?
Closing this as I'm working on another branch that removes the old Parquet implementation, and will cover the topic of this PR.
This PR tries to remove the Metastore Parquet conversion hack by moving the conversion logic to HiveMetastoreCatalog, i.e., convert MetastoreRelation to ParquetRelation when we find that the Hive table is backed by Parquet, and make the converted ParquetRelation inherit the original Hive schema. As a side effect, spark.sql.parquet.binaryAsString can be removed since Metastore field types are now preserved.

TODO
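For readers skimming the thread, the conversion described in the PR summary can be sketched roughly as follows. This is a toy model, not the code in this PR: the case classes below only borrow the names MetastoreRelation and ParquetRelation, and Field, isParquetBacked, and the SerDe-name check are hypothetical simplifications. The point it illustrates is that the converted relation takes its schema from the metastore rather than from the Parquet file footer, so a column declared as a string in Hive remains a string even if the file stores it as plain binary.

```scala
// Toy stand-ins for Catalyst data types (hypothetical, not Spark's classes).
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType
case object IntegerType extends DataType

case class Field(name: String, dataType: DataType)

sealed trait Relation { def schema: Seq[Field] }

// A Hive table as described by the metastore: SerDe class, location, declared schema.
case class MetastoreRelation(
    table: String,
    serde: String,
    location: String,
    schema: Seq[Field]) extends Relation

// A relation that reads Parquet files directly from a path.
case class ParquetRelation(path: String, schema: Seq[Field]) extends Relation

object ConversionSketch {
  // Assumption: a table counts as Parquet-backed when its SerDe class name
  // mentions "parquet". A real check would be stricter than this.
  def isParquetBacked(m: MetastoreRelation): Boolean =
    m.serde.toLowerCase.contains("parquet")

  // Convert Parquet-backed metastore tables; leave every other relation untouched.
  // The metastore schema is carried over unchanged, so no binary-to-string
  // reinterpretation is needed on the converted relation.
  def convert(r: Relation): Relation = r match {
    case m: MetastoreRelation if isParquetBacked(m) =>
      ParquetRelation(m.location, m.schema)
    case other => other
  }
}
```

In this sketch, a table whose SerDe is parquet.hive.serde.ParquetHiveSerDe would be rewritten to a ParquetRelation over the table location, while a table with any other SerDe passes through unchanged. Reading Parquet files that have no metastore entry (the parquetFile case raised in the review) is outside what this conversion covers, which is why the configuration option was kept.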