[SPARK-38215][SQL] InsertIntoHiveDir should use data source if it's convertible #35528
Closed
AngersZhuuuu wants to merge 4 commits into apache:master from
Conversation
Contributor
Author
Gentle ping @cloud-fan @viirya Could you take a review? It's a useful feature.
Contributor
Author
Also ping @dongjoon-hyun @HyukjinKwon
cloud-fan
reviewed
Feb 17, 2022
    private def convertProvider(storage: CatalogStorageFormat): String = {
      val serde = storage.serde.getOrElse("").toLowerCase(Locale.ROOT)
      Some("parquet").filter(serde.contains).getOrElse("orc")
    }
Contributor
nit:
    if (serde.contains("parquet")) "parquet" else "orc"
is much simpler
Contributor
Author
    if (serde.contains("parquet")) "parquet" else "orc"
Updated.
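As a self-contained illustration of the two forms discussed in this thread, here is a minimal sketch comparing the diff's Option-based chain with the reviewer's simpler if/else. The object name and the plain `Option[String]` parameter are invented for this sketch; the real method lives inside Spark's Hive conversion rule and takes a CatalogStorageFormat.

```scala
import java.util.Locale

// Standalone sketch of convertProvider: maps a Hive SerDe class name to a
// data source provider name. Names here are illustrative, not Spark's API.
object ConvertProviderSketch {
  // Form from the diff: Option/filter/getOrElse chain.
  def original(serdeOpt: Option[String]): String = {
    val serde = serdeOpt.getOrElse("").toLowerCase(Locale.ROOT)
    Some("parquet").filter(serde.contains).getOrElse("orc")
  }

  // Simpler equivalent suggested in review.
  def simplified(serdeOpt: Option[String]): String = {
    val serde = serdeOpt.getOrElse("").toLowerCase(Locale.ROOT)
    if (serde.contains("parquet")) "parquet" else "orc"
  }
}
```

Both versions return "parquet" only when the SerDe class name contains "parquet", and default to "orc" otherwise (including when no SerDe is set), so the rewrite is behavior-preserving.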
cloud-fan
approved these changes
Feb 17, 2022
AngersZhuuuu
commented
Feb 18, 2022
    * - When writing to partitioned Hive-serde Parquet/ORC tables when
    *   `spark.sql.hive.convertInsertingPartitionedTable` is true
    * - When writing to a directory with Hive-serde
    * - When writing to non-partitioned Hive-serde Parquet/ORC tables using CTAS
Contributor
Author
@cloud-fan Updated the comment of this rule, and also added a comment about CTAS.
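The rule comment above enumerates the cases where a Hive-serde write is converted to a data source write. A hedged sketch of that decision follows; HiveWriteSketch and isConvertible are invented names standing in for Spark's actual rule, which also consults further flags (e.g. the convertMetastoreParquet/Orc settings) not modeled here.

```scala
import java.util.Locale

// Illustrative model of a Hive-serde write target; not Spark's API.
case class HiveWriteSketch(
    serde: String,                   // Hive SerDe class name
    writesToDirectory: Boolean,      // INSERT OVERWRITE DIRECTORY vs. a table
    isPartitioned: Boolean,
    convertPartitionedFlag: Boolean) // spark.sql.hive.convertInsertingPartitionedTable

object ConvertibleSketch {
  private def isParquetOrOrc(serde: String): Boolean = {
    val s = serde.toLowerCase(Locale.ROOT)
    s.contains("parquet") || s.contains("orc")
  }

  // Mirrors the commented conditions: a Parquet/ORC serde, and either a
  // directory write, a non-partitioned table, or the partitioned-table flag.
  def isConvertible(w: HiveWriteSketch): Boolean =
    isParquetOrOrc(w.serde) &&
      (w.writesToDirectory || !w.isPartitioned || w.convertPartitionedFlag)
}
```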
Contributor
Author
Gentle ping @cloud-fan GA passed.
cloud-fan
approved these changes
Feb 18, 2022
Contributor
thanks, merging to master!
@AngersZhuuuu Hi, in the case where the inserted directory has the same path as the selected table's location, this may cause an error. https://issues.apache.org/jira/browse/SPARK-38215
What changes were proposed in this pull request?
Currently a Spark SQL query that writes to a directory with a Hive SerDe (InsertIntoHiveDirCommand) can't be converted to use InsertIntoDataSourceCommand; it still uses the Hive SerDe to write data, which means we can't use features provided by newer Parquet/ORC versions, such as zstd compression.
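As a usage sketch of the kind of query affected (assuming a running SparkSession with Hive support and an existing table named src; not runnable standalone), conversion lets the Parquet data source honor settings such as the zstd codec:

```scala
// Hedged usage sketch: the path, table name, and codec choice are
// illustrative. With this PR, the directory write below goes through the
// Parquet data source, so the compression setting takes effect.
spark.conf.set("spark.sql.parquet.compression.codec", "zstd")
spark.sql(
  """INSERT OVERWRITE DIRECTORY '/tmp/out'
    |STORED AS PARQUET
    |SELECT * FROM src""".stripMargin)
```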
Why are the changes needed?
Converting InsertIntoHiveDirCommand to InsertIntoDataSourceCommand can support more features of Parquet/ORC.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added UT