[SPARK-18659] [SQL] Incorrect behaviors in overwrite table for datasource tables #16088
Test build #69432 has finished for PR 16088 at commit
  // partial partition spec.
  partSpecs.foreach { p =>
-   if (existingParts.contains(p) && shouldRemovePartitionLocation) {
+   if (existingParts.contains(p) && shouldRemovePartitionLocation && !retainData) {
Shall we put the !retainData check in the definition of shouldRemovePartitionLocation? That looks more logical.
Done
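For readers following the thread, here is a minimal, self-contained sketch of what folding the check into the flag could look like. It is not the code from this PR: `DropPartitionSketch`, `locationsToDelete`, `PartitionSpec`, and the `isManagedTable` parameter are hypothetical stand-ins, and the assumption that the original flag depended on the table being managed is mine.

```scala
object DropPartitionSketch {
  // Hypothetical stand-in for a partition spec such as Map("ds" -> "2016-12-01").
  type PartitionSpec = Map[String, String]

  /** Returns the specs whose data directories would be deleted. */
  def locationsToDelete(
      existingParts: Set[PartitionSpec],
      partSpecs: Seq[PartitionSpec],
      isManagedTable: Boolean, // assumption: what the flag originally depended on
      retainData: Boolean): Seq[PartitionSpec] = {
    // Fold !retainData into the flag so the per-partition condition stays short.
    val shouldRemovePartitionLocation = isManagedTable && !retainData
    partSpecs.filter(p => existingParts.contains(p) && shouldRemovePartitionLocation)
  }
}
```

With `retainData = true` the result is always empty, so no partition location is touched regardless of the table type.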
  ignoreIfNotExists: Boolean,
- purge: Boolean): Unit = {
+ purge: Boolean,
+ retainData: Boolean): Unit = {
Shall we provide a default value? It looks like we want it to be false most of the time.
It seemed a little safer to make it mandatory to avoid some component forgetting to propagate it.
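A small sketch of the trade-off, assuming a simplified catalog interface. `RetainDataSketch`, `Catalog`, `Backend`, and `Wrapper` are hypothetical names, and the `String` return value exists only to make the behavior observable. Without a default, a delegating layer that forgets to pass `retainData` fails to compile instead of silently falling back to `false`.

```scala
object RetainDataSketch {
  trait Catalog {
    // No default for retainData: every caller and wrapper must pass it explicitly.
    def dropPartitions(parts: Seq[String], purge: Boolean, retainData: Boolean): String
  }

  // Hypothetical backend that reports what it would do.
  object Backend extends Catalog {
    def dropPartitions(parts: Seq[String], purge: Boolean, retainData: Boolean): String =
      if (retainData) "metadata-only" else "metadata+files"
  }

  // A delegating layer: leaving retainData out of the inner call would not compile.
  class Wrapper(underlying: Catalog) extends Catalog {
    def dropPartitions(parts: Seq[String], purge: Boolean, retainData: Boolean): String =
      underlying.dropPartitions(parts, purge, retainData)
  }
}
```

Had the parameter defaulted to `false`, a wrapper calling `underlying.dropPartitions(parts, purge)` would still compile and would delete files even when the caller asked to retain them.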
  ignoreIfNotExists: Boolean,
- purge: Boolean): Unit = withClient {
+ purge: Boolean,
+ deleteFiles: Boolean): Unit = withClient {
Why change the parameter name here? It's called retainData, right?
Fixed
Test build #69504 has finished for PR 16088 at commit
  AlterTableDropPartitionCommand(
    l.catalogTable.get.identifier, deletedPartitions.toSeq,
-   ifExists = true, purge = true).run(t.sparkSession)
+   ifExists = true, purge = true,
An unrelated question: when dropping partitions here, do we have to set purge = true?
I think it doesn't matter since we don't delete files.
Then can we set it to false? DROP PARTITION ... PURGE is not supported in Hive 0.13, so setting it to false keeps the partition-related functions working on older Hive versions.
Done
LGTM pending jenkins
Test build #69532 has finished for PR 16088 at commit
retest this please
Test build #69537 has finished for PR 16088 at commit
Test build #69543 has finished for PR 16088 at commit
…rce tables

## What changes were proposed in this pull request?

Two bugs are addressed here:

1. INSERT OVERWRITE TABLE sometimes crashed when catalog partition management was enabled. This was because when dropping partitions after an overwrite operation, the Hive client will attempt to delete the partition files. If the entire partition directory was dropped, this would fail. The PR fixes this by adding a flag to control whether the Hive client should attempt to delete files.
2. The static partition spec for OVERWRITE TABLE was not correctly resolved to the case-sensitive original partition names. This resulted in the entire table being overwritten if you did not correctly capitalize your partition names.

cc yhuai cloud-fan

## How was this patch tested?

Unit tests. Surprisingly, the existing overwrite table tests did not catch these edge cases.

Author: Eric Liang <[email protected]>

Closes #16088 from ericl/spark-18659.

(cherry picked from commit 7935c84)
Signed-off-by: Wenchen Fan <[email protected]>
Thanks, merging to master/2.1!
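What changes were proposed in this pull request?
Two bugs are addressed here:
1. INSERT OVERWRITE TABLE sometimes crashed when catalog partition management was enabled. This was because when dropping partitions after an overwrite operation, the Hive client will attempt to delete the partition files. If the entire partition directory was dropped, this would fail. The PR fixes this by adding a flag to control whether the Hive client should attempt to delete files.
2. The static partition spec for OVERWRITE TABLE was not correctly resolved to the case-sensitive original partition names. This resulted in the entire table being overwritten if you did not correctly capitalize your partition names.
cc @yhuai @cloud-fan
How was this patch tested?
Unit tests. Surprisingly, the existing overwrite table tests did not catch these edge cases.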
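To illustrate the second bug in this PR (a static partition spec whose keys are not capitalized like the table's partition columns), here is a minimal sketch of one way such a spec could be resolved back to the original column names. It is an illustration under stated assumptions, not the code from this PR: `PartitionSpecSketch` and `normalizeSpec` are hypothetical names, and matching is assumed to be purely case-insensitive.

```scala
object PartitionSpecSketch {
  /**
   * Maps each user-supplied partition key to the table's own column name,
   * matching case-insensitively. Hypothetical helper, not Spark's API.
   */
  def normalizeSpec(
      spec: Map[String, String],
      partitionColumns: Seq[String]): Map[String, String] =
    spec.map { case (key, value) =>
      val matches = partitionColumns.filter(_.equalsIgnoreCase(key))
      // Fail loudly on unknown or ambiguous keys instead of ignoring them.
      require(matches.size == 1,
        s"Partition key '$key' does not match exactly one of: ${partitionColumns.mkString(", ")}")
      matches.head -> value
    }
}
```

For example, `PartitionSpecSketch.normalizeSpec(Map("ds" -> "2016-12-01"), Seq("DS", "hr"))` yields `Map("DS" -> "2016-12-01")`, so a later lookup keyed on `DS` finds the static value and the overwrite stays scoped to that partition.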