@szehon-ho
Member

@szehon-ho szehon-ho commented Aug 3, 2021

  • From https://mail-archives.apache.org/mod_mbox/iceberg-dev/202107.mbox/%3CCAMwmD1_c4u%2BwDSOLtScCyid%2BfHicBnKdmyTuKqFTG-uaE9eq6A%40mail.gmail.com%3E and offline discussion with Ryan
  • This change makes "insert overwrite" with mode=dynamic support serializable isolation when the query does not select from the table being overwritten: a ValidationException is thrown if a conflicting change is made.
  • This catches the case where two "insert overwrite" operations run at the same time and could otherwise both succeed, so that both sets of data land in the table.
  • "insert overwrite" remains at snapshot isolation when the select reads from the same table. To support serializable isolation in that case, we would need to plug in the snapshotId from the initial Spark compile, which could be looked at on the spark-extension side.
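
The conflict check described in the bullets above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Iceberg's implementation: `ToyTable`, `Snapshot`, `replace_partitions`, and the `validate_from` parameter are hypothetical names, snapshot ids are assumed to increase with each commit, and the `ValidationException` class here is a local stand-in for the exception named in the description.

```python
from dataclasses import dataclass, field


class ValidationException(Exception):
    """Local stand-in for the exception named in the PR description."""


@dataclass
class Snapshot:
    snapshot_id: int
    added_partitions: frozenset  # partitions that received files in this commit


@dataclass
class ToyTable:
    snapshots: list = field(default_factory=list)

    def current_snapshot_id(self):
        # 0 means "empty table" in this toy model.
        return self.snapshots[-1].snapshot_id if self.snapshots else 0

    def replace_partitions(self, partitions, validate_from=None):
        """Overwrite `partitions`; with validate_from set, reject conflicting commits."""
        partitions = frozenset(partitions)
        if validate_from is not None:
            # Inspect every commit made after the writer's starting snapshot.
            for snap in self.snapshots:
                if snap.snapshot_id > validate_from:
                    conflicts = snap.added_partitions & partitions
                    if conflicts:
                        raise ValidationException(
                            f"Conflicting files in partitions {sorted(conflicts)}")
        new_id = self.current_snapshot_id() + 1
        self.snapshots.append(Snapshot(new_id, partitions))
        return new_id


# Two writers start from the same table state and overwrite the same partition.
table = ToyTable()
start = table.current_snapshot_id()
table.replace_partitions({"day=2021-08-03"}, validate_from=start)  # writer A commits
try:
    table.replace_partitions({"day=2021-08-03"}, validate_from=start)  # writer B
    outcome = "committed"
except ValidationException:
    outcome = "rejected"

# Without validation, both overwrites succeed, which is the case the PR targets.
unchecked = ToyTable()
unchecked.replace_partitions({"day=2021-08-03"})
unchecked.replace_partitions({"day=2021-08-03"})
```

In this model the second writer is rejected only because it validates from the snapshot it started with; a writer touching a different partition would still commit.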

@szehon-ho szehon-ho changed the title from "Support serializable isolation for ReplacePartitions" to "Core: Support serializable isolation for ReplacePartitions" Aug 3, 2021
@szehon-ho
Member Author

@rdblue if you could look when you have time, thanks

@szehon-ho szehon-ho force-pushed the serializable_replace branch from 57af2be to 2406ab9 Compare August 11, 2021 05:32
@szehon-ho
Member Author

Updated, and also turned off serializable isolation by default (it can be turned on as well)

Member Author

@szehon-ho szehon-ho left a comment

@rdblue thanks a lot for taking a look. I rewrote it based on your suggestion (switching from an Expression to a filter using partitionSet::contains). I also added a toString method to PartitionSet to build the user-facing message.
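
As a rough illustration of the approach mentioned in this comment (filtering with a set-membership check and printing the set in the error message), here is a minimal sketch. It is not Iceberg's `PartitionSet`: the class below, the `DataFile` record, `find_conflicts`, and the message wording are all hypothetical, and partitions are simplified to strings keyed by an assumed spec id.

```python
from dataclasses import dataclass


class PartitionSet:
    """Toy set of (spec_id, partition) pairs; not Iceberg's actual class."""

    def __init__(self):
        self._items = set()

    def add(self, spec_id, partition):
        self._items.add((spec_id, partition))

    def contains(self, spec_id, partition):
        return (spec_id, partition) in self._items

    def __str__(self):
        # Sorted so the user-facing message is stable across runs.
        return "[" + ", ".join(f"{s}:{p}" for s, p in sorted(self._items)) + "]"


@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int
    partition: str


def find_conflicts(replaced, concurrently_added):
    """Keep only the concurrently added files that fall in a replaced partition."""
    return [f for f in concurrently_added
            if replaced.contains(f.spec_id, f.partition)]


replaced = PartitionSet()
replaced.add(0, "day=2021-08-03")
replaced.add(0, "day=2021-08-04")

added = [
    DataFile("a.parquet", 0, "day=2021-08-03"),
    DataFile("b.parquet", 0, "day=2021-08-10"),
]
conflicts = find_conflicts(replaced, added)
message = (f"Found conflicting files in partitions {replaced}: "
           f"{[f.path for f in conflicts]}")
```

A membership filter like this only needs the partition tuples of the files being written, which is presumably why it is simpler than building an Expression that matches the same partitions.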

@szehon-ho
Member Author

@rdblue thanks for the suggestions. I added validate-from-snapshot as a SparkWriteOption, along with a Spark-level test for it, if you want to take another look.

@szehon-ho szehon-ho force-pushed the serializable_replace branch from e2c0b19 to df7d8aa Compare December 23, 2021 20:36
@szehon-ho szehon-ho force-pushed the serializable_replace branch from cc7043f to f6d2c97 Compare March 3, 2022 01:33
@szehon-ho szehon-ho force-pushed the serializable_replace branch from f6d2c97 to 4dbc301 Compare March 3, 2022 01:41
@szehon-ho
Member Author

@aokolnychyi thanks for the review. I addressed the comments and implemented snapshot isolation plus tests, based on the general agreement. I will also change the name of the PR to reflect this.

@szehon-ho szehon-ho changed the title from "Core: Support serializable isolation for ReplacePartitions" to "Core: Support serializable and snapshot isolation for ReplacePartitions" Mar 3, 2022
@aokolnychyi aokolnychyi merged commit 01c62cb into apache:master Mar 3, 2022
@aokolnychyi
Contributor

Thanks, @szehon-ho! This is awesome work! Thanks to the others for reviewing too.

asfgit pushed a commit to apache/impala that referenced this pull request Jun 20, 2022
This commit bumps the CDP Build number to 27992803 and versions to
7.2.16.0-77. The new version includes
apache/iceberg#2925, an Iceberg library
improvement to support serializable and snapshot isolation for
ReplacePartitions. This is a prerequisite to safely execute
INSERT OVERWRITE on entire partitions.

Testing:
- Ran a build and verified that the downloads succeed.

Change-Id: I9fcef254a5f845540e09e0d8d2111f305fb2fea2
Reviewed-on: http://gerrit.cloudera.org:8080/18613
Reviewed-by: Tamas Mate <[email protected]>
Tested-by: Tamas Mate <[email protected]>
sunchao pushed a commit to sunchao/iceberg that referenced this pull request May 9, 2023