-
Notifications
You must be signed in to change notification settings - Fork 3k
Spark: Add isolation level support to DataFrame.overwrite(filter) API #4293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spark: Add isolation level support to DataFrame.overwrite(filter) API #4293
Conversation
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java
Show resolved
Hide resolved
|
Let me take a look. Sorry for the delay. |
That's a bug. Let me fix it. |
aokolnychyi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks correct to me. I had a few nits.
I'll fix OverwriteFiles and it would be great to submit changes to the dynamic overwrites in a separate PR to limit this one to static overwrites.
...ark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestConflictValidation.java
Outdated
Show resolved
Hide resolved
...ark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestConflictValidation.java
Outdated
Show resolved
Hide resolved
...ark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestConflictValidation.java
Outdated
Show resolved
Hide resolved
...ark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestConflictValidation.java
Outdated
Show resolved
Hide resolved
...ark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestConflictValidation.java
Outdated
Show resolved
Hide resolved
...ark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestConflictValidation.java
Show resolved
Hide resolved
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java
Outdated
Show resolved
Hide resolved
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java
Outdated
Show resolved
Hide resolved
spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java
Show resolved
Hide resolved
d4e6bae to
4f793e7
Compare
|
@aokolnychyi thanks, will create another pr soon for the refactoring/cleanup of existing code to be consistent with this one. Added another test for deleted data files after fix of #3581 , works now. |
|
Restarting as it looks like a temporary build artifact download failure |
|
Thanks, @szehon-ho! Sorry it took so long to get back to this PR. |
#2925 adds isolation level support to the core Iceberg API ReplacePartitions, and also exposes it via Spark DataFrame.overwritePartitions() API.
This change is to extend isolation level support to the Spark DataFrame.overwrite(filter) API for symmetry. The underlying core Iceberg API (OverwriteFiles) already supported isolation level validation in this case, so the change is smaller.
One observation, DF.overwrite(filter) will be less aggressive than DF.overwritePartitions() in concurrent validation due to the two different API code paths. OverwriteFiles checks exactly for the file that will be re-written, so it will not throw an exception if another file was deleted in the same partition. This is unlike ReplacePartitions API which throws exception if any file was deleted in the same partition, as it does not keep track of files but rather whole partitions.