-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Classify RowDelta with data files only as APPEND #14581
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When RowDelta only adds data files without adding delete files or deleting data files, it should be classified as an APPEND operation instead of OVERWRITE. This is similiar to the existing logic in OverwriteFiles: https://github.com/apache/iceberg/blame/main/core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java#L56
|
|
||
| @TestTemplate | ||
| public void addOnlyDataFilesProducesAppendOperation() { | ||
| SnapshotUpdate<?> rowDelta = table.newRowDelta().addRows(FILE_A).addRows(FILE_B); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about adding a case simulating this as well : https://github.com/apache/iceberg/pull/14581/files#r2524111562
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on adding this test case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added. Please see dcd7e73.
huaxingao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Merged to main. |
|
Thank you everyone for the reviews! And thanks for merging @pvary. |
When RowDelta only adds data files without adding delete files or deleting data files, the snapshot should be classified as an APPEND operation instead of OVERWRITE. The resulting operation is quite important because readers which read only append snapshots won't process overwrite snapshots.
The patch here is similar to the existing logic in OverwriteFiles: https://github.com/apache/iceberg/blame/main/core/src/main/java/org/apache/iceberg/BaseOverwriteFiles.java#L56
Merging this would avoid patches like #14559, where separate code paths are required for the overwrite and the append path.