Core: Support rewriting delete files. #2294
Changes from 6 commits
370a365
06492ad
0f4edf3
3475382
6f7ede4
fe98e01
e84022e
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -40,19 +40,55 @@ protected String operation() { | |
| return DataOperations.REPLACE; | ||
| } | ||
|
|
||
| private void verifyInputAndOutputFiles(Set<DataFile> dataFilesToDelete, Set<DeleteFile> deleteFilesToDelete, | ||
| Set<DataFile> dataFilesToAdd, Set<DeleteFile> deleteFilesToAdd) { | ||
| int filesToDelete = 0; | ||
| if (dataFilesToDelete != null) { | ||
| filesToDelete += dataFilesToDelete.size(); | ||
| } | ||
|
|
||
| if (deleteFilesToDelete != null) { | ||
| filesToDelete += deleteFilesToDelete.size(); | ||
| } | ||
|
|
||
| Preconditions.checkArgument(filesToDelete > 0, "Files to delete cannot be null or empty"); | ||
|
|
||
| if (deleteFilesToDelete == null || deleteFilesToDelete.isEmpty()) { | ||
| // When there is no delete files in the rewrite action, data files to add cannot be null or empty. | ||
| Preconditions.checkArgument(dataFilesToAdd != null && dataFilesToAdd.size() > 0, | ||
| "Data files to add can not be null or empty because there's no delete file to be rewritten"); | ||
| Preconditions.checkArgument(deleteFilesToAdd == null || deleteFilesToAdd.isEmpty(), | ||
| "Delete files to add must be null or empty because there's no delete file to be rewritten"); | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
| public RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> filesToAdd) { | ||
| Preconditions.checkArgument(filesToDelete != null && !filesToDelete.isEmpty(), | ||
| "Files to delete cannot be null or empty"); | ||
| Preconditions.checkArgument(filesToAdd != null && !filesToAdd.isEmpty(), | ||
|
Comment on lines -45 to -47
Contributor
It looks like we are changing the logic so that the input sets are no longer required to be non-null. For the new code, I think we can add a precondition check on the four input sets to ensure they are all non-null, which saves all the null checks everywhere.
Member
Author
Yes, in the current version we are required to pass non-empty and non-null
Contributor
I agree that they can be empty. My main point is that, since the new API we introduced here is the one that allows empty input, we can enforce that inputs are not null by adding a precondition check at the beginning of the method that fails if null is passed in, so that we don't have to do all the
Member
Author
OK, I think it's good to simplify this null check. I just updated this patch. |
||
| "Files to add can not be null or empty"); | ||
|
|
||
| for (DataFile toDelete : filesToDelete) { | ||
| delete(toDelete); | ||
| public RewriteFiles rewriteFiles(Set<DataFile> dataFilesToDelete, Set<DeleteFile> deleteFilesToDelete, | ||
| Set<DataFile> dataFilesToAdd, Set<DeleteFile> deleteFilesToAdd) { | ||
| verifyInputAndOutputFiles(dataFilesToDelete, deleteFilesToDelete, dataFilesToAdd, deleteFilesToAdd); | ||
|
|
||
| if (dataFilesToDelete != null) { | ||
| for (DataFile dataFile : dataFilesToDelete) { | ||
| delete(dataFile); | ||
| } | ||
| } | ||
|
|
||
| if (deleteFilesToDelete != null) { | ||
| for (DeleteFile deleteFile : deleteFilesToDelete) { | ||
| delete(deleteFile); | ||
| } | ||
| } | ||
|
|
||
| if (dataFilesToAdd != null) { | ||
| for (DataFile dataFile : dataFilesToAdd) { | ||
| add(dataFile); | ||
| } | ||
| } | ||
|
|
||
| for (DataFile toAdd : filesToAdd) { | ||
| add(toAdd); | ||
| if (deleteFilesToAdd != null) { | ||
| for (DeleteFile deleteFile : deleteFilesToAdd) { | ||
| add(deleteFile); | ||
| } | ||
| } | ||
|
|
||
| return this; | ||
|
|
||
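To make the suggestion from the null-check review thread concrete, here is a minimal sketch of how the validation could look once all four sets are required to be non-null up front. It is an illustration, not the code that was merged: the class name `RewriteValidation` is hypothetical, `String` stands in for `DataFile` and `DeleteFile`, and plain JDK checks replace `Preconditions.checkArgument` so the block is self-contained.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the validation discussed in the review; not the PR's actual code.
// String is used in place of Iceberg's DataFile and DeleteFile types.
final class RewriteValidation {
  private RewriteValidation() {
  }

  // Validates the four input sets and returns the total number of files to delete.
  static int verify(Set<String> dataFilesToDelete, Set<String> deleteFilesToDelete,
                    Set<String> dataFilesToAdd, Set<String> deleteFilesToAdd) {
    // Fail fast on null so that no later branch needs its own null check.
    Objects.requireNonNull(dataFilesToDelete, "Data files to delete cannot be null");
    Objects.requireNonNull(deleteFilesToDelete, "Delete files to delete cannot be null");
    Objects.requireNonNull(dataFilesToAdd, "Data files to add cannot be null");
    Objects.requireNonNull(deleteFilesToAdd, "Delete files to add cannot be null");

    int filesToDelete = dataFilesToDelete.size() + deleteFilesToDelete.size();
    if (filesToDelete == 0) {
      throw new IllegalArgumentException("Files to delete cannot be empty");
    }

    if (deleteFilesToDelete.isEmpty()) {
      // With no delete files being rewritten, this is a plain data file rewrite.
      if (dataFilesToAdd.isEmpty()) {
        throw new IllegalArgumentException(
            "Data files to add cannot be empty because there's no delete file to be rewritten");
      }
      if (!deleteFilesToAdd.isEmpty()) {
        throw new IllegalArgumentException(
            "Delete files to add must be empty because there's no delete file to be rewritten");
      }
    }

    return filesToDelete;
  }
}
```

With the null checks hoisted to the top, the loops in `rewriteFiles` could iterate over each set directly without the surrounding `if (... != null)` guards.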
How about adding
RewriteFiles rewriteDeletes(Set<DeleteFile> fileToDelete, Set<DeleteFile> filesToAdd)
for rewriting deletes?

I think you want to expose this API for replacing equality deletions with positional deletions. To me that seems like an internal usage, so we may not have to define such a specific API for end users. I prefer to expose the following common API to end users; for our internal usage we could rewrite files based on that one.
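As a rough illustration of that last point, a delete-only rewrite can be expressed as a thin wrapper over the general four-set method by passing empty sets for the data files. The sketch below is hypothetical: `RecordingRewrite` is a made-up class that only records which files would be deleted or added, and `String` stands in for `DataFile` and `DeleteFile`. It is not Iceberg's `RewriteFiles` implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in that only records calls; String replaces DataFile and DeleteFile.
final class RecordingRewrite {
  final List<String> deleted = new ArrayList<>();
  final List<String> added = new ArrayList<>();

  // General entry point, shaped like the four-set signature in the diff.
  RecordingRewrite rewriteFiles(Set<String> dataFilesToDelete, Set<String> deleteFilesToDelete,
                                Set<String> dataFilesToAdd, Set<String> deleteFilesToAdd) {
    deleted.addAll(dataFilesToDelete);
    deleted.addAll(deleteFilesToDelete);
    added.addAll(dataFilesToAdd);
    added.addAll(deleteFilesToAdd);
    return this;
  }

  // Delete-only convenience built on the general method: no data files are touched.
  RecordingRewrite rewriteDeletes(Set<String> deleteFilesToDelete, Set<String> deleteFilesToAdd) {
    return rewriteFiles(
        Collections.<String>emptySet(), deleteFilesToDelete,
        Collections.<String>emptySet(), deleteFilesToAdd);
  }
}
```

Under this shape, an internal caller that replaces equality deletes with positional deletes would only need the wrapper (or a direct call with empty data-file sets), while end users see a single general API.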