API: Access deleted and added delete files in Snapshot #5105
   * @return all data files added to the table in this snapshot.
   */
-  Iterable<DataFile> addedFiles(FileIO io);
+  Iterable<DataFile> addedDataFiles(FileIO io);
The renamed methods haven't been released yet, so this won't break anyone.
      manifest -> Objects.equal(manifest.snapshotId(), snapshotId));

  for (ManifestFile manifest : changedManifests) {
    try (ManifestReader<DeleteFile> reader = ManifestFiles.readDeleteManifest(manifest, fileIO, null)) {
Warning: the spec map passed here is null.
If I remember correctly, ManifestReader uses the Avro header metadata to parse the schema and spec. That is generally unreliable because the schema may be outdated, but I think it is fine as long as we don't do any binding or filtering (which we don't here).
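The loop above only needs to sort manifest entries into "added" and "removed", which is why no schema binding or partition spec is involved. A minimal sketch of that idea follows, assuming simplified stand-in types: EntryStatus, ManifestEntry, Manifest, and FileChanges are hypothetical and are not Iceberg's classes (the real code goes through ManifestFiles.readDeleteManifest and ManifestReader). It also assumes that, in a manifest written by a snapshot, ADDED and DELETED entries are that snapshot's own changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified stand-ins for the manifest model; not the real Iceberg API.
enum EntryStatus { EXISTING, ADDED, DELETED }

record ManifestEntry(EntryStatus status, String filePath) {}

// A manifest remembers which snapshot wrote it (null models a manifest without that info).
record Manifest(Long snapshotId, List<ManifestEntry> entries) {}

record FileChanges(List<String> added, List<String> removed) {

  // Collects the files added and removed by the given snapshot.
  static FileChanges forSnapshot(long snapshotId, List<Manifest> manifests) {
    List<String> added = new ArrayList<>();
    List<String> removed = new ArrayList<>();
    for (Manifest manifest : manifests) {
      // Only manifests written by this snapshot can carry its changes.
      if (!Objects.equals(manifest.snapshotId(), snapshotId)) {
        continue;
      }
      for (ManifestEntry entry : manifest.entries()) {
        switch (entry.status()) {
          case ADDED -> added.add(entry.filePath());
          case DELETED -> removed.add(entry.filePath());
          case EXISTING -> { } // carried over from an earlier snapshot, not a change
        }
      }
    }
    return new FileChanges(added, removed);
  }
}
```

Nothing here reads partition values or evaluates expressions, so a missing spec map would not matter to this kind of pass.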
Branch updated from e6e3669 to 3c4fc53.
   * @param io a {@link FileIO} instance used for reading files from storage
   * @return all delete files deleted from the table in this snapshot
   */
  default Iterable<DeleteFile> deletedDeleteFiles(FileIO io) {
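Declaring the new accessor as a default method lets implementations written before it existed keep compiling. A minimal sketch of that pattern follows; SnapshotLike, FileIOLike, DeleteFileLike, LegacySnapshot, and CurrentSnapshot are hypothetical placeholders, not Iceberg types, and the choice to throw UnsupportedOperationException in the default body is an assumption made for illustration.

```java
import java.util.List;

// Hypothetical placeholders standing in for FileIO and DeleteFile.
interface FileIOLike {}

record DeleteFileLike(String path) {}

interface SnapshotLike {
  long snapshotId();

  // Accessor added after implementations already exist: the default body keeps
  // older implementations compiling and only fails if someone actually calls it.
  default Iterable<DeleteFileLike> deletedDeleteFiles(FileIOLike io) {
    throw new UnsupportedOperationException(
        getClass().getName() + " doesn't implement deletedDeleteFiles");
  }
}

// An "old" implementation written before deletedDeleteFiles existed.
record LegacySnapshot(long snapshotId) implements SnapshotLike {}

// A newer implementation that overrides the accessor.
record CurrentSnapshot(long snapshotId, List<DeleteFileLike> deleted) implements SnapshotLike {
  @Override
  public Iterable<DeleteFileLike> deletedDeleteFiles(FileIOLike io) {
    return deleted; // a real snapshot would read its delete manifests through io
  }
}
```

The trade-off is that a missing override surfaces at runtime rather than at compile time, in exchange for not breaking existing implementers of the interface.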
Should we use removed instead of deleted so we don't end up with deleted-delete-...?
This question came up a few times. I went with deleted for consistency, because we use it for data files in this class and in other places such as deletedPositionDeleteFilesCount. We do use removed in the snapshot summary, but that seems to be the only place.
If we decide to use removed, I think we should rename the data file methods too; they haven't been released yet either. I'd say we either keep things as they are or switch to this:
removedDataFiles(FileIO io);
removedDeleteFiles(FileIO io);
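With removed used for both file kinds, the accessors line up as added/removed for data files and for delete files. A minimal in-memory sketch of that shape follows; SnapshotChanges, InMemorySnapshotChanges, DataFileRef, DeleteFileRef, FileIORef, and summarize are hypothetical names for illustration, and a real snapshot would read its manifests through the FileIO argument instead of returning stored lists.

```java
import java.util.List;

// Hypothetical placeholders; not Iceberg's FileIO, DataFile, or DeleteFile.
interface FileIORef {}

record DataFileRef(String path) {}

record DeleteFileRef(String path) {}

// The four accessors under the "removed" naming proposed in this thread.
interface SnapshotChanges {
  Iterable<DataFileRef> addedDataFiles(FileIORef io);

  Iterable<DataFileRef> removedDataFiles(FileIORef io);

  Iterable<DeleteFileRef> addedDeleteFiles(FileIORef io);

  Iterable<DeleteFileRef> removedDeleteFiles(FileIORef io);

  // Summarizes a snapshot's changes, passing the same FileIO to every accessor.
  static String summarize(SnapshotChanges snapshot, FileIORef io) {
    return String.format("+%d/-%d data files, +%d/-%d delete files",
        count(snapshot.addedDataFiles(io)), count(snapshot.removedDataFiles(io)),
        count(snapshot.addedDeleteFiles(io)), count(snapshot.removedDeleteFiles(io)));
  }

  private static int count(Iterable<?> files) {
    int n = 0;
    for (Object ignored : files) {
      n++;
    }
    return n;
  }
}

// In-memory stand-in; a real snapshot would read its manifests through io.
record InMemorySnapshotChanges(
    List<DataFileRef> addedData,
    List<DataFileRef> removedData,
    List<DeleteFileRef> addedDeletes,
    List<DeleteFileRef> removedDeletes) implements SnapshotChanges {

  @Override
  public Iterable<DataFileRef> addedDataFiles(FileIORef io) { return addedData; }

  @Override
  public Iterable<DataFileRef> removedDataFiles(FileIORef io) { return removedData; }

  @Override
  public Iterable<DeleteFileRef> addedDeleteFiles(FileIORef io) { return addedDeletes; }

  @Override
  public Iterable<DeleteFileRef> removedDeleteFiles(FileIORef io) { return removedDeletes; }
}
```

Using one verb pair for both file kinds keeps call sites symmetric and avoids names like deletedDeleteFiles.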
@rdblue, what do you think?
I prefer removed.
Sounds good, I'll update both.
rdblue left a comment:
+1. Merge when you're ready, but I had a question on a method name.
Branch updated from 6e11433 to c262b5a.
Thanks, @rdblue!
This PR uses the work in #4873 as an opportunity to support accessing delete files in Snapshot. That PR added new methods for accessing data files with an explicit FileIO, which haven't been released yet. I propose renaming the newly added methods to addedDataFiles and deletedDataFiles and adding similar methods for delete files. I think this is important to do before 1.0.0.