Skip to content

Conversation

@aokolnychyi
Copy link
Contributor

@aokolnychyi aokolnychyi commented Jun 21, 2022

This PR uses the work in #4873 as an opportunity to support accessing delete files in Snapshot. That PR added new methods for accessing data files with an explicit FileIO, which haven't been released yet. I propose to rename the newly added methods to addedDataFiles and deletedDataFiles and also add similar methods for delete files. I feel it is important to do before 1.0.0.

* @return all data files added to the table in this snapshot.
*/
Iterable<DataFile> addedFiles(FileIO io);
Iterable<DataFile> addedDataFiles(FileIO io);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed methods haven't been released yet so we won't break anyone.

manifest -> Objects.equal(manifest.snapshotId(), snapshotId));

for (ManifestFile manifest : changedManifests) {
try (ManifestReader<DeleteFile> reader = ManifestFiles.readDeleteManifest(manifest, fileIO, null)) {
Copy link
Contributor Author

@aokolnychyi aokolnychyi Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning: the spec map is null.

If I remember correctly, ManifestReader will use the Avro header metadata to parse the schema and spec. While it is generally not reliable as the schema may be old, I think it should be fine as long as we don't do any binding or filtering (like in this case).

@aokolnychyi aokolnychyi force-pushed the snapshot-delete-files branch 2 times, most recently from e6e3669 to 3c4fc53 Compare June 21, 2022 17:45
@aokolnychyi
Copy link
Contributor Author

* @param io a {@link FileIO} instance used for reading files from storage
* @return all delete files deleted from the table in this snapshot
*/
default Iterable<DeleteFile> deletedDeleteFiles(FileIO io) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use removed instead of deleted so we don't have deleted-delete-...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question came up a few times. I went with deleted for consistency cause we use that for data files in this class and other places like deletedPositionDeleteFilesCount. We do use removed in snapshot summary but that seems to be the only place.

If we decide to use removed, I think we should update the naming for data file methods too. They haven't been released yet. I'd say let's either keep things as they are or switch to this:

removedDataFiles(FileIO io);
removedDeleteFiles(FileIO io);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll update both.

Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Merge when you're ready, but I had a question on a method name.

@rdblue rdblue added this to the Iceberg 0.14.0 Release milestone Jun 29, 2022
@aokolnychyi aokolnychyi force-pushed the snapshot-delete-files branch from 6e11433 to c262b5a Compare June 30, 2022 17:05
@aokolnychyi aokolnychyi merged commit 342ec3c into apache:master Jun 30, 2022
@aokolnychyi
Copy link
Contributor Author

Thanks, @rdblue!

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants