@aokolnychyi
Contributor

This PR adds the content field and delete file count fields to the manifests and all_manifests metadata tables.
Existing tests were adapted to cover the new functionality.

*/
public class AllManifestsTable extends BaseMetadataTable {
private static final Schema MANIFEST_FILE_SCHEMA = new Schema(
Types.NestedField.required(14, "content", Types.IntegerType.get()),
Contributor Author

Using fresh IDs but adding columns in places where they make sense instead of adding to the end.

Types.NestedField.optional(5, "added_data_files_count", Types.IntegerType.get()),
Types.NestedField.optional(6, "existing_data_files_count", Types.IntegerType.get()),
Types.NestedField.optional(7, "deleted_data_files_count", Types.IntegerType.get()),
Types.NestedField.required(15, "added_delete_files_count", Types.IntegerType.get()),
Contributor Author

@aokolnychyi aokolnychyi May 13, 2022

If we had done this from the start, we would probably have just one set of counts with generic names, but this schema is public and widely used.
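
To illustrate the two inline notes above, here is a minimal, self-contained sketch of why a new column can take a fresh ID yet sit wherever it reads best: a lookup by ID does not depend on position. The Field class and the byId and positionOf helpers are hypothetical stand-ins, not Iceberg's Schema or Types.NestedField API, and the list holds only the five fields visible in the diff hunks, not the full table schema.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldIdSketch {
  // Hypothetical stand-in for a schema field; NOT Iceberg's Types.NestedField.
  static final class Field {
    final int id;
    final String name;
    final boolean required;

    Field(int id, String name, boolean required) {
      this.id = id;
      this.name = name;
      this.required = required;
    }
  }

  // Only the fields shown in the diff hunks, in the order they appear there.
  // IDs 14 and 15 are the fresh IDs; 5, 6 and 7 already existed.
  static final List<Field> FIELDS = Arrays.asList(
      new Field(14, "content", true),
      new Field(5, "added_data_files_count", false),
      new Field(6, "existing_data_files_count", false),
      new Field(7, "deleted_data_files_count", false),
      new Field(15, "added_delete_files_count", true));

  // Index fields by ID; the result is the same for any column order.
  static Map<Integer, Field> byId() {
    Map<Integer, Field> index = new HashMap<>();
    for (Field field : FIELDS) {
      index.put(field.id, field);
    }
    return index;
  }

  // Zero-based position of a column in the list, or -1 if it is absent.
  static int positionOf(String name) {
    for (int i = 0; i < FIELDS.size(); i++) {
      if (FIELDS.get(i).name.equals(name)) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // The highest-numbered IDs are not the last columns.
    System.out.println(byId().get(14).name);      // prints "content"
    System.out.println(positionOf("content"));    // prints 0
  }
}
```

In this sketch, the field with ID 14 is the first column while ID 7 sits in the middle, so readers that resolve columns by ID are unaffected by where a new column is placed.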

@szehon-ho
Member

szehon-ho commented May 13, 2022

Looks good to me. Maybe we should add a test for the case of non-zero delete counts in the manifests table?

@aokolnychyi
Contributor Author

aokolnychyi commented May 13, 2022

Yeah, let me add one. I did not realize we did not have a test for that.

@aokolnychyi aokolnychyi force-pushed the add-manifest-content branch from df2c74e to a81cc8e Compare May 14, 2022 00:19
@aokolnychyi
Contributor Author

@szehon-ho, I added a test for Spark 3.2. I made older Spark versions compile but did not add the new test logic there.

Member

@szehon-ho szehon-ho left a comment

Looks good to me, thanks for the test

for (PositionDelete<InternalRow> delete : deletes) {
writer.write(delete);
}
} catch (IOException e) {
Member

For test methods, I usually prefer to declare "throws Exception" instead of writing catch blocks, to reduce lines of code, but it may just be personal preference.

Contributor Author

Me too. In this case, the method definition would be a bit ugly, as it won't fit on one line and I'd have to put throws on a separate line. It is also part of a try-with-resources block, which is a bit of a special case.

Member

Ah makes sense.
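
For readers comparing the two styles discussed in this thread, a minimal sketch using only the JDK follows. It does not reproduce the PR's test helper: the class and method names are hypothetical, a BufferedWriter over a StringWriter stands in for the delete writer, and wrapping the IOException in UncheckedIOException is an assumption, since the diff hunk does not show the body of the catch block.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;

public class WriterStyles {
  // Style 1: catch inside the helper, so the signature needs no throws clause.
  static String writeWithCatch(List<String> lines) {
    StringWriter out = new StringWriter();
    try (BufferedWriter writer = new BufferedWriter(out)) {
      for (String line : lines) {
        writer.write(line);
        writer.write('\n');
      }
    } catch (IOException e) {
      // Assumption: rethrow unchecked; the original catch body is not shown.
      throw new UncheckedIOException(e);
    }
    // The writer was closed (and flushed) when the try block exited.
    return out.toString();
  }

  // Style 2: declare the checked exception; shorter body, longer signature.
  static String writeWithThrows(List<String> lines) throws IOException {
    StringWriter out = new StringWriter();
    try (BufferedWriter writer = new BufferedWriter(out)) {
      for (String line : lines) {
        writer.write(line);
        writer.write('\n');
      }
    }
    return out.toString();
  }
}
```

Both helpers produce the same output. The difference is only whether the checked exception appears in the method signature or is handled next to the try-with-resources block.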

@aokolnychyi aokolnychyi merged commit 897e4d5 into apache:master May 16, 2022
szehon-ho added a commit to szehon-ho/iceberg that referenced this pull request May 18, 2022