-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: Add content and delete file counts to manifest tables #4764
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| */ | ||
| public class AllManifestsTable extends BaseMetadataTable { | ||
| private static final Schema MANIFEST_FILE_SCHEMA = new Schema( | ||
| Types.NestedField.required(14, "content", Types.IntegerType.get()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using fresh IDs but adding columns in places where they make sense instead of adding to the end.
| Types.NestedField.optional(5, "added_data_files_count", Types.IntegerType.get()), | ||
| Types.NestedField.optional(6, "existing_data_files_count", Types.IntegerType.get()), | ||
| Types.NestedField.optional(7, "deleted_data_files_count", Types.IntegerType.get()), | ||
| Types.NestedField.required(15, "added_delete_files_count", Types.IntegerType.get()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did this initially, we'd probably just have one set of counts with generic names but this is public and widely used.
|
Looks good to me, maybe we should add a test for case of non-0 delete counts in manifest table? |
|
Yeah, let me add one. I did not realize we did not have a test for that. |
df2c74e to
a81cc8e
Compare
|
@szehon-ho, I added a test for Spark 3.2. I made older Spark versions compile but did not add the new test logic there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks for the test
| for (PositionDelete<InternalRow> delete : deletes) { | ||
| writer.write(delete); | ||
| } | ||
| } catch (IOException e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For test method I usually think to declare "throws Exception" instead of writing catches to reduce lines of code, but it may be just personal preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too. In this case, the method definition will be a bit ugly as it won't fit on one line and I'd have to put throws on a separate line. It is also part of try with resources which is a bit of a special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah makes sense.
This PR adds content and delete file counts fields to
manifestsandall_manifestsmetadata tables.Existing tests were adapted to cover the new functionality.