-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: Add content field to ManifestTable Schema. #2510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Types.NestedField.required(11, "contains_nan", Types.BooleanType.get()), | ||
| Types.NestedField.optional(12, "lower_bound", Types.StringType.get()), | ||
| Types.NestedField.optional(13, "upper_bound", Types.StringType.get()) | ||
| Types.NestedField.optional(4, "content", Types.IntegerType.get()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The correct way to add a new field in this manifest table is : add the content field with an auto-increment field id (which should be 14 in this case) at the tail of those nested field list. We cannot change the existing field ids in this schema because them are mapping to the underlying rows in generated avro files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will fix it. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to add it as the last entry so that ID of other fields are not modified for backwards compatibility.
openinx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coolderli I'm still not quite understand why you want to add the manifest content id in those two table schema. Would you mind to provide more background about this changes ? What's the specific behavior that we could not accomplish in the current table schema ?
|
So why need to add the extra nested field in table schema ? I still not get the point. |
|
Sorry, I didn't make it clear. This patch is actually a part of #2518. |
| Types.NestedField.required(11, "contains_nan", Types.BooleanType.get()), | ||
| Types.NestedField.optional(12, "lower_bound", Types.StringType.get()), | ||
| Types.NestedField.optional(13, "upper_bound", Types.StringType.get()) | ||
| Types.NestedField.optional(4, "content", Types.IntegerType.get()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to add it as the last entry so that ID of other fields are not modified for backwards compatibility.
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some tests if possible, thank you!
| ); | ||
| ))), | ||
| Types.NestedField.optional(14, "content", Types.StringType.get()) | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should not change space for unmodified lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| ); | ||
| ))), | ||
| Types.NestedField.required(14, "content", Types.StringType.get()) | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should not change space for unmodified lines.
| ); | ||
| partitionSummariesToRows(spec, manifest.partitions()), | ||
| manifest.content().name() | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should not change space for unmodified lines.
|
Sorry for the back and forth, now I think about it, the files table has this in schema: optional(134, "content", IntegerType.get(), "Contents of the file: 0=data, 1=position deletes, 2=equality deletes");So it would make more sense to use int also for manifest, but we should add a doc similar to the files table like: optional(14, "content", IntegerType.get(), "Contents of the manifest: 0=data, 1=deletes");so that people know what the values mean. Apart from that, I don't have any further comments, thanks for the work. |
|
@jackye1995 Thank you for reviewing the code. |
| } | ||
|
|
||
| @Test | ||
| public void testReadDataManifestTable() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that these tests should be in this class. This suite tests the manifest reader, not the manifest metadata tables. Why not put these tests in TestIcebergSourceTablesBase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test has covered the use cases.
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
We need to add the content field to the schema of AllManifestsTable, so that we can get the type of manifest for expiration processing.