Skip to content

Conversation

@coolderli
Copy link
Contributor

We need to add the content field to the schema of AllManifestsTable, so that we can get the type of manifest for expiration processing.

@coolderli coolderli changed the title Core: Add content to ManifestTable Schema. Core: Add content field to ManifestTable Schema. Apr 25, 2021
@coolderli
Copy link
Contributor Author

@rdblue @openinx could you help me review it? thanks

Types.NestedField.required(11, "contains_nan", Types.BooleanType.get()),
Types.NestedField.optional(12, "lower_bound", Types.StringType.get()),
Types.NestedField.optional(13, "upper_bound", Types.StringType.get())
Types.NestedField.optional(4, "content", Types.IntegerType.get()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct way to add a new field in this manifest table is : add the content field with an auto-increment field id (which should be 14 in this case) at the tail of those nested field list. We cannot change the existing field ids in this schema because them are mapping to the underlying rows in generated avro files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will fix it. Thanks.

Copy link
Contributor

@jackye1995 jackye1995 Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to add it as the last entry so that ID of other fields are not modified for backwards compatibility.

Copy link
Member

@openinx openinx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coolderli I'm still not quite understand why you want to add the manifest content id in those two table schema. Would you mind to provide more background about this changes ? What's the specific behavior that we could not accomplish in the current table schema ?

@coolderli
Copy link
Contributor Author

@openinx I want to support the expiring snapshot of the V2 table
After we rewrite the data file and the delete file in #2303, we can delete the corresponding data files and delete files through manifest when expiring the snapshot.

@openinx
Copy link
Member

openinx commented Apr 26, 2021

So why need to add the extra nested field in table schema ? I still not get the point.

@coolderli
Copy link
Contributor Author

Sorry, I didn't make it clear. This patch is actually a part of #2518.

Types.NestedField.required(11, "contains_nan", Types.BooleanType.get()),
Types.NestedField.optional(12, "lower_bound", Types.StringType.get()),
Types.NestedField.optional(13, "upper_bound", Types.StringType.get())
Types.NestedField.optional(4, "content", Types.IntegerType.get()),
Copy link
Contributor

@jackye1995 jackye1995 Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to add it as the last entry so that ID of other fields are not modified for backwards compatibility.

Copy link
Contributor

@jackye1995 jackye1995 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some tests if possible, thank you!

);
))),
Types.NestedField.optional(14, "content", Types.StringType.get())
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should not change space for unmodified lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

);
))),
Types.NestedField.required(14, "content", Types.StringType.get())
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should not change space for unmodified lines.

);
partitionSummariesToRows(spec, manifest.partitions()),
manifest.content().name()
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should not change space for unmodified lines.

@coolderli coolderli marked this pull request as draft April 27, 2021 07:03
@coolderli coolderli marked this pull request as ready for review April 27, 2021 09:03
@jackye1995
Copy link
Contributor

Sorry for the back and forth, now I think about it, the files table has this in schema:

optional(134, "content", IntegerType.get(), "Contents of the file: 0=data, 1=position deletes, 2=equality deletes");

So it would make more sense to use int also for manifest, but we should add a doc similar to the files table like:

optional(14, "content", IntegerType.get(), "Contents of the manifest: 0=data, 1=deletes");

so that people know what the values mean.

Apart from that, I don't have any further comments, thanks for the work.

@coolderli coolderli marked this pull request as draft April 28, 2021 02:12
@coolderli coolderli marked this pull request as ready for review April 28, 2021 03:06
@coolderli
Copy link
Contributor Author

@jackye1995 Thank you for reviewing the code.

}

@Test
public void testReadDataManifestTable() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that these tests should be in this class. This suite tests the manifest reader, not the manifest metadata tables. Why not put these tests in TestIcebergSourceTablesBase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test has covered the use cases.

@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 17, 2024
@github-actions
Copy link

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants