Skip to content

Comments

[Iceberg] Add Iceberg metadata table $ref#23503

Merged
agrawalreetika merged 1 commit intoprestodb:masterfrom
agrawalreetika:iceberg-ref-table
Aug 28, 2024
Merged

[Iceberg] Add Iceberg metadata table $ref#23503
agrawalreetika merged 1 commit intoprestodb:masterfrom
agrawalreetika:iceberg-ref-table

Conversation

@agrawalreetika
Copy link
Member

@agrawalreetika agrawalreetika commented Aug 23, 2024

Description

Add Iceberg metadata table $ref

Motivation and Context

Add Iceberg metadata table $ref
provides details around tags & branches on the Iceberg table - https://iceberg.apache.org/docs/nightly/branching

Impact

Iceberg Connector
Provides details around tags & branches on the Iceberg table - https://iceberg.apache.org/docs/nightly/branching

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

Iceberg Connector Changes
* Add Iceberg metadata table $ref :pr:`23503`

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc! A single tiny nit about punctuation, everything else looks good.

Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change! Just a couple of nits.

return new FixedPageSource(buildPages(tableMetadata, icebergTable));
}

private static boolean checkNonNull(Object object, PageListBuilder pagesBuilder)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to only be used for bigint values, so I'd just call this something like appendLongValue and append null if it's null, otherwise append the long value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actaully I just noticed value.minSnapshotsToKeep() is an Integer. I can add it as appendLongValue itself and use appendLongValue method. Or I can keep this method like earlier and append it with pagesBuilder.appendInteger LMK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering if we can encapsulate the setting of values, either null or bigint, into one method. So perhaps just appendValue?


.. code-block:: text

name | type | snapshot_id | max_reference_age_in_ms | min_snapshots_to_keep | max_snapshot_age_in_ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to include a non-main branch as well.

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this metadata table, the change looks good to me. Should we add a test case to show that there will always be a main branch by default?

@agrawalreetika
Copy link
Member Author

Thanks for adding this metadata table, the change looks good to me. Should we add a test case to show that there will always be a main branch by default?

Thanks for your review @hantangwangd , I have added this. please review.

steveburnett
steveburnett previously approved these changes Aug 26, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

@Override
public Distribution getDistribution()
{
return Distribution.SINGLE_COORDINATOR;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: static import

icebergTable.refs().forEach((key, value) -> {
pagesBuilder.beginRow();
pagesBuilder.appendVarchar(key);
pagesBuilder.appendVarchar(value.isTag() ? "TAG" : "BRANCH");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about doing String.valueOf(value.type())? This way it handles updates to the SnapshotRefType enum.


// Check main branch entry
assertQuery("SELECT count(*) FROM test_schema.\"test_table$refs\"", "VALUES 1");
assertQuery("SELECT name FROM test_schema.\"test_table$refs\"", "VALUES 'main'");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would also be prudent to add assertions on values in the row of the table matching the SnapshotRef object.

You could add or refactor the loadTable method from IcebergDistributedTestBase to get a reference to the Iceberg table, then re-create the values and run assertQuery

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tests inIcebergDistributedTestBase which I could reuse in my subsequent PR around adding support for querying tags & branch. Please review

ZacBlanco
ZacBlanco previously approved these changes Aug 27, 2024
Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the added test case, overall looks good to me, one little nit.

Comment on lines +1680 to +1684
assertEquals(icebergTable.refs().size(), 2);
icebergTable.manageSnapshots().createTag("testTag", icebergTable.currentSnapshot().snapshotId()).commit();

assertEquals(icebergTable.refs().size(), 3);
assertQuery("SELECT count(*) FROM \"test_table_references$refs\"", "VALUES 3");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add some more detailed assertions like follows:

assertQuery("SELECT * from \"test_table_references$refs\" where name = 'main' and type = 'BRANCH'",
                format("VALUES('%s', '%s', %s, %s, %s, %s)",
                        "main",
                        "BRANCH",
                        icebergTable.refs().get("main").snapshotId(),
                        icebergTable.refs().get("main").maxRefAgeMs(),
                        icebergTable.refs().get("main").minSnapshotsToKeep(),
                        icebergTable.refs().get("main").maxSnapshotAgeMs()));

        assertQuery("SELECT * from \"test_table_references$refs\" where type = 'TAG'",
                format("VALUES('%s', '%s', %s, %s, %s, %s)",
                        "testTag",
                        "TAG",
                        icebergTable.refs().get("testTag").snapshotId(),
                        icebergTable.refs().get("testTag").maxRefAgeMs(),
                        icebergTable.refs().get("testTag").minSnapshotsToKeep(),
                        icebergTable.refs().get("testTag").maxSnapshotAgeMs()));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have added detailed assertions for testTag & testBranch instead of the main branch. Please LMK if that's fine

@agrawalreetika agrawalreetika merged commit 225b206 into prestodb:master Aug 28, 2024
@agrawalreetika agrawalreetika deleted the iceberg-ref-table branch September 2, 2024 05:20
@jaystarshot jaystarshot mentioned this pull request Nov 1, 2024
25 tasks
@tdcmeehan tdcmeehan added the from:IBM PR from IBM label Dec 13, 2024
@prestodb-ci prestodb-ci requested review from a team, auden-woolfson and imsayari404 and removed request for a team December 13, 2024 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants