Skip to content

Add register_table procedure support for iceberg using existing metadata#14325

Closed
krvikash wants to merge 0 commit intotrinodb:masterfrom
krvikash:iceberg-support-register-table-procedure
Closed

Add register_table procedure support for iceberg using existing metadata#14325
krvikash wants to merge 0 commit intotrinodb:masterfrom
krvikash:iceberg-support-register-table-procedure

Conversation

@krvikash
Copy link
Copy Markdown
Contributor

Description

Adding register_table procedure for iceberg connector to register table using existing metadata.

  • Iceberg table will be created using provided table location
  • Look for the latest metadata file ending with .metadata.json inside the metadata folder and use it for creating the table
  • If provided location is invalid/does not exist then throw an exception
  • If no metadata file exists inside provided table location then throw an exception
  • If more than one metadata file exists with the latest sequence number then throw an exception
  • schema_name, table_name, and table_location should be not-null and valid, Otherwise, the exception will be thrown
  • User can optionally provide the valid metadata_file_location (See the usage below)

Valid usages:

  • register_table(schema_name => ..., table_name => ..., table_location => ...)
  • register_table(schema_name => ..., table_name => ..., table_location => ..., metadata_file_location => ...)

Sample Queries:
CALL iceberg.system.register_table('default', 'src_22', 'hdfs://hadoop-master:9000/user/hive/warehouse/orders_5-581fad8517934af6be1857a903559d44');

CALL iceberg.system.register_table('default', 'src_22', 'hdfs://hadoop-master:9000/user/hive/warehouse/orders_5-581fad8517934af6be1857a903559d44', null);

CALL iceberg.system.register_table('default', 'src_22', 'hdfs://hadoop-master:9000/user/hive/warehouse/orders_5-581fad8517934af6be1857a903559d44', '00003-409702ba-4735-4645-8f14-09537cc0b2c8.metadata.json');

TODO:
Add test cases

Non-technical explanation

NA

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(X) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@findepi
Copy link
Copy Markdown
Member

findepi commented Sep 28, 2022

Added empty commit to retrigger the build after #14323 merged.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the constants to IcebergUtil and make them private.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this check?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point out to the user which are the duplicated metadata files in such a situation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably ICEBERG_INVALID_METADATA is the right error code in such a situation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the javadoc about the params and return. Given the description of the method, these aspects are obvious.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Optional<String> for metadataFilename.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment stating that this is done in the same fashion as in Iceberg.
I found the method in BaseMetastoreTableOperations#parseVersion.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to verify whether the schema in which we want to register the table exists?

Do we need to verify if a table with the same name already exists in the specified schema?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Yes both the points need to be checked. I will make the changes.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error code is not appropriate. The file system is fine, is just that the file at the location provided is missing.

@findinpath
Copy link
Copy Markdown
Contributor

Let's add some tests.

Note that io.trino.plugin.iceberg.catalog.TrinoCatalog#dropTable implementation do delete the table data (and metadata) as well.

The metadata files contain full path names for the manifest-list property.
One way to test this functionality is to create a table via SQL and programatically drop its content from the metastore so that we avoid to delete the content of the table . In this fashion, the table data remains intact and we can proceed to register it again and eventually make queries on the table.

You can find inspiration on how to get access within the integration tests to the metastore / aws client directly by looking at these tests:

@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch from f8e951d to 8f65f29 Compare September 29, 2022 11:37
@krvikash krvikash closed this Sep 29, 2022
@krvikash krvikash force-pushed the iceberg-support-register-table-procedure branch from 8f65f29 to bc4e79a Compare September 29, 2022 11:42
@krvikash
Copy link
Copy Markdown
Contributor Author

krvikash commented Sep 29, 2022

This PR got closed because of my mistake during running GitHub commands. Reopen a new PR #14375

@findinpath, Addressed your review comments in #14375

@alexjo2144
Copy link
Copy Markdown
Member

This PR got closed because of my mistake during running GitHub commands

For next time, you can reopen the same PR. There's a button at the bottom of the page

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants