Skip to content

Add support to redirect table operations from Iceberg to Hive#11356

Merged
findepi merged 1 commit intotrinodb:masterfrom
findinpath:iceberg-redirect-table-to-hive
Apr 6, 2022
Merged

Add support to redirect table operations from Iceberg to Hive#11356
findepi merged 1 commit intotrinodb:masterfrom
findinpath:iceberg-redirect-table-to-hive

Conversation

@findinpath
Copy link
Copy Markdown
Contributor

Description

New property introduced for the Iceberg connector: iceberg.hive-catalog-name

Is this change a fix, improvement, new feature, refactoring, or other?

New feature

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

The changes in this PR affect mainly the Iceberg connector.

How would you describe this change to a non-technical end user or system administrator?

In an environment which makes use of a shared metastore it may come in handy to have table redirects to automatically allow Trino to translate a table name like iceberg.default.hive_table_name towards the name hive.default.hive_table_name.

Note that the translation can happen quite transparently when the user connects to a predefined catalog and schema (e.g. : iceberg.default) and the table operation looks like: SELECT * FROM hive_table_name.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Iceberg
* Add support to redirect table operations from Iceberg to Hive

@findinpath findinpath marked this pull request as draft March 8, 2022 11:28
@findepi findepi requested review from alexjo2144 and homar March 9, 2022 10:10
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch 2 times, most recently from 9501f3e to ebd2fbb Compare March 9, 2022 16:10
@findinpath findinpath marked this pull request as ready for review March 9, 2022 16:13
@findinpath findinpath requested a review from jklamer March 9, 2022 16:13
@findinpath findinpath requested review from findepi and phd3 March 11, 2022 10:26
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 0d33751 to acc1604 Compare March 11, 2022 10:27
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like an Iceberg-catalog level method. Can we narrow this down to just the things that are dependent on the catalog implementation and move the rest up to IcebergMetadata?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like an Iceberg-catalog level method.

Initially I created a TableRedirectionHandler abstraction which was created at the same time as the TrinoCatalog instance, but I dropped afterwards the idea because the redirection is tightly linked to the metastore implementation (hive/glue).

The same boilerplate code used for creating the TrinoCatalog would need to be done for redirection handling as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second thought, it probably makes sense to take out this operation from TrinoCatalog in hindsight of JDBC / REST catalogs which will not have anymore Hive related content in them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the code again so that the redirectTable method lives in the TrinoHiveCatalog and TrinoGlueCatalog, but it is not exposed in the TrinoCatalog interface in order to keep the interface free from the concept of table redirection, but still be able to provide this functionality for hive & glue.

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from acc1604 to 6b75248 Compare March 14, 2022 12:08
@findinpath findinpath requested a review from alexjo2144 March 14, 2022 12:08
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 6b75248 to d863ff6 Compare March 14, 2022 14:31
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch 6 times, most recently from 152eafb to 909f1cc Compare March 17, 2022 10:25
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the change in functionality here for the Iceberg connector. Please provide feedback whether the provided functionality is incorrect from your perspective.

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 1058105 to 50e3430 Compare March 31, 2022 08:51
@findinpath
Copy link
Copy Markdown
Contributor Author

Rebased on master to adapt the redirect test io.trino.tests.product.iceberg.TestIcebergRedirectionToHive#testAlterTableRename

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the table redirection towards Hive connector is not enabled,
in case of trying to query on the Iceberg connector
a metadata table of a Hive connector table,
the user will receive a table not found exception.

From io.trino.tests.product.iceberg.TestIcebergHiveTablesCompatibility#testIcebergSelectFromHiveTable

        assertQueryFailure(() -> onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + tableName + "$data\""))
                .hasMessageMatching("Query failed \\(#\\w+\\):\\Q Not an Iceberg table: default." + tableName);

        assertQueryFailure(() -> onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + tableName + "$files\""))
                .hasMessageMatching("Query failed \\(#\\w+\\):\\Q line 1:15: Table 'iceberg.default." + tableName + "$files' does not exist");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keeping verify in such case seems "OK" too, wdyt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping verify would issue for the statement:

onTrino().executeQuery("SELECT * FROM iceberg.default.\"" + hiveTableName + "$files\"")

error messages like the following:

Wrong table type: test_iceberg_select_from_hive_63u5u11q3c70$files

I tend to say that the error message

Table 'iceberg.default." + hiveTableName + "$files' does not exist"

fits better (not ideal, but better).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires code comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make it clear from the code perspective:

io.trino.plugin.iceberg.IcebergMetadata#getTableHandle is reached by a select from hive_table_name$files because there is no system table in Iceberg found for the hive_table_name (IcebergMetadata#getSystemTable returns Optional.empty() in such cases).

In StatementAnalyzer#visitTable the logic of the method assumes that if the identifier doesn't correspond to a MV or to a view, it is certainly a table. For this reason IcebergMetadata#getTableHandle is called with the rather unexpected argument hive_table_name$files.

Probably a refactoring of StatementAnalyzer#visitTable method which verifies in the beginning whether we're dealing with an redirected table and acts accordingly would be the right way to go, but such a change (if it makes sense) would rather fit in a different PR.

Comment on lines 480 to 613
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should redirect non-Iceberg tables, rather than redirect Iceberg tables

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this bug because I didn't know I can run the Glue tests locally.
@findepi Once the PR is in a good shape, please run it against the CI with AWS GLue secrets.

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 50e3430 to 13196c8 Compare April 1, 2022 10:17
@findinpath
Copy link
Copy Markdown
Contributor Author

Rebased on master to make use of b395286

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch 2 times, most recently from 863c172 to bc81a1a Compare April 1, 2022 10:47
@findinpath findinpath requested a review from findepi April 1, 2022 10:50
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I'd use get instead of map here, makes it clear that an empty catalog name shouldn't ever show up at this point.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a few lines above:

        Optional<String> targetCatalogName = getHiveCatalogName(session);
        if (targetCatalogName.isEmpty()) {
            return Optional.empty();
        }

Also note that the method returns an Optional, so I'd have to do .get() and then wrap the result back to Optional.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that Glue views are merged we should probably match what the Hive catalog has for this line: (table.isEmpty() || VIRTUAL_VIEW.name().equals(table.get().getTableType()))

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from bc81a1a to 58426ff Compare April 4, 2022 10:18
@findinpath
Copy link
Copy Markdown
Contributor Author

Rebased on top of master to make use of the changes from #11617

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Pretend the table does not exist to produce better message in case of redirects to Hive

@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch 2 times, most recently from fd3366f to 101291d Compare April 5, 2022 12:36
@findinpath
Copy link
Copy Markdown
Contributor Author

Rebased on top of master to deal with the refactoring of the glue tests.

The Iceberg connector can make use of the `iceberg.hive-catalog-name`
configuration property for enable table redirects towards the Hive tables.

When the table redirection towards Hive connector is not enabled,
in case of trying to query on the Iceberg connector
a metadata table of a Hive connector table,
the user will receive a table not found exception.
@findinpath findinpath force-pushed the iceberg-redirect-table-to-hive branch from 101291d to b2a73ff Compare April 5, 2022 13:34
@findepi findepi merged commit 7562be6 into trinodb:master Apr 6, 2022
@findepi findepi mentioned this pull request Apr 6, 2022
@github-actions github-actions bot added this to the 376 milestone Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants