Skip to content

Add support to redirect from a Hive catalog to an Iceberg catalog#4704

Closed
lxynov wants to merge 2 commits intotrinodb:masterfrom
lxynov:catalog-redirection
Closed

Add support to redirect from a Hive catalog to an Iceberg catalog#4704
lxynov wants to merge 2 commits intotrinodb:masterfrom
lxynov:catalog-redirection

Conversation

@lxynov
Copy link
Copy Markdown
Member

@lxynov lxynov commented Aug 5, 2020

Add catalog redirection support to SPI
Add a catalog redirection method to connector SPI. It allows a connector
to redirect to another catalog which may or may not use the same
connector.

Catalog redirection is tried whenever a table or view is accessed, e.g.,
in SELECT, INSERT, DROP TABLE, ALTER TABLE, GRANT, etc., queries. However,
it's not tried with procedures.

Multi-stop redirection without loop is allowed within a max depth.

Add support to redirect from Hive to an Iceberg catalog
When enabled, Hive Connector would redirect a table access to an Iceberg
catalog when the table is an Iceberg table.


Issue: #4442
Umbrella issue: #1324

Some notes:

  • Access control is enforced on the eventual table.
  • Information schema queries (show tables, select * from information_schema.columns) stay the same. They won't include results from other catalogs. Having said that, in the Hive/Iceberg case, a Hive catalog does include results of Iceberg tables inherently.
  • I thought about adding warning messages for users. Can work on that as a follow-up if it makes sense.

@cla-bot cla-bot bot added the cla-signed label Aug 5, 2020
@lxynov lxynov force-pushed the catalog-redirection branch 2 times, most recently from 2b1d876 to 2832980 Compare August 6, 2020 21:08
@lxynov lxynov requested a review from electrum August 7, 2020 00:32
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be written as

return getOptionalCatalogMetadata(session, tableName.getCatalogName())
        .map(catalog -> {
            CatalogName catalogName = catalog.getConnectorId(session, tableName);
            ConnectorMetadata metadata = catalog.getMetadataFor(catalogName);
            ConnectorSession connectorSession = session.toConnectorSession(catalogName);
            return metadata.redirectCatalog(connectorSession, tableName.asSchemaTableName());
        });

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metadata.redirectCatalog(connectorSession, tableName.asSchemaTableName()) returns Optional<String> instead of String, making it difficult to follow the pattern here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try flatMap?

@findepi
Copy link
Copy Markdown
Member

findepi commented Aug 21, 2020

This shouldn't be Iceberg specific. One could want to redirect to some other catalog for some other reasons.
eg HBase table to Phoenix, Hive JDBC table to respective JDBC connector, etc.

Add a catalog redirection method to connector SPI. It allows a connector
to redirect to another catalog which may or may not use the same
connector.

Catalog redirection is tried whenever a table or view is accessed, e.g.,
in SELECT, INSERT, DROP TABLE, ALTER TABLE, GRANT, etc., queries. However,
it's not tried with procedures.

Multi-stop redirection without loop is allowed within a max depth.
@lxynov lxynov force-pushed the catalog-redirection branch from 2832980 to 90e02da Compare August 26, 2020 00:40
@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Aug 26, 2020

@electrum Thanks for the review. I've updated the PR

@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Aug 26, 2020

This shouldn't be Iceberg specific. One could want to redirect to some other catalog for some other reasons.
eg HBase table to Phoenix, Hive JDBC table to respective JDBC connector, etc.

@findepi People can implement the ConnectorMetadata:: redirectCatalog method to achieve these

@findepi
Copy link
Copy Markdown
Member

findepi commented Aug 26, 2020

@lxynov great! I actually mean redirecting from hive to other connectors.
This needs to be extensible. Therefore, we should not have iceberg specific
configuration properties like hive.redirecting-to-iceberg-enabled, as they
clearly prevent extensibility.

@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Aug 26, 2020

@findepi oh I see your point. That makes sense. We can name it hive.catalog-redirection-enabled. @electrum what do you think?

@electrum
Copy link
Copy Markdown
Member

I'm not sure it makes sense to be extensible at the user level. The redirection here is specific to Iceberg based on the Hive connector's knowledge that the table is an Iceberg table. If we had an HBase connector in addition to a Phoenix connector, it seems that the same level of specific knowledge would apply.

When enabled, Hive Connector would redirect a table access to an Iceberg
catalog when the table is an Iceberg table.
@lxynov lxynov force-pushed the catalog-redirection branch from 90e02da to aea5fdb Compare August 26, 2020 21:45
@lxynov
Copy link
Copy Markdown
Member Author

lxynov commented Aug 27, 2020

@electrum Thanks for the review! PR updated. The failing test seems unrelated.

@sopel39 sopel39 requested a review from martint September 2, 2020 19:57
Copy link
Copy Markdown
Member

@dain dain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there going to be a redirect Iceberg to Hive feature also?

requireNonNull(tableName, "tableName is null");

if (tableName.getCatalogName().isEmpty() || tableName.getSchemaName().isEmpty() || tableName.getObjectName().isEmpty()) {
return Optional.empty();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to be an error. Do we have cases where we do not have a fully qualified name?

}

@Override
public Optional<String> redirectCatalog(Session session, QualifiedObjectName tableName)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name might be too generic. I believe this only applies to the table namespace, and not procedures or functions. I suggest we make this clear, so we can add those is the future if needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try flatMap?

return new QualifiedObjectName(catalogName, schemaName, objectName);
}

public static QualifiedObjectName redirectToNewCatalogIfNecessary(Session session, QualifiedObjectName tableName, Metadata metadata)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add this directly to Metadata and make redirectCatalog a private method on Metadata? I ask because I don't see any direct usages of redirectCatalog.

@findepi findepi mentioned this pull request Oct 12, 2020
@electrum
Copy link
Copy Markdown
Member

Replaced by #5160

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants