
Add support to redirect table reads from Hive to Iceberg #10173

Merged: findepi merged 1 commit into trinodb:master from findepi:findepi/redirect-hive-iceberg/there-and-back-again on Dec 6, 2021



@findepi (Member) commented Dec 3, 2021

Hive Connector redirects Iceberg table access to the configured
Iceberg catalog.

This change adds an implementation of HiveTableRedirectionsProvider,
plugging in Hive->Iceberg redirects and leveraging the existing framework.

This is based on work from several authors. In particular, this is based
on #8340 with conflicts resolved, and retrofitted to the current APIs of the Hive connector.

As tests show, not all statements support redirects properly yet, and
this needs to be followed up on.
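To make the redirect decision concrete, here is a minimal sketch (Java 16+ for the record), assuming that Iceberg tables are recognized by the `table_type=ICEBERG` table parameter that Iceberg writes into the Hive metastore, and that a target Iceberg catalog name has been configured. The class, record, and method names are hypothetical; this is not the actual `HiveTableRedirectionsProvider` code from this PR.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: decide whether a read of a Hive table should be
// redirected to a configured Iceberg catalog. Not Trino's actual code.
public class IcebergRedirectSketch {
    // Assumption: Iceberg marks its tables in the Hive metastore with this
    // table parameter (value "ICEBERG", compared case-insensitively).
    static final String TABLE_TYPE_PROP = "table_type";
    static final String ICEBERG_TABLE_TYPE_VALUE = "iceberg";

    // Hypothetical fully qualified target name.
    public record QualifiedTable(String catalog, String schema, String table) {}

    public static Optional<QualifiedTable> redirect(
            Optional<String> icebergCatalogName,
            String schema,
            String table,
            Map<String, String> tableParameters) {
        if (icebergCatalogName.isEmpty()) {
            // No Iceberg catalog configured: never redirect.
            return Optional.empty();
        }
        String tableType = tableParameters.get(TABLE_TYPE_PROP);
        if (!ICEBERG_TABLE_TYPE_VALUE.equalsIgnoreCase(tableType)) {
            // Plain Hive table (or parameter missing): read it through Hive.
            return Optional.empty();
        }
        // Same schema and table name, different catalog.
        return Optional.of(new QualifiedTable(icebergCatalogName.get(), schema, table));
    }

    public static void main(String[] args) {
        Map<String, String> icebergParams = Map.of("table_type", "ICEBERG");
        Map<String, String> hiveParams = Map.of();
        System.out.println(redirect(Optional.of("iceberg"), "myschema", "mytable", icebergParams));
        System.out.println(redirect(Optional.of("iceberg"), "myschema", "mytable", hiveParams));
    }
}
```

Keeping the schema and table names unchanged means that, in this sketch, a read of `hive.myschema.mytable` would be served from `iceberg.myschema.mytable`; both catalog names are placeholders.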

@findepi findepi added enhancement New feature or request tests:hive labels Dec 3, 2021
@cla-bot cla-bot bot added the cla-signed label Dec 3, 2021
@findepi findepi force-pushed the findepi/redirect-hive-iceberg/there-and-back-again branch from f4a3f99 to 708cdf5 on December 3, 2021 16:52
@findepi findepi force-pushed the findepi/redirect-hive-iceberg/there-and-back-again branch from 708cdf5 to d12502a on December 3, 2021 19:24
@findepi (Member, Author) commented Dec 3, 2021

Per @ssheikin's #8340 (comment), added @MiguelWeezardo as a co-author.

@findepi findepi force-pushed the findepi/redirect-hive-iceberg/there-and-back-again branch from ceae60d to 7b9d7ba on December 4, 2021 11:24

Member:

Is there a reason why we do not care about properties which are set to null in events?

Member Author:

I am not aware of any consumers of these events; this must be an extension point for someone.
The only usage I found looks like a dummy consumer, at least according to its commit message:

```java
log.debug("File created: query: %s, schema: %s, table: %s, partition: '%s', format: %s, size: %s, path: %s",
        writeCompletedEvent.getQueryId(),
        writeCompletedEvent.getSchemaName(),
        writeCompletedEvent.getTableName(),
        writeCompletedEvent.getPartitionName(),
        writeCompletedEvent.getStorageFormat(),
        writeCompletedEvent.getBytes(),
        writeCompletedEvent.getPath());
```

Including null values could be a breaking change for downstream event consumers (e.g. if they call [Immutable]Map.copyOf(event.getSessionProperties()), which rejects null values). Converting values to Optional would be a breaking change too. In the absence of more information, I deemed omitting null values the least breaking of all options.
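The trade-off above can be illustrated with plain `java.util.Map`: `Map.copyOf` (Java 10+), like Guava's `ImmutableMap.copyOf`, throws `NullPointerException` when the source map contains a null value, so dropping null entries before handing the map out keeps such consumers working. This is a minimal sketch; the class name, method name, and property names are hypothetical and are not the event API touched by this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: expose event properties without null values, so that
// consumers calling Map.copyOf (or Guava's ImmutableMap.copyOf) do not fail.
public class NullFreePropertiesSketch {
    public static Map<String, String> withoutNulls(Map<String, String> properties) {
        Map<String, String> result = new HashMap<>();
        properties.forEach((key, value) -> {
            // Skip entries that Map.copyOf would reject with NullPointerException.
            if (key != null && value != null) {
                result.put(key, value);
            }
        });
        return Map.copyOf(result);
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("example_property", "true");
        raw.put("unset_property", null);

        // A consumer doing this on the raw map would throw NullPointerException.
        try {
            Map.copyOf(raw);
        } catch (NullPointerException e) {
            System.out.println("raw map rejected: contains a null value");
        }

        // The filtered map can be copied safely.
        Map<String, String> safe = withoutNulls(raw);
        System.out.println(Map.copyOf(safe));
    }
}
```

The cost of this choice is that consumers cannot distinguish "property set to null" from "property absent", which matches the reasoning above that omission is the least breaking option.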

Member:

This probably does not matter here, but I have heard many times that we should refrain from using single-node environments, as they provide worse test coverage than multinode ones.

Member Author:

Thanks for the reminder. We have 27 or so single-node environments. And yes, here it doesn't matter, as the tested functionality is really coordinator-only. Let me keep it as is.

@losipiuk (Member) commented Dec 6, 2021:

Can you create a GitHub issue that lists the missing functionality,
and tag the relevant TODOs with it?

Member Author:

I don't think it's necessary in this case. #8340 is going to fix these problems (once #8340 (comment) is resolved).

@hashhar (Member) left a comment:

Do we want to test SHOW STATS redirection too?

What about a non-default hive.timestamp-precision? Redirects will return different results.
Same for the use_column_name configs in tables which have different schema across partitions.

@findepi (Member, Author) commented Dec 6, 2021

What about a non-default hive.timestamp-precision? Redirects will return different results.

hive.timestamp-precision doesn't matter, since there is no Hive table. Hive doesn't know the table schema, so it doesn't know whether there are any timestamps.

Same for the use_column_name configs in tables which have different schema across partitions.

Same reasoning applies.

Hive Connector redirects Iceberg table access to the configured
Iceberg catalog.

This change adds an implementation of `HiveTableRedirectionsProvider`,
plugging in Hive->Iceberg redirects and leveraging the existing framework.

This is based on work from several authors, mentioned at the end of
the commit message.

As tests show, not all statements support redirects properly yet, and
this needs to be followed up on.

Co-authored-by: Xingyuan Lin <linxingyuan1102@gmail.com>
Co-authored-by: Pratham Desai <prathamd94@gmail.com>
Co-authored-by: Sasha Sheikin <myminitrue@gmail.com>
Co-authored-by: Michał Ślizak <michal.slizak+github@gmail.com>
Co-authored-by: Łukasz Osipiuk <lukasz@osipiuk.net>
@findepi findepi force-pushed the findepi/redirect-hive-iceberg/there-and-back-again branch from 7b9d7ba to 8feb622 on December 6, 2021 11:17

@findepi (Member, Author) commented Dec 6, 2021

Do we want to test SHOW STATS redirection too?

Added.

Also, split the test class into two (#10173 (comment)).

@findepi findepi merged commit d79f6ea into trinodb:master Dec 6, 2021
@findepi findepi deleted the findepi/redirect-hive-iceberg/there-and-back-again branch December 6, 2021 15:34
@findepi findepi mentioned this pull request Dec 6, 2021

@aierate commented Oct 27, 2022

Do I have to use either
USE a-catalog.myschema;
SELECT * FROM mytable;
or
SELECT * FROM a-catalog.myschema.mytable;
instead of just SELECT * FROM myschema.mytable, without specifying a-catalog or running USE a-catalog?


Labels: cla-signed, enhancement (New feature or request)


4 participants