Refactor Iceberg connector to support Iceberg native catalogs by jackye1995 · Pull Request #16612 · prestodb/presto

jackye1995 · 2021-08-16T05:41:59Z

I had some offline discussions with the maintainers of the Iceberg connector (@zhenxiao, @ChunxuTang, @beinan). The direction the community would like to take is to gradually remove the connector's Hive dependency and build a pure Iceberg connector that can evolve independently.

Based on this ask, this PR introduces a nativeMode that users can turn on in order to use a different ConnectorMetadata implementation called IcebergNativeMetadata. This implementation has no dependency on the Hive connector. The plan is to first develop all the features behind this switch; once it is feature complete and stable, we can drop all the presto-hive related dependencies and deprecate the old implementations.

Test plan: added distributed and smoke tests that run against Iceberg's HadoopCatalog; all tests pass.

== RELEASE NOTES ==
Iceberg Changes
* Iceberg connector now supports a native mode that can be used without a Hive installation to run queries against Iceberg native catalogs.

cc @pettyjamesm

jackye1995 · 2021-08-16T05:43:21Z

The rollback code does nothing anyway, so I just removed it.

jackye1995 · 2021-08-16T05:46:14Z

Iceberg in general uses a dynamic plugin pattern: users can plug in their own implementations of Catalog, FileIO, and a couple of other important classes. If this is an anti-pattern for Presto, we can move to an enum type. This feature is mostly aimed at customization by enterprise users, who also need to load their own jar containing the implementation class at installation time.

jackye1995 · 2021-08-16T05:48:32Z

This basically gets a string representation of the user identity. As long as the identity does not change, the same cached catalog can be reused. Technically this should not need any cache expiration: the Catalog instance itself does not check access. Instead, the identity's access is checked whenever a catalog action is called, so the cache still works when the identity's access policy changes on the server side.
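To illustrate the idea of a catalog cache keyed by an identity string, here is a minimal sketch. It is not the code from this PR: the class name IdentityCatalogCache, the loader function, and the least-recently-used eviction policy are all assumptions made for the example, and the bound simply stands in for a "number of cached catalogs" setting.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache: one catalog object per identity string, no time-based expiry.
final class IdentityCatalogCache<C> {
    private final Function<String, C> loader;
    private final Map<String, C> cache;

    IdentityCatalogCache(int maxEntries, Function<String, C> loader) {
        this.loader = loader;
        // An access-ordered LinkedHashMap drops the least recently used entry
        // once the number of cached catalogs exceeds maxEntries.
        this.cache = new LinkedHashMap<String, C>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, C> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached catalog for this identity, creating it on first use.
    synchronized C get(String identityKey) {
        C catalog = cache.get(identityKey);
        if (catalog == null) {
            catalog = loader.apply(identityKey);
            cache.put(identityKey, catalog);
        }
        return catalog;
    }

    synchronized int size() {
        return cache.size();
    }
}

final class IdentityCatalogCacheDemo {
    public static void main(String[] args) {
        IdentityCatalogCache<String> cache =
                new IdentityCatalogCache<>(2, id -> "catalog-for-" + id);
        System.out.println(cache.get("alice"));
        // The second lookup for the same identity reuses the cached object.
        System.out.println(cache.get("alice"));
    }
}
```

Because nothing here expires by time, the sketch relies on the point made above: access is checked when a catalog action runs, not when the catalog object is handed out.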

jackye1995 · 2021-08-16T05:48:56Z

FileIO related caching is not added yet, just to keep the initial implementation simple.

jackye1995 · 2021-08-16T05:51:18Z

Although this reloads the Iceberg table, snapshot isolation is still guaranteed because we always pass in the actual snapshot ID. The table object could be cached as an optimization, but I am not adding that here to keep things simple.

beinan

Awesome PR! I heard quite a few giant companies are waiting for this one. Really appreciate your contribution! I just left a few comments and will continue reviewing it tomorrow.

beinan · 2021-08-17T01:23:23Z

I would prefer to move it to an enum: we can configure it in the presto-iceberg connector's config file and then initialize the catalog implementation in a module. Let me know if you want to see some similar examples in Presto.
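As a rough sketch of the enum approach discussed here, the example below maps a config string to a fixed set of catalog kinds instead of loading an arbitrary class by name. Everything in it is hypothetical: DemoCatalog, DemoCatalogType, the HADOOP and HIVE constants, and the "warehouse" and "uri" property keys are placeholders, not the names used in Presto or Iceberg.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a catalog; not the real Iceberg Catalog interface.
interface DemoCatalog {
    String describe();
}

// Hypothetical enum of supported catalog kinds; each constant knows how to build its catalog.
enum DemoCatalogType {
    HADOOP(props -> () -> "hadoop catalog at " + props.getOrDefault("warehouse", "<unset>")),
    HIVE(props -> () -> "hive catalog at " + props.getOrDefault("uri", "<unset>"));

    private final Function<Map<String, String>, DemoCatalog> factory;

    DemoCatalogType(Function<Map<String, String>, DemoCatalog> factory) {
        this.factory = factory;
    }

    DemoCatalog create(Map<String, String> properties) {
        return factory.apply(properties);
    }

    // Parses the value read from the connector config file, ignoring case and padding.
    // Unknown values fail fast with IllegalArgumentException from valueOf.
    static DemoCatalogType fromConfig(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }
}

final class CatalogTypeDemo {
    public static void main(String[] args) {
        DemoCatalogType type = DemoCatalogType.fromConfig("hadoop");
        DemoCatalog catalog = type.create(Collections.singletonMap("warehouse", "/tmp/warehouse"));
        System.out.println(catalog.describe());
    }
}
```

The trade-off matches the discussion: an enum keeps the set of implementations closed and validated at startup, while the dynamic plugin pattern lets users supply their own class from an extra jar.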

beinan · 2021-08-17T01:27:41Z

This name is a little vague; cache-size sounds like a number of bytes. Could we use cached_catalog_num or something similar? Your call.

Sure, I will change it to cached-catalog-num instead, using - to match the convention of Presto config keys.

beinan · 2021-08-17T01:33:33Z

Just want to double-check: do the rollback methods of both IcebergMetadata and IcebergNativeMetadata do nothing?

beinan · 2021-08-17T01:45:11Z

I know we do something similar everywhere: return null when we get a NoSuchTableException. Could we add a comment explaining why we return null rather than throw an exception?

I think returning null from getTableHandle when the table does not exist is the consistent behavior across all connectors, so it's probably fine not to explain it.
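The pattern under discussion can be sketched as follows. This is an illustration only, with hypothetical stand-in types (DemoNoSuchTableException, DemoTableCatalog, DemoMetadata) and a String used in place of a real table handle; it is not the code from IcebergNativeMetadata.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Iceberg's NoSuchTableException.
class DemoNoSuchTableException extends RuntimeException {
    DemoNoSuchTableException(String name) {
        super("Table does not exist: " + name);
    }
}

// Hypothetical in-memory catalog whose loadTable throws when the table is missing.
final class DemoTableCatalog {
    private final Map<String, String> tables = new HashMap<>();

    void createTable(String name, String location) {
        tables.put(name, location);
    }

    String loadTable(String name) {
        String location = tables.get(name);
        if (location == null) {
            throw new DemoNoSuchTableException(name);
        }
        return location;
    }
}

final class DemoMetadata {
    private final DemoTableCatalog catalog;

    DemoMetadata(DemoTableCatalog catalog) {
        this.catalog = catalog;
    }

    // Returns a handle (here just a String), or null when the table does not exist,
    // so the caller decides how to report a missing table.
    String getTableHandle(String name) {
        try {
            return "handle:" + catalog.loadTable(name);
        } catch (DemoNoSuchTableException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        DemoTableCatalog catalog = new DemoTableCatalog();
        catalog.createTable("db.events", "/warehouse/db/events");
        DemoMetadata metadata = new DemoMetadata(catalog);
        System.out.println(metadata.getTableHandle("db.events"));
        System.out.println(metadata.getTableHandle("db.missing"));
    }
}
```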

beinan · 2021-08-17T01:48:43Z

Why do we need to extract getRawSystemTable as a separate private method? Shall we just inline the code into getSystemTable? I didn't see any other caller of this method.

Good point, this was mostly following the structure of IcebergMetadata. Let me update.

beinan · 2021-08-17T01:58:37Z

nit: we do not actually need this case, but I'm OK if you would like to put it here explicitly. Could you add a comment explaining why we do not have a system table for the DATA case?

Sure, I moved it together with the default case and added a comment.

beinan · 2021-08-17T02:01:16Z

nit: hmm, I'm not sure whether we should set null or use orElse("") here. Why do you use an empty string rather than null?

Good point, this is to match the behavior of IcebergMetadata, but I agree it seems awkward to return an empty string. That constructor is deprecated anyway, so I will switch to using the builder.

beinan · 2021-08-17T02:09:06Z

"return" is not necessary I think. can we just write ".map(column -> new ColumnMetadata(column.name(), toPrestoType(column.type(), typeManager), column.doc(), false))"

Similar to the last one, this is to match the behavior of IcebergMetadata, but that constructor is deprecated anyway, so I will switch to using the builder.
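For readers unfamiliar with the style point in this thread, the sketch below shows that a block lambda containing only a return statement behaves the same as the shorter expression lambda. DemoColumn and the "name:doc" string are hypothetical stand-ins chosen for the example; they are not Presto's ColumnMetadata or its builder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a column with a name and an optional doc string.
final class DemoColumn {
    final String name;
    final String doc;

    DemoColumn(String name, String doc) {
        this.name = name;
        this.doc = doc;
    }
}

final class LambdaStyleDemo {
    // Block lambda: braces plus an explicit return statement.
    static List<String> withBlockLambda(List<DemoColumn> columns) {
        return columns.stream()
                .map(column -> {
                    return column.name + ":" + column.doc;
                })
                .collect(Collectors.toList());
    }

    // Expression lambda: same result, without braces or return.
    static List<String> withExpressionLambda(List<DemoColumn> columns) {
        return columns.stream()
                .map(column -> column.name + ":" + column.doc)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DemoColumn> columns = Arrays.asList(
                new DemoColumn("id", "primary key"),
                new DemoColumn("ts", "event time"));
        System.out.println(withBlockLambda(columns).equals(withExpressionLambda(columns)));
    }
}
```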

ChunxuTang

@jackye1995 Thanks so much for your work! Generally, the PR looks good to me. We can iterate on the work here for further improvement.
I noticed that there are some compilation errors. At your convenience, could you fix them and re-run the tests? Thanks!

jackye1995 · 2021-09-30T18:58:01Z

@ChunxuTang Thanks for the review! The error was due to a rebase. I have fixed it and run the tests locally to make sure they pass. I have also made fixes based on your comments.

beinan · 2021-10-04T19:20:23Z

rerun the test jobs

beinan

lgtm, great contribution for the presto+iceberg use cases. thanks a lot!

beinan · 2021-10-04T22:42:59Z

@jackye1995 could you rebase on master again? It looks like some of the tests are still failing. Thank you!

jackye1995 commented Aug 16, 2021

beinan self-assigned this Aug 17, 2021

beinan reviewed Aug 17, 2021

jackye1995 force-pushed the multi-catalog branch from d18743f to d6b2b8e Compare September 29, 2021 06:13

ChunxuTang reviewed Sep 30, 2021

jackye1995 force-pushed the multi-catalog branch from d6b2b8e to e6e5986 Compare September 30, 2021 18:57

beinan approved these changes Oct 4, 2021

Refactor Iceberg connector to support Iceberg native catalogs

69e9906

jackye1995 force-pushed the multi-catalog branch from e6e5986 to 69e9906 Compare October 5, 2021 17:20

beinan merged commit b92550b into prestodb:master Oct 5, 2021

prithvip mentioned this pull request Oct 19, 2021

Add release notes for 0.264 #16893

Merged

5 tasks

rohanpednekar added the iceberg Apache Iceberg related label Nov 18, 2021

ChunxuTang mentioned this pull request Jan 27, 2022

Consolidate iceberg native-mode into the catalog type #17233

Merged

hantangwangd mentioned this pull request Oct 16, 2024

[Iceberg]Enable test cases for rename table on REST and NESSIE catalog #23837

Merged

6 tasks

4 participants