Add Iceberg RESTSessionCatalog Implementation #13294
alexjo2144 left a comment
Overall, looks pretty good. Mostly style and nitpick comments.
...ino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/IcebergRESTCatalogModule.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/RESTIcebergConfig.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/RESTIcebergConfig.java
...eberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoIcebergRESTCatalogFactory.java
...eberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoIcebergRESTCatalogFactory.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoRESTCatalog.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoRESTCatalog.java
...ocker/presto-product-tests/conf/environment/singlenode-spark-iceberg-rest/iceberg.properties
testing/trino-product-tests/src/main/java/io/trino/tests/product/TestGroups.java
...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java
Not having View or MV support is pretty limiting. Is that a restriction from the backend, or can we implement them as a follow-up?
We definitely want to add view support as a follow-up. However, the Iceberg view spec is still in review, so we should wait until that is finalized.
ebyhr left a comment
Left initial comments. Is it possible to add a test for iceberg.rest.token and iceberg.rest.credential?
Also, could you update the documentation in iceberg.rst?
...ino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/IcebergRESTCatalogConfig.java
...ino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/IcebergRESTCatalogConfig.java
...ino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/IcebergRESTCatalogConfig.java
...ino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/IcebergRESTCatalogConfig.java
...ino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/IcebergRESTCatalogConfig.java
...rg/src/test/java/io/trino/plugin/iceberg/catalog/rest/TestTrinoRESTCatalogConnectorTest.java
...main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSparkIcebergRest.java
...main/java/io/trino/tests/product/launcher/env/environment/EnvSinglenodeSparkIcebergRest.java
...cker/presto-product-tests/conf/environment/singlenode-spark-iceberg-rest/spark-defaults.conf
...roduct-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergSparkCompatibility.java
@ebyhr It may be possible to add tests to validate that the token and credential values are passed through, but it would require a lot of additional infrastructure to actually test the OAuth2 flows. The logic for all of that is already handled by the underlying Iceberg library, and there are extensive tests there to validate the flow behaviors.
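A pass-through check of the kind mentioned above could look roughly like the following sketch. It is not the PR's code: `RestCatalogProps` and `toCatalogProperties` are hypothetical names, and the map keys `uri`, `token`, and `credential` are assumed here to be the property names the REST catalog client reads. The sketch only verifies that configured values reach the properties map; it does not exercise any OAuth2 flow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not taken from the PR: maps connector settings to the
// property map that would be handed to the REST catalog client.
final class RestCatalogProps
{
    private RestCatalogProps() {}

    // The keys "uri", "token", and "credential" are assumptions for illustration.
    static Map<String, String> toCatalogProperties(
            String uri,
            Optional<String> token,
            Optional<String> credential)
    {
        Map<String, String> properties = new HashMap<>();
        properties.put("uri", uri);
        // Optional values are only set when they are configured.
        token.ifPresent(value -> properties.put("token", value));
        credential.ifPresent(value -> properties.put("credential", value));
        return Map.copyOf(properties);
    }

    public static void main(String[] args)
    {
        Map<String, String> properties = toCatalogProperties(
                "http://localhost:8181",
                Optional.of("test-token"),
                Optional.empty());

        if (!"test-token".equals(properties.get("token"))) {
            throw new AssertionError("token was not passed through");
        }
        if (properties.containsKey("credential")) {
            throw new AssertionError("credential should be absent when not configured");
        }
        System.out.println("pass-through check succeeded");
    }
}
```

A check like this covers only the wiring from configuration to the client properties, which is the part this PR owns; the OAuth2 exchange itself stays covered by the Iceberg library's tests.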
testing/trino-testing/src/main/java/io/trino/testing/BaseConnectorTest.java
plugin/trino-iceberg/pom.xml
Why do we need this? If we're going to enable preview features, it seems like we should make that decision for all of Trino and set it at the top level.
I can't recall whether we added that or whether it was left over after a rebase. I've removed it.
Agreed, I like the iceberg.rest-catalog prefix.
Is this required? Can we file an issue in Iceberg to make this not depend on Hadoop? We're trying to eliminate Hadoop as a required dependency for Iceberg and other connectors.
I think we were asked to remove the FileIO implementations other than HadoopFileIO, which requires a Configuration. Should we add the iceberg-aws dependency back and make this optional?
Unfortunately, this is required for ResolvingFileIO.
Just to clarify the other responses: the REST Catalog does not have any runtime Hadoop dependencies. However, Trino performs IO operations using TrinoFileSystem (via HadoopFileIO), which does require a Hadoop configuration. I don't think we want to add scope to this PR by trying to change that path, but whenever alternatives are available, this won't be a blocker.
...eberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoIcebergRestCatalogFactory.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoRestCatalog.java
Do we need a new test group? What is different between ICEBERG and ICEBERG_REST?
We mainly introduced a new test group so that only a subset of tests runs; otherwise we'd have to make more changes to existing product tests to make them work across different environments.
In the long run (and in a follow-up PR), however, I think we could aim to remove that particular test group if that's desired.
plugin/trino-iceberg/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java
Is source generic in the REST catalog? How might it be used?
While it might be useful for informational or auditing purposes, we need to ensure that it won't be used for access control or for returning different information depending on the value, or else that caching results with different source values is OK. The same concerns apply to the user.
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/rest/TrinoRestCatalog.java
Passing the user is impersonation, which has various implications. We need a config for this, and it should be disabled by default. When it is disabled, we need a config to specify the shared system user to pass here.
If we're going to support impersonation, I think we need some guarantees in the REST catalog specification so that impersonation will not prevent caching:
- The user is only used for access control, not auditing.
- Name mappings and table data are the same for all users. In other words, loadTable() will return the same information for all users, or will be denied. It won't return different information depending on the user.
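The impersonation switch requested above might be sketched as follows. All names here (`SessionUserResolver`, `impersonationEnabled`, `sharedSystemUser`) are hypothetical and are not taken from the PR or from Trino. The sketch only illustrates defaulting to a shared system user and forwarding the session user when impersonation is explicitly enabled.

```java
import java.util.Objects;

// Hypothetical sketch, not the PR's implementation: decides which user
// identity is forwarded to the REST catalog for a given session.
final class SessionUserResolver
{
    private final boolean impersonationEnabled;
    private final String sharedSystemUser;

    SessionUserResolver(boolean impersonationEnabled, String sharedSystemUser)
    {
        this.impersonationEnabled = impersonationEnabled;
        this.sharedSystemUser = Objects.requireNonNull(sharedSystemUser, "sharedSystemUser is null");
    }

    // Impersonation is off unless it is explicitly enabled.
    static SessionUserResolver withDefaults(String sharedSystemUser)
    {
        return new SessionUserResolver(false, sharedSystemUser);
    }

    // Returns the session user only when impersonation is enabled;
    // otherwise every request carries the same shared system user.
    String userFor(String sessionUser)
    {
        Objects.requireNonNull(sessionUser, "sessionUser is null");
        return impersonationEnabled ? sessionUser : sharedSystemUser;
    }

    public static void main(String[] args)
    {
        SessionUserResolver disabled = SessionUserResolver.withDefaults("trino");
        SessionUserResolver enabled = new SessionUserResolver(true, "trino");

        System.out.println(disabled.userFor("alice"));
        System.out.println(enabled.userFor("alice"));
    }
}
```

With impersonation disabled, every request carries the same identity, so cached catalog responses can be shared across sessions. With it enabled, sharing cached results is only safe if the guarantees listed above hold.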
I think it would be good to add some explanation of how the REST Catalog works. Three or four sentences would help people a lot with adopting this. A general explanation of the implications of using it would also be a good addition (e.g. does using this enable us to deploy Trino with the Iceberg connector without a Hive Metastore?). Otherwise, this seems awesome.
@jebnix I think the current docs are OK. @danielcweeks correct me if I'm wrong, but there is no need for a Hive Metastore when configuring Iceberg with the REST Catalog; it is still required to have some catalog behind the scenes (e.g. Nessie or JDBC).
That's correct. There's no dependency on Hive, but you need a server-side implementation of the REST spec, which can just be a proxied version of one of the existing Iceberg catalog implementations.
@danielcweeks @nastra Do you think this can get merged soon? Trino still cannot use Iceberg without a Hive Metastore. That's the most important feature of the Iceberg Connector by far.
I believe nothing is left to be done from our side; it is mainly waiting for a final approval.
@bitsondatadev Why doesn't this get merged?
Co-authored-by: Eduard Tudenhoefner <etudenhoefner@gmail.com> Co-authored-by: bryanck <bryanck@gmail.com>
I'm gonna tag in @electrum and @findepi here. I'm seeing development activity in general start to slow down as we approach the holidays, so I wouldn't hold your breath for too much happening until after New Year's. Just like Santa 🎅 I'm making a list, but my list contains PRs I'll be checking twice when I get back from break.
I stand corrected! Happy holidays!
Description
This PR adds the REST Catalog implementation for Iceberg.
New Feature / Improvement
Connector: Iceberg
This allows Trino to access any data source that supports the Iceberg REST Catalog spec.
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: