Add support for Nessie Catalog in Iceberg Connector#17719
Add support for Nessie Catalog in Iceberg Connector#17719beinan merged 1 commit intoprestodb:masterfrom
Conversation
1a0b454 to
ec00cc4
Compare
|
@ChunxuTang could you review this please? |
|
@nastra Thanks for your work! Sure, happy to help review it soon this week. |
There was a problem hiding this comment.
this requires apache/iceberg#4700 to work as expected
There was a problem hiding this comment.
this and the one below will be changed to assertThat(computeActual(sessionOne, "SHOW SCHEMAS FROM iceberg LIKE 'namespace_one'").getMaterializedRows()).hasSize(1); once Iceberg 0.14.0 is out, since it requires apache/iceberg#4700 to properly work
ChunxuTang
left a comment
There was a problem hiding this comment.
@nastra Thanks so much for your nice contribution! This is a very helpful feature!
Just left some minor suggestions. Feel free to take a look.
...o-iceberg/src/test/java/com/facebook/presto/iceberg/nessie/TestIcebergDistributedNessie.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/test/java/com/facebook/presto/iceberg/nessie/TestIcebergSmokeNessie.java
Outdated
Show resolved
Hide resolved
...-iceberg/src/test/java/com/facebook/presto/iceberg/nessie/TestIcebergSystemTablesNessie.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Just curious: is iceberg.catalog.warehouse required for the Nessie catalog?
There was a problem hiding this comment.
yes this is in fact required, see also https://iceberg.apache.org/docs/latest/nessie/#nessie-catalog
ec00cc4 to
1def224
Compare
ChunxuTang
left a comment
There was a problem hiding this comment.
LGTM! @nastra Thanks again for your nice work!
Hi @beinan and @kewang1024, would you also like to take a review of this PR? Thanks!
beinan
left a comment
There was a problem hiding this comment.
lgtm except a couple of very minor issues.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergResourceFactory.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergResourceFactory.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergResourceFactory.java
Outdated
Show resolved
Hide resolved
1def224 to
c899aff
Compare
|
Hey sorry for the delay, will finish review by the end of today (5/17) |
kewang1024
left a comment
There was a problem hiding this comment.
For all the four tests files, they share the same init/teardown/createQueryRunner function, can we extract those out to a common place so we don't need to create repeated logic
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergResourceFactory.java
Outdated
Show resolved
Hide resolved
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergResourceFactory.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Why don't we have config for NESSIE_REFERENCE_HASH?
There was a problem hiding this comment.
because this is generally something a user would only specify at runtime when actually reading data at a particular hash. Having a config option for that would just be confusing to users because they would have to know the hash upfront.
There was a problem hiding this comment.
So does it mean usually different users would use different hash and they don't usually share? and thus we don't have any need to set a default value on a cluster level
There was a problem hiding this comment.
Should we have isNullOrEmpty check here?
There was a problem hiding this comment.
I don't see this kind of validation being done in other Config classes, so I was assuming that the @NotNull annotation on getDefaultReferenceName() would take care of this.
There was a problem hiding this comment.
I think we do have null check examples, StaticCatalogStoreConfig
There was a problem hiding this comment.
added a null/empty check + added a test for it
There was a problem hiding this comment.
Just for compleness: I went over the config code to understand when things are being validated. I used the below test and temporarily removed the null/empty check in setDefaultReferenceName. Note that we provide an empty string to iceberg.nessie.ref in this example.
@Test
public void testNessieMetastore()
{
getOnlyElement(new IcebergPlugin().getConnectorFactories()).create(
"test",
ImmutableMap.of(
"hive.metastore.uri", "thrift://foo:1234",
"iceberg.catalog.type", "nessie",
"iceberg.nessie.ref", ""),
new TestingConnectorContext())
.shutdown();
}
The validation of the config basically happens when the Plugin is being bootstrapped here and this is when the validation annotations (@NotNull / @NotEmpty/...) are being checked/verified. The startup of the plugin then correctly failed with the following exception:
com.google.inject.CreationException: Unable to create injector, see the following errors:
1) Error: Invalid configuration property iceberg.nessie.ref: must not be null or empty (for class com.facebook.presto.iceberg.nessie.NessieConfig.defaultReferenceName)
1 error
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:543)
at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:159)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
at com.google.inject.Guice.createInjector(Guice.java:87)
at com.facebook.airlift.bootstrap.Bootstrap.initialize(Bootstrap.java:257)
at com.facebook.presto.iceberg.InternalIcebergConnectorFactory.createConnector(InternalIcebergConnectorFactory.java:86)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.facebook.presto.iceberg.IcebergConnectorFactory.create(IcebergConnectorFactory.java:49)
at com.facebook.presto.iceberg.TestIcebergPlugin.testNessieMetastore(TestIcebergPlugin.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
at org.testng.TestRunner.privateRun(TestRunner.java:756)
at org.testng.TestRunner.run(TestRunner.java:610)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:387)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:382)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:340)
at org.testng.SuiteRunner.run(SuiteRunner.java:289)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
This would explain why the config classes don't perform validation checks in the setters but completely rely on the Validation annotations on the getters.
There was a problem hiding this comment.
Thanks for the fact check on the config validation!! Then looks like the null check could actually be safely removed since the getter would perform the null check?
There was a problem hiding this comment.
yep I removed the null check now
I moved out the required connector properties for Nessie into a util method but I'm not sure it's worth extracting the init/teardown to a common place, since we're essentially really only calling start/stop on the container. There's currently no other complex init/teardown required for Nessie so imo it seems (at least for now) to be better to leave the start/stop of the container in the respective test classes. |
c899aff to
2df6cef
Compare
|
Thanks for your reviews @kewang1024 @beinan @ChunxuTang. I rebased the PR and I hope I addressed everything that's relevant. Please take another look if you can. |
So essentially what I meant was creating a parent class that has init/teardown/createQueryRunner function for nessie and then all your four tests files can only focus on its own test logic |
beinan
left a comment
There was a problem hiding this comment.
lgtm, just one non-blocking minor issue. could please address all the comments from @kewang1024 , then we could merge your PR. Thanks a lot!
There was a problem hiding this comment.
Nit: to be more consistent, we could put all the cases in alphabetical order
There was a problem hiding this comment.
that's what I had initially, but Intellij would then complain due to duplicate case branches. So I figured that it's better to merge HADOOP + NESSIE together
There was a problem hiding this comment.
@beinan let me know if you would like me to still change this and introduce duplicate case branches
@kewang1024 that would be ideal if we could have a base class, but all of those 4 test classes are already extending different other test classes to test different things so I don't see how we could have a base class for this unfortunately. |
2df6cef to
341a322
Compare
341a322 to
4c294cc
Compare
This PR integrates the (Nessie catalog functionality)[https://github.com/apache/iceberg/tree/master/nessie/src/main/java/org/apache/iceberg/nessie] to the Iceberg connector. Roughtly, it adds the following new things: * adds a new `NESSIE` catalog type, that uses `NessieCatalog` * renames `IcebergHadoopMetadata` to `IcebergNativeMetadata` and adds the particular `CatalogType`. This allows reusing that class for both `HADOOP` + `NESSIE`. * adds 2 new `IcebergSessionProperties` to control which referenceName / referenceHash to use with Nessie * some minor adjustments in `IcebergResourceFactory` that make sure to pass the correct properties to the `NessieCatalog` * adds `NessieConfig` that controls different Nessie settings and passes those to the underyling `NessieCatalog` * adds `NessieContainer` that allows to bring up a Nessie server for testing
4c294cc to
c06d08e
Compare
|
@beinan @ChunxuTang is this good to be merged? |
This PR integrates the Nessie catalog functionality to the Iceberg connector.
Roughtly, it adds the following new things:
NESSIEcatalog type, that usesNessieCatalogIcebergHadoopMetadatatoIcebergNativeMetadataand adds the particularCatalogType. This allows reusing that class for bothHADOOP+NESSIE.IcebergSessionPropertiesto control which referenceName / referenceHash to use with NessieIcebergResourceFactorythat make sure to pass the correct properties to theNessieCatalogNessieConfigthat controls different Nessie settings and passes those to the underylingNessieCatalogNessieContainerthat allows to bring up a Nessie server for testing