Skip to content

Properly initialize Iceberg REST catalog Configuration#16455

Closed
alexjo2144 wants to merge 2 commits intotrinodb:masterfrom
alexjo2144:iceberg/config-initializers
Closed

Properly initialize Iceberg REST catalog Configuration#16455
alexjo2144 wants to merge 2 commits intotrinodb:masterfrom
alexjo2144:iceberg/config-initializers

Conversation

@alexjo2144
Copy link
Copy Markdown
Member

Description

Without these configurationInitializers catalog properties are not properly sent to the file system implementation.

Additional context and related issues

#13294

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(x) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Mar 9, 2023
@alexjo2144 alexjo2144 requested review from electrum and findepi March 9, 2023 02:50
@github-actions github-actions bot added the iceberg Iceberg connector label Mar 9, 2023
Copy link
Copy Markdown
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to give it Hadoop properties? The setConf() calls are removed entirely here: #16454

@alexjo2144
Copy link
Copy Markdown
Member Author

alexjo2144 commented Mar 9, 2023

We either need to set it or change the FileIO implementation that's being used. Because we delegate to the core Iceberg library for the RESTTableOperations, the FileIO that gives us back right now is a HadoopFileIO which needs the Configuration.

In the other TrinoCatalog implementations we use ForwardingFileIO but that's not plumbed in here yet.

@electrum
Copy link
Copy Markdown
Member

electrum commented Mar 9, 2023

We should fix it to use ForwardingFileIO.

@electrum
Copy link
Copy Markdown
Member

electrum commented Mar 9, 2023

I looked through the code and there doesn't seem to be an obvious way to add it here. We might be able to do it by subclassing RESTSessionCatalog and making tableFileIO() protected, but that will require a new release of Iceberg.

For now, we might need to fork RESTSessionCatalog. It would be good for us to implement RESTClient using Airlift HTTP client so that we get the standard functionality and configuration.

TrinoJdbcCatalog using JdbcCatalog has the same problem. This seems fixable by subclassing JdbcCatalog and overriding newTableOps().

@alexjo2144
Copy link
Copy Markdown
Member Author

alexjo2144 commented Mar 9, 2023

@electrum how would you feel about merging this fix for now and getting ForwardingFileIO into here and JDBC after? The rest catalog is completely broken as is.

@danielcweeks
Copy link
Copy Markdown
Contributor

@alexjo2144 You can set the Catalog FileIO via the RESTSessionCatalog initialization here

properties.put(CatalogProperties.FILE_IO_IMPL, ForwardingFileIO.class.getName())

@alexjo2144
Copy link
Copy Markdown
Member Author

Thanks Dan, I saw that but the ForwardingFileIO takes a TrinoFileSystem argument in the constructor and I couldn't figure out how to get that in there.

@alexjo2144
Copy link
Copy Markdown
Member Author

I see, we can use ForwardingFileIO today if we implement HadoopConfigurable

If the `hive.fs.cache.max-size` property is set to zero the first time a
FileSystem is loaded an exception is thrown
"FileSystem max cache size has been reached"
@alexjo2144 alexjo2144 force-pushed the iceberg/config-initializers branch from 7f5a4e5 to af3c168 Compare March 9, 2023 20:16
@electrum
Copy link
Copy Markdown
Member

electrum commented Mar 9, 2023

@dain and I had a call with @rdblue and @danielcweeks this morning. Ryan is going to propose an API change Iceberg 1.2.0 for the catalogs that lets us provide a FileIO for a given configuration.

@github-actions github-actions bot added hive Hive connector tests:hive labels Mar 9, 2023
@electrum
Copy link
Copy Markdown
Member

We'll be able to fix this with Iceberg 1.2.0 thanks to apache/iceberg#7064

@alexjo2144 alexjo2144 force-pushed the iceberg/config-initializers branch from af3c168 to 944892a Compare March 10, 2023 15:23
@alexjo2144
Copy link
Copy Markdown
Member Author

Thanks David, if you'd rather wait for that fix in 1.2.0 I'll go ahead and close this

@alexjo2144 alexjo2144 closed this Mar 10, 2023
@alexjo2144 alexjo2144 deleted the iceberg/config-initializers branch July 12, 2023 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed hive Hive connector iceberg Iceberg connector

Development

Successfully merging this pull request may close these issues.

3 participants