Skip to content

Core: Allow customizing FileIO in REST catalog#7064

Merged
rdblue merged 5 commits intoapache:masterfrom
rdblue:rest-customize-file-io
Mar 10, 2023
Merged

Core: Allow customizing FileIO in REST catalog#7064
rdblue merged 5 commits intoapache:masterfrom
rdblue:rest-customize-file-io

Conversation

@rdblue
Copy link
Copy Markdown
Contributor

@rdblue rdblue commented Mar 9, 2023

This adds setFileIOBuilder(Function<Map<String, String>, FileIO) to the RESTSessionCatalog. This allows another way to customize the FileIO instances used by the REST catalog. For example, Trino has custom FileIO implementations that manage resources differently than S3FileIO.

The function accepts a string map of config properties, which allows passing context from the catalog and table level for the IO, including token for remote S3 signing as well as the s3.* properties for access to S3.

@github-actions github-actions bot added the core label Mar 9, 2023
@rdblue
Copy link
Copy Markdown
Contributor Author

rdblue commented Mar 9, 2023

@electrum can you take a look and see if this would meet your needs?

@electrum
Copy link
Copy Markdown
Contributor

I tested this out and it works for us. The simple integration where we ignore config is good enough to have everything go through TrinoFileSystem and avoid the need for HadoopFileIO:

icebergCatalogInstance.setFileIOBuilder(config -> new ForwardingFileIo(fileSystemFactory.create(identity)));

We can extend this to capture the config for overriding S3 credentials, signer, etc.

@electrum
Copy link
Copy Markdown
Contributor

We'll need the same thing for JdbcCatalog.

Copy link
Copy Markdown
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once checkstyle/spotless is happy

String ioImpl = properties.get(CatalogProperties.FILE_IO_IMPL);
return
CatalogUtil.loadFileIO(
ioImpl != null ? ioImpl : ResolvingFileIO.class.getName(), properties, conf);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe worth inlining this, so that we don't need the null check on ioImpl:

CatalogUtil.loadFileIO(
          properties.getOrDefault(CatalogProperties.FILE_IO_IMPL, ResolvingFileIO.class.getName()),
          properties,
          conf);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this to be more like the JDBC catalog, where we use getOrDefault on the line above to clean it up.

@rdblue rdblue force-pushed the rest-customize-file-io branch from 32ad140 to 5c388c9 Compare March 10, 2023 18:06
@rdblue rdblue force-pushed the rest-customize-file-io branch from 5c388c9 to 33ad0be Compare March 10, 2023 18:07
}
}

public void setFileIOBuilder(Function<Map<String, String>, FileIO> ioBuilder) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into adding this to BaseMetastoreCatalog but it didn't make sense. That class is not Configurable and creating an IO requires a Configuration when the subclasses implement it. As a result, this class would need to override both this method and a newFileIO method so it isn't worth it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing as how we're duplicating this API across some catalogs, should we pull the set/create into an interface?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see whether we need this more broadly first. It's a pretty narrow use case at the moment.

Copy link
Copy Markdown
Contributor

@danielcweeks danielcweeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (pending checks)

@rdblue rdblue merged commit fa6fa23 into apache:master Mar 10, 2023
krvikash pushed a commit to krvikash/iceberg that referenced this pull request Mar 16, 2023
zhongyujiang pushed a commit to zhongyujiang/iceberg that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants