
Support reading Iceberg S3 paths with double slashes #11998

Merged: electrum merged 1 commit into trinodb:master from electrum:iceberg-s3 (Apr 21, 2022)

@electrum (Member) commented Apr 18, 2022

Description

Allow reading Iceberg tables written by Glue whose locations contain double slashes. Such paths cannot be represented by the Hadoop `Path` object, because it normalizes paths (collapsing the double slashes), so we hide the original path in the URI fragment.
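A minimal sketch of the fragment idea using plain `java.net.URI`. It is not the code merged in this PR and does not use Hadoop's `Path`. The names `FragmentPathSketch`, `collapseSlashes`, `wrap`, and `unwrap` are hypothetical, `collapseSlashes` only stands in for the normalization described above, and the sketch assumes a location with no query, no existing fragment, and no characters that need escaping.

```java
import java.net.URI;
import java.net.URISyntaxException;

public final class FragmentPathSketch {
    // Hypothetical stand-in for a normalizing path type: collapses any run of
    // slashes in the path component into a single slash
    static String collapseSlashes(String path) {
        return path.replaceAll("/{2,}", "/");
    }

    // Build a URI whose path is normalized, while the exact original
    // location (double slashes included) rides along in the fragment
    static URI wrap(String location) {
        URI original = URI.create(location);
        try {
            return new URI(original.getScheme(), original.getAuthority(),
                    collapseSlashes(original.getPath()), null, location);
        }
        catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid location: " + location, e);
        }
    }

    // Recover the original location; fall back to the URI itself when no fragment is present
    static String unwrap(URI uri) {
        String fragment = uri.getFragment();
        return (fragment != null) ? fragment : uri.toString();
    }

    public static void main(String[] args) {
        URI wrapped = wrap("s3://bucket/db/tbl//data/f.parquet");
        System.out.println(wrapped.getPath()); // /db/tbl/data/f.parquet
        System.out.println(unwrap(wrapped));   // s3://bucket/db/tbl//data/f.parquet
    }
}
```

In this sketch, code that only inspects the scheme, authority, and path keeps seeing a clean path, and only the code that finally opens the S3 object needs to read the exact key back out of the fragment.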

Related issues, pull requests, and links

Fixes #11964

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Iceberg connector
* Allow reading Iceberg tables written by Glue that have locations containing double slashes. ({issue}`11964`)

@electrum (Member Author):

@rdblue this fixes the issue you encountered

@hashhar hashhar requested a review from findepi April 19, 2022 06:55
@findepi (Member) commented Apr 19, 2022

What problem is it solving?
Is it testable?

> Allow reading Iceberg tables written by Glue that have locations containing double slashes. This type of path is not possible to represent by the Hadoop Path object since it normalizes paths

Thanks for the context. What if the normalized path were used later? It should just work, shouldn't it?
Or is it about delete deltas not being applied? (cc @alexjo2144)

cc @homar as it may affect vacuuming logic.

@electrum (Member Author):

This has nothing to do with delete deltas. The issue is that Glue writes S3 key names that contain double slashes, which is perfectly legal for an S3 key, but cannot be transported through an HDFS path.

@electrum electrum changed the title Support reading non-normalized paths for Iceberg S3 Support reading Iceberg S3 paths with double slashes Apr 20, 2022
@electrum (Member Author):

I don't know a good way to test this, since we can't write such paths with Trino. I don't feel that it's worth the effort to try to construct such a table manually for a test.

@electrum electrum merged commit 226c412 into trinodb:master Apr 21, 2022
@electrum electrum deleted the iceberg-s3 branch April 21, 2022 00:51
@github-actions github-actions bot added this to the 378 milestone Apr 21, 2022
@findepi (Member) commented Apr 21, 2022

> S3 key names that contain double slashes, which is perfectly legal for an S3 key, but cannot be transported through an HDFS path.

Thanks for clarifying.

> This has nothing to do with delete deltas.

It may or may not be related. Deletion deltas include data file paths, so if something were collapsing repeated slashes (e.g., because the path passes through some intermediate representation), the deletion deltas wouldn't apply.
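To illustrate the concern, here is a sketch that assumes a reader applies position deletes by exact string equality between a delete row's `file_path` and the data file's path. The Iceberg spec requires a position delete's `file_path` to match the target data file's path in the manifest. `PositionDeleteSketch` and its methods are hypothetical, not Trino or Iceberg code. If the double slash were collapsed before the lookup, no deletes would match and the deleted rows would silently reappear.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public final class PositionDeleteSketch {
    // One row of a position delete file: the data file it targets and the row position
    record PositionDelete(String filePath, long pos) {}

    // Group deleted row positions by the exact data file path string they reference
    static Map<String, Set<Long>> index(List<PositionDelete> deletes) {
        Map<String, Set<Long>> byPath = new HashMap<>();
        for (PositionDelete delete : deletes) {
            byPath.computeIfAbsent(delete.filePath(), key -> new TreeSet<>()).add(delete.pos());
        }
        return byPath;
    }

    // Row positions to skip while reading dataFilePath; this is an exact-match lookup,
    // so a path whose slashes were collapsed somewhere along the way finds nothing
    static Set<Long> deletedPositions(Map<String, Set<Long>> deletesByPath, String dataFilePath) {
        return deletesByPath.getOrDefault(dataFilePath, Set.of());
    }

    public static void main(String[] args) {
        String original = "s3://bucket/tbl//data/f.parquet";
        String collapsed = "s3://bucket/tbl/data/f.parquet";
        Map<String, Set<Long>> deletesByPath = index(List.of(
                new PositionDelete(original, 3), new PositionDelete(original, 7)));
        System.out.println(deletedPositions(deletesByPath, original));  // [3, 7]
        System.out.println(deletedPositions(deletesByPath, collapsed)); // []
    }
}
```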
I hope this is not the case though.
cc @alexjo2144 @homar


Development

Linked issue closed by this pull request: Unable to read AWS Glue created iceberg tables from S3