Skip to content

Fix double slash issue in Iceberg metadata location#14389

Merged
ebyhr merged 2 commits intotrinodb:masterfrom
ebyhr:ebi/iceberg-metadata-location-double-slash
Oct 11, 2022
Merged

Fix double slash issue in Iceberg metadata location#14389
ebyhr merged 2 commits intotrinodb:masterfrom
ebyhr:ebi/iceberg-metadata-location-double-slash

Conversation

@ebyhr
Copy link
Copy Markdown
Member

@ebyhr ebyhr commented Sep 30, 2022

Description

Fixes #14299

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg
* Fix failure when reading a table with leading double slash in the metadata location. ({issue}`14299`)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link the PRs that added this issue in 393 and which one fixed it in 395?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code comment.

@ebyhr ebyhr force-pushed the ebi/iceberg-metadata-location-double-slash branch from 4287d0d to 4156863 Compare October 6, 2022 21:27
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a +1,580 −814 PR. Can we have a link to a specific code lines in 393 or 394 that were causing this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed a link URL.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that the metadata path really contains two slashes?
for example, what if table was created with location = 's3://bucket/my_table/////'?

also, is it possible that //metadata/ substring occurs somewhere else within the location?
for example, what if my bucket is called metadata?
or, i have a table created with location = 's3://bucket/i/love/slashes/and//////metadata/my_schema/my_table'?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that was concern when I was writing this method. Fixed the logic to replace only when the location ends with //metadata/{json file name}. Please take another look.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add

// Simulate corrupted metadata location as Trino 393-394 was doing

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Remove redundant slash" sounds a bit like "remove trailing slash" -- some operation that you do when manipulating paths.
Let's have the name express the fact this is a workaround for bad writers, not something you'd normally have to do.

fixBrokenMetadataLocation

@ebyhr ebyhr force-pushed the ebi/iceberg-metadata-location-double-slash branch from 4156863 to 9f642cb Compare October 11, 2022 07:38
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's inconceivable that location contains multiple occurrences of brokenSuffix, but why would we burden the reader with having to think about that?

Use substring (or replaceFirst(Patter.quote(...) + "$", Matcher.quotereplacement(...)))

@ebyhr ebyhr force-pushed the ebi/iceberg-metadata-location-double-slash branch from 9f642cb to bf0d2a7 Compare October 11, 2022 09:13
@ebyhr
Copy link
Copy Markdown
Member Author

ebyhr commented Oct 11, 2022

CI hit #14568

Comment on lines +637 to +638
String correctSuffix = "/metadata/" + fileName;
String brokenSuffix = "//metadata/" + fileName;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this have happened with files in the //data/ directory?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I temporary reverted #13984 and confirmed a data directory is fine. We may need additional handling if other query engines generates such directories though. Let's keep as-is since we haven't received issues about a data directory for now.

@ebyhr ebyhr merged commit 0c81592 into trinodb:master Oct 11, 2022
@ebyhr ebyhr deleted the ebi/iceberg-metadata-location-double-slash branch October 11, 2022 23:55
@ebyhr ebyhr mentioned this pull request Oct 11, 2022
@github-actions github-actions bot added this to the 400 milestone Oct 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Iceberg metadata location gets changed with newer Trino versions

3 participants