Fix reading from Hive table with location being S3 bucket itself #17848
Merged
findepi merged 3 commits into trinodb:master on Jun 15, 2023
Conversation
findinpath
commented
Jun 12, 2023
plugin/trino-hive/src/test/java/io/trino/plugin/hive/BaseTestHiveOnDataLake.java
Outdated
0e172dd to f472bf9
f472bf9 to e7c9a04
findepi
reviewed
Jun 13, 2023
plugin/trino-hive/src/main/java/io/trino/plugin/hive/fs/DirectoryListingFilter.java
Outdated
9dae8cd to d80159f
1 task
31362c8 to 15a34ef
15a34ef to 628c763
Co-authored-by: Assaf Bern <assaf.bern@starburstdata.com>
628c763 to 8ebb3e6
findepi
approved these changes
Jun 14, 2023
Member
/test-with-secrets sha=8ebb3e6a06a635958dcf826c40a35f514328817d
Member
Some Hive CI jobs were cancelled due to a timeout. Let me retrigger.
Member
/test-with-secrets sha=8ebb3e6a06a635958dcf826c40a35f514328817d
marcinsbd
reviewed
Jun 15, 2023
  String transactionLogDir = getTransactionLogDir(tableLocation);
  TrinoFileSystem fileSystem = fileSystemFactory.create(session);
- String commonPathPrefix = tableLocation + "/";
+ String commonPathPrefix = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
Contributor
How about adding clarifying parentheses?
tableLocation.endsWith("/") ? tableLocation : (tableLocation + "/");
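As a side note on the suggestion above: in Java, `+` binds more tightly than the conditional operator `?:`, so the parentheses affect readability only, not behavior. A minimal sketch, where the class and method names are hypothetical and not part of the PR:

```java
public class TernaryPrecedence {
    // Form used in the PR: no parentheses around the concatenation.
    static String withoutParens(String tableLocation) {
        return tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    }

    // Form proposed in the review comment: explicit parentheses.
    static String withParens(String tableLocation) {
        return tableLocation.endsWith("/") ? tableLocation : (tableLocation + "/");
    }

    public static void main(String[] args) {
        // Both forms should agree for locations with and without a trailing slash.
        for (String location : new String[] {"s3://myhivetable/", "s3://myhivetable/somedir"}) {
            System.out.println(withoutParens(location).equals(withParens(location)));
        }
    }
}
```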
marcinsbd
reviewed
Jun 15, 2023
assertThat(getAllDataFilesFromTableDirectory(tableName)).isEqualTo(union(initialFiles, updatedFiles));

// vacuum with low retention period
MILLISECONDS.sleep(1_000 - timeSinceUpdate.elapsed(MILLISECONDS) + 1);
Contributor
Shouldn't this be divided into two tests?
Description
When a user dedicates an entire S3 bucket to a Hive table, the table location is the bucket itself. This PR adds the handling required to read from such a table.
The Hadoop Path corresponding to 's3://myhivetable/' has the string representation 's3://myhivetable/'.
The Hadoop Path corresponding to 's3://myhivetable/somedir/' has the string representation 's3://myhivetable/somedir' (note that in this case the / at the end of the string representation is not present).
Specific handling has to be done in such a case to find the data files at the top level of the bucket.
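The difference between the two string representations can be illustrated with a small, self-contained sketch. It does not use the Hadoop or Trino APIs. The class `TopLevelFiles` and its helpers are hypothetical names introduced for illustration, and the file paths are made-up examples. Only the prefix expression mirrors the one-line change in this PR.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TopLevelFiles {
    // Hypothetical helper: ensure the location ends with exactly one "/".
    // A bucket root such as "s3://myhivetable/" already ends with "/",
    // while "s3://myhivetable/somedir" does not.
    static String commonPathPrefix(String tableLocation) {
        return tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";
    }

    // Hypothetical helper: keep only the files located directly under the table location.
    static List<String> directChildren(String tableLocation, List<String> filePaths) {
        String prefix = commonPathPrefix(tableLocation);
        return filePaths.stream()
                .filter(path -> path.startsWith(prefix))
                .filter(path -> {
                    String relative = path.substring(prefix.length());
                    // A direct child has a non-empty name without further separators.
                    return !relative.isEmpty() && relative.indexOf('/') < 0;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "s3://myhivetable/data1",
                "s3://myhivetable/somedir/data2");

        // Table located at the bucket root.
        System.out.println(directChildren("s3://myhivetable/", files));
        // Table located in a subdirectory; the string form has no trailing "/".
        System.out.println(directChildren("s3://myhivetable/somedir", files));

        // Unconditionally appending "/" to the bucket root yields "s3://myhivetable//",
        // which no listed file starts with.
        String naivePrefix = "s3://myhivetable/" + "/";
        System.out.println(files.stream().anyMatch(path -> path.startsWith(naivePrefix)));
    }
}
```

In this sketch, unconditionally appending the separator to the bucket root produces a prefix with a double slash that matches no file, which is the kind of mismatch the conditional guards against.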
Not covered by the tests provided with this PR, but still a valid scenario, is the use of the Hive setting hive.recursive-directories=true. This has been tested manually so far.
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: