Skip to content

Fix S3 directory detection based on ContentType header#6992

Merged
findepi merged 1 commit intotrinodb:masterfrom
MiguelWeezardo:trino_s3_filesystem_directory_detection
Feb 24, 2021
Merged

Fix S3 directory detection based on ContentType header#6992
findepi merged 1 commit intotrinodb:masterfrom
MiguelWeezardo:trino_s3_filesystem_directory_detection

Conversation

@MiguelWeezardo
Copy link
Member

@MiguelWeezardo MiguelWeezardo commented Feb 22, 2021

It seems S3 is now returning a charset in the Content-Type header (e.g. application/x-directory; charset=UTF-8) for directories uploaded through S3 GUI.
This breaks directory detection.

@cla-bot cla-bot bot added the cla-signed label Feb 22, 2021
@MiguelWeezardo MiguelWeezardo force-pushed the trino_s3_filesystem_directory_detection branch from c1cf0db to e2bf0db Compare February 23, 2021 08:09
Copy link
Member

@findepi findepi Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this perhaps be case insensitive?
or perhaps parse this using a proper utility class?

MediaType.parse(metadata.getContentType()).is(MediaType.create("application", "x-directory"))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edited)

@MiguelWeezardo MiguelWeezardo force-pushed the trino_s3_filesystem_directory_detection branch from e2bf0db to 255b1d9 Compare February 23, 2021 11:02
@MiguelWeezardo
Copy link
Member Author

I also noticed this code above my change, which seems dead, since the metadata object is present even for a directory path:

ObjectMetadata metadata = getS3ObjectMetadata(path);

if (metadata == null) {
    // check if this path is a directory
    Iterator<LocatedFileStatus> iterator = listPrefix(path, OptionalInt.of(1), ListingMode.SHALLOW_ALL);
    if (iterator.hasNext()) {
        return new FileStatus(0, true, 1, 0, 0, qualifiedPath(path));
    }
    throw new FileNotFoundException("File does not exist: " + path);
}

return new FileStatus(
        getObjectSize(path, metadata),
        // Some directories (e.g. uploaded through S3 GUI) return a charset in the Content-Type header
        MediaType.parse(metadata.getContentType()).is(DIRECTORY_MEDIA_TYPE),
        1,
        BLOCK_SIZE.toBytes(),
        lastModifiedTime(metadata),
        qualifiedPath(path));

Is it OK that directories on S3 report a size of 32MB (BLOCK_SIZE)?

@findepi
Copy link
Member

findepi commented Feb 23, 2021

cc @electrum

@findepi findepi merged commit 67e8982 into trinodb:master Feb 24, 2021
@findepi findepi added this to the 353 milestone Feb 24, 2021
@findepi findepi mentioned this pull request Feb 24, 2021
10 tasks
nmahadevuni added a commit to nmahadevuni/presto that referenced this pull request Apr 28, 2021
Cherry pick of trinodb/trino#6992

Co-authored-by: Michał Ślizak <michal.slizak+github@gmail.com>
rschlussel pushed a commit to prestodb/presto that referenced this pull request Apr 28, 2021
Cherry pick of trinodb/trino#6992

Co-authored-by: Michał Ślizak <michal.slizak+github@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants