Enabled hive splits for uncompressed CSV files with S3 Select pushdown #13754
Merged
arhimondr merged 1 commit into trinodb:master from dnanuti:master on Aug 30, 2022
Conversation
arhimondr reviewed Aug 19, 2022
plugin/trino-hive/src/test/java/io/trino/plugin/hive/s3select/TestS3SelectPushdown.java (5 threads, outdated and resolved)
...rc/test/java/io/trino/plugin/hive/s3select/TestHiveFileSystemS3SelectPushdownWithSplits.java (1 thread, outdated and resolved)
findinpath reviewed Aug 23, 2022
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/S3SelectPushdown.java (1 thread, outdated and resolved)
...rc/test/java/io/trino/plugin/hive/s3select/TestHiveFileSystemS3SelectPushdownWithSplits.java (2 threads, outdated and resolved)
arhimondr approved these changes Aug 23, 2022
...rino-hive-hadoop2/src/test/java/io/trino/plugin/hive/s3select/S3SelectDefaultTestConfig.java (3 threads, outdated and resolved)
...rc/test/java/io/trino/plugin/hive/s3select/TestHiveFileSystemS3SelectPushdownWithSplits.java (1 thread, outdated and resolved)
arhimondr approved these changes Aug 24, 2022
Contributor
nit: Please keep the number of chars per line in the commit detail less than 80 (as described in https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages)
Member, Author
Totally missed that, thanks a lot for flagging this, updated!
findinpath reviewed Aug 24, 2022
plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3select/S3SelectPushdown.java (1 thread, outdated and resolved)
plugin/trino-hive/src/main/java/io/trino/plugin/hive/BackgroundHiveSplitLoader.java (4 threads, outdated and resolved)
plugin/trino-hive-hadoop2/src/test/java/io/trino/plugin/hive/s3select/S3SelectTestHelper.java (1 thread, outdated and resolved)
findinpath reviewed Aug 24, 2022
...rc/test/java/io/trino/plugin/hive/s3select/TestHiveFileSystemS3SelectPushdownWithSplits.java (1 thread, outdated and resolved)
arhimondr approved these changes Aug 25, 2022
plugin/trino-hive/src/test/java/io/trino/plugin/hive/HiveFileSystemTestUtils.java (1 thread, outdated and resolved)
findinpath reviewed Aug 25, 2022
...rc/test/java/io/trino/plugin/hive/s3select/TestHiveFileSystemS3SelectPushdownWithSplits.java (1 thread, outdated and resolved)
findinpath approved these changes Aug 26, 2022
Scan range allows S3 Select to query uncompressed files at a finer granularity than the entire object, by providing a byte range to SelectObjectContent requests. This change enables hive internal splits for S3 Select by sending scan range requests for uncompressed CSV files.
arhimondr approved these changes Aug 30, 2022
Description
Scan range allows S3 Select to query uncompressed files at a finer granularity than the entire object, by providing a byte range to SelectObjectContent requests. This change enables hive internal splits for S3 Select by sending scan range requests for uncompressed CSV files.
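To illustrate why byte ranges can be queried independently without losing or duplicating rows, here is a minimal local simulation. It assumes the scan range behavior described in the S3 Select documentation as I understand it: a record that starts inside the range is processed in full even if it extends past the range end, and a record that starts before the range is skipped. The class and method names are hypothetical, nothing here calls S3 or Trino, and quoted newlines inside CSV fields are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Trino or AWS SDK code.
final class ScanRangeSketch {
    // Returns the newline-delimited records whose FIRST byte lies in [start, end).
    // Assumption: this mirrors how a scan range request assigns records to ranges.
    static List<String> recordsInRange(byte[] data, long start, long end) {
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == '\n') {
                // Skip the empty tail after a trailing newline.
                if (recordStart < data.length && recordStart >= start && recordStart < end) {
                    records.add(new String(data, recordStart, i - recordStart, StandardCharsets.UTF_8));
                }
                recordStart = i + 1;
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Records start at byte offsets 0, 8, and 14; the file is 22 bytes long.
        byte[] csv = "1,alice\n2,bob\n3,carol\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(recordsInRange(csv, 0, 10));  // "2,bob" crosses offset 10 but starts at 8
        System.out.println(recordsInRange(csv, 10, 20));
        System.out.println(recordsInRange(csv, 20, 22));
    }
}
```

In this sketch the three ranges together return every record exactly once, which is the property that lets each split issue its own request.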
This PR is a performance optimization for the Hive S3 Select connector with uncompressed CSV input, leveraging the scan range feature of the service. JSON support will be added in a separate PR.
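As a rough sketch of where the speedup comes from, the snippet below divides one object into consecutive byte ranges that could each be sent as a separate scan range request and read in parallel. The helper name, the fixed maximum split size, and the example sizes are assumptions for illustration only; the actual split sizing in the Hive connector is more involved and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the connector's split loader.
final class SplitPlanSketch {
    // Divides [0, fileSize) into consecutive {start, end} ranges of at most maxSplitSize bytes.
    static List<long[]> planSplits(long fileSize, long maxSplitSize) {
        if (fileSize < 0 || maxSplitSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and maxSplitSize must be > 0");
        }
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileSize; start += maxSplitSize) {
            long end = Math.min(start + maxSplitSize, fileSize);
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed example: a 150 MiB uncompressed CSV object and 64 MiB splits.
        for (long[] split : planSplits(150L << 20, 64L << 20)) {
            System.out.printf("scan range start=%d end=%d%n", split[0], split[1]);
        }
    }
}
```

With these assumed sizes the object is covered by three ranges, the last one shorter than the others. Compressed objects cannot be divided this way, which is consistent with the PR limiting the change to uncompressed CSV.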
File splitting is configurable on the client side through the already existing session properties, such as:
Hive S3 Select connector
Trino client will return results faster when S3 Select pushdown is enabled for uncompressed CSV files:
SET SESSION hive.s3_select_pushdown_enabled=true;
Related issues, pull requests, and links
The previous PR, #13417, was accidentally closed by a wrong fork sync.
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: