Enable Select pushdown on uncompressed files #12633
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,3 @@ |
| | 1 | 7,1 |
| | 2 | 19,10 |
| | 3 | 1,345 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,3 @@ |
| | 1 | 1\|2 |
| | 2 | 3\|4 |
| | 3 | 55\|66 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -106,9 +106,10 @@ private static boolean isInputFormatSupported(Properties schema)` |
| 106 | 106 | `public static boolean isCompressionCodecSupported(InputFormat<?, ?> inputFormat, Path path)` |
| 107 | 107 | `{` |
| 108 | 108 | `if (inputFormat instanceof TextInputFormat) {` |
| | 109 | + `// S3 Select supports the following formats: uncompressed, GZIP and BZIP2.` |
| 109 | 110 | `return getCompressionCodec((TextInputFormat) inputFormat, path)` |
| 110 | 111 | `.map(codec -> (codec instanceof GzipCodec) \|\| (codec instanceof BZip2Codec))` |
| 111 | | - `.orElse(false); // TODO (https://github.com/trinodb/trino/issues/2475) fix S3 Select when file not compressed` |
| | 112 | + `.orElse(true);` |
Contributor:

question: What if a file is compressed, but with a codec that is not supported? Maybe something like … Also, I wonder how safe it is to assume that a file is uncompressed if no codec is found for a given extension. (I guess it is expected, as I see similar assumptions being made in other parts of the code, but I want to clarify.)
Contributor (PR author):

Hi Andrii, if a file is compressed with a codec that isn't supported, the codec would be a different class, so the `.map(...)` check would return false: … The default …

Good point about the null codec assumption, though. I think it's reasonable: if no codec is defined when a Hive table is created, the table doesn't depend on codecs and is expected to contain uncompressed files. This method internally uses Hadoop's `CompressionCodecFactory`, which I believe is the standard approach. Thank you so much for looking into this PR!
Contributor:

Oh, right. I totally misread it. Sounds good.
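| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 112 | 113 | `}` |
| 113 | 114 | |
| 114 | 115 | `return false;` |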
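The two cases discussed in the review thread (a codec that is found but unsupported, versus no codec at all) can be illustrated with a small self-contained sketch. It is hypothetical: `S3SelectCodecCheck`, `Codec` and `codecForPath` are illustrative names, and the suffix matching only approximates how Hadoop's `CompressionCodecFactory` resolves a codec from a path. It mirrors the `.map(...).orElse(true)` shape of the change rather than reproducing the Trino code.

```java
import java.util.Optional;

public class S3SelectCodecCheck
{
    // Hypothetical stand-ins for Hadoop codec classes that a file suffix might resolve to.
    enum Codec { GZIP, BZIP2, SNAPPY, DEFLATE }

    // Hypothetical helper: resolve a codec from the file suffix, or empty if none matches.
    // This is a simplified assumption, not Hadoop's actual lookup.
    static Optional<Codec> codecForPath(String path)
    {
        if (path.endsWith(".gz")) {
            return Optional.of(Codec.GZIP);
        }
        if (path.endsWith(".bz2")) {
            return Optional.of(Codec.BZIP2);
        }
        if (path.endsWith(".snappy")) {
            return Optional.of(Codec.SNAPPY);
        }
        if (path.endsWith(".deflate")) {
            return Optional.of(Codec.DEFLATE);
        }
        return Optional.empty();
    }

    static boolean isCompressionCodecSupported(String path)
    {
        return codecForPath(path)
                // A codec was found: only GZIP and BZIP2 are eligible for S3 Select pushdown.
                .map(codec -> codec == Codec.GZIP || codec == Codec.BZIP2)
                // No codec was found: treat the file as uncompressed, which S3 Select also supports.
                .orElse(true);
    }

    public static void main(String[] args)
    {
        for (String path : new String[] {"data.csv", "data.csv.gz", "data.csv.bz2", "data.csv.snappy"}) {
            System.out.println(path + " -> " + isCompressionCodecSupported(path));
        }
    }
}
```

Running `main` prints `true` for `data.csv`, `data.csv.gz` and `data.csv.bz2`, and `false` for `data.csv.snappy`. An unsupported codec still yields a non-empty `Optional`, so it is rejected by `map`; only the empty case falls through to `orElse(true)`, which is exactly the "no codec means uncompressed" assumption questioned in the review.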