Propagate selected compression codec to Avro file writer #12639
losipiuk merged 3 commits into trinodb:master
2a3bd2c to c5ed456
@findepi reworked toward what we discussed. So we throw if an unsupported compression codec is selected for Avro.
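For readers following the thread, the "throw if unsupported" behavior can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the class `AvroCodecCheck`, the enums, the exception type, and in particular the choice of LZ4 as the codec rejected for Avro are all assumptions made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

final class AvroCodecCheck {
    // Hypothetical stand-ins for the connector's compression option and storage format types.
    enum CompressionOption { NONE, SNAPPY, LZ4, ZSTD, GZIP }
    enum StorageFormat { ORC, PARQUET, AVRO }

    // Assumption for illustration only: which options the Avro writer cannot honor.
    private static final Set<CompressionOption> UNSUPPORTED_FOR_AVRO =
            EnumSet.of(CompressionOption.LZ4);

    // Fail fast instead of silently writing with a different codec.
    static CompressionOption selectCodec(CompressionOption option, StorageFormat format) {
        if (format == StorageFormat.AVRO && UNSUPPORTED_FOR_AVRO.contains(option)) {
            throw new IllegalArgumentException(
                    "Compression codec " + option + " not supported for " + format);
        }
        return option;
    }

    public static void main(String[] args) {
        // Supported combination: the selected option is passed through unchanged.
        System.out.println(selectCodec(CompressionOption.ZSTD, StorageFormat.AVRO));
        try {
            selectCodec(CompressionOption.LZ4, StorageFormat.AVRO);
        }
        catch (IllegalArgumentException e) {
            // Unsupported combination: the caller sees an explicit error.
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only the shape of the check: the format is consulted before the writer is created, so an unsupported combination surfaces as an error rather than as an uncompressed or differently compressed file.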
Why none for unknown format?
I think it's a behavioral change. If so, it should be code-commented and highlighted in the commit message.
Yeah - it is a behavioral change. I think it is fine though. This code path is used only for empty files, and compressing those does not make much sense anyway. I will add a comment and change the commit message.
Do you think it should make it into the release notes?
I wrote a bunch of BS comments here, which I deleted later on (e.g. this code path is not really used for empty files only).
The current state is that the behaviour does not change for now.
Disabling compression for empty files may be considered as a follow-up, but that does not happen so far.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveCompressionOption.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveWriterFactory.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveConfig.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveCompressionCodecs.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveCompressionCodecs.java
c5ed456 to 284774d
AC.
PTAL @findepi
When I introduced compression in Delta, the Hive compression mechanics and semantics made sense for Delta (and Iceberg) too. With logical compression in Hive, this is no longer the case, so we should decouple the two.
284774d to 584dc1e
@findepi This was a test issue. The session based on
cf24199 to 1350aa0
.../src/test/java/io/trino/plugin/deltalake/transactionlog/checkpoint/TestCheckpointWriter.java
Yeah - I know. This is why I am in a rush.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveCompressionCodecs.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveCompressionCodec.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHivePageSink.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHivePageSink.java
This breaks the 1-1 mapping between the compression codec used when writing data files via the Hive connector and the value passed as the hive.compression-codec configuration property or the compression_codec session property. Now the file format is also taken into account when choosing the final compression codec. The actual codec selection logic is not changed as part of this commit. The new mechanics will be exploited in following commits.
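A rough sketch of what this commit message describes, under stated assumptions: the user-facing option and the codec handed to the writer become separate types, and the selection function receives the file format even though it does not act on it yet. The class `FormatAwareCodecSelection`, the enum names, and the method signature are hypothetical and are not taken from the PR.

```java
final class FormatAwareCodecSelection {
    // Hypothetical user-facing value, as set via configuration or session property.
    enum CompressionOption { NONE, SNAPPY, LZ4, ZSTD, GZIP }

    // Hypothetical codec actually handed to the file writer.
    enum Codec { NONE, SNAPPY, LZ4, ZSTD, GZIP }

    enum StorageFormat { ORC, PARQUET, AVRO }

    // The format is part of the signature, but the mapping is still one-to-one,
    // mirroring "actual codec selection logic is not changed as part of this commit".
    static Codec selectCodec(CompressionOption option, StorageFormat format) {
        switch (option) {
            case NONE:
                return Codec.NONE;
            case SNAPPY:
                return Codec.SNAPPY;
            case LZ4:
                return Codec.LZ4;
            case ZSTD:
                return Codec.ZSTD;
            case GZIP:
                return Codec.GZIP;
        }
        throw new IllegalArgumentException("Unknown compression option: " + option);
    }
}
```

With this shape, a later commit can add format-specific branches inside `selectCodec` without touching callers, which is presumably what "exploited in following commits" refers to.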
1350aa0 to a5d1de0
Description
Related issues, pull requests, and links
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: