Add ZSTD support for writing ORC and DWRF tables #14113
mbasmanova merged 1 commit into prestodb:master from jainxrohit:rj_zstdjni_write
It would be very helpful to have an option of writing ZSTD-compressed ORC data. To move this forward, I suggest adding fail-fast logic for queries that attempt to write ZSTD-compressed Parquet files by introducing a check in HiveWritableTableHandle's constructor. This check would be supported by a new method in the HiveCompressionCodec enum, and TestHivePageSink would use the same check. Finally, I'd add ZSTD to OrcTester to make sure we get good test coverage. @highker @wenleix @arhimondr @jainxrohit @bhhari Thoughts?
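The fail-fast idea described above can be sketched roughly as follows. This is not the code from the PR: `StorageFormat`, `CompressionCodec`, `WritableTableHandle`, and the method name `isSupportedStorageFormat` are simplified, hypothetical stand-ins for HiveStorageFormat, HiveCompressionCodec, and HiveWritableTableHandle, and the exception type is an assumption. The only detail taken from the thread is that each codec constant carries a predicate over the storage format.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for HiveStorageFormat; only the formats needed here.
enum StorageFormat { ORC, DWRF, PARQUET }

// Hypothetical stand-in for HiveCompressionCodec: each codec carries a
// predicate that says which storage formats it can be written with.
enum CompressionCodec
{
    NONE(format -> true),
    SNAPPY(format -> true),
    GZIP(format -> true),
    // Assumption for this sketch: ZSTD is writable only for ORC and DWRF.
    ZSTD(format -> format == StorageFormat.ORC || format == StorageFormat.DWRF);

    private final Predicate<StorageFormat> supportedStorageFormats;

    CompressionCodec(Predicate<StorageFormat> supportedStorageFormats)
    {
        this.supportedStorageFormats = supportedStorageFormats;
    }

    boolean isSupportedStorageFormat(StorageFormat format)
    {
        return supportedStorageFormats.test(format);
    }
}

// Hypothetical stand-in for HiveWritableTableHandle: the constructor rejects
// unsupported codec/format pairs before any data is written.
final class WritableTableHandle
{
    private final StorageFormat storageFormat;
    private final CompressionCodec compressionCodec;

    WritableTableHandle(StorageFormat storageFormat, CompressionCodec compressionCodec)
    {
        if (!compressionCodec.isSupportedStorageFormat(storageFormat)) {
            throw new IllegalArgumentException(
                    compressionCodec + " compression is not supported for " + storageFormat);
        }
        this.storageFormat = storageFormat;
        this.compressionCodec = compressionCodec;
    }

    StorageFormat getStorageFormat()
    {
        return storageFormat;
    }

    CompressionCodec getCompressionCodec()
    {
        return compressionCodec;
    }
}
```

Putting the check in the constructor means an unsupported combination such as ZSTD with Parquet fails when the handle is created rather than partway through a write, and a test can call the same predicate to decide which combinations to exercise.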
mbasmanova
left a comment
@jainxrohit LGTM.
CC: @zhenxiao
@jainxrohit There is a typo in the commit message: prism.compression_codec -> hive.compression_codec. Same for the PR description. Also, it would be nice to add a release note.
@mbasmanova I will have to check this. In my local testing, hive.compression_codec does not work; only prism.compression_codec works.
Looks good to me. We will add ZSTD for Parquet later :)
I have fixed it.
@jainxrohit The commit message contains some duplicate sentences. Perhaps update it to something like:
To enable ZSTD compression, use session property hive.compression_codec='ZSTD'.
-    NONE(null, CompressionKind.NONE, CompressionCodecName.UNCOMPRESSED),
-    SNAPPY(SnappyCodec.class, CompressionKind.SNAPPY, CompressionCodecName.SNAPPY),
-    GZIP(GzipCodec.class, CompressionKind.ZLIB, CompressionCodecName.GZIP);
+    NONE(null, CompressionKind.NONE, CompressionCodecName.UNCOMPRESSED, f -> true),
nit: what about format -> true? Ditto for the other lambdas.
@wenleix Wenlei, do you have any further comments? Otherwise, I'll merge.
Add ZSTD support for writing ORC and DWRF tables
To enable ZSTD compression, use session property hive.compression_codec='ZSTD'.
== RELEASE NOTES ==
Hive Changes