Use byte arrays for encoding short decimals in native hive parquet writer#12658
Merged
findepi merged 6 commits into trinodb:master on Jun 10, 2022
86d68d6 to 756fc60
skrzypo987
approved these changes
Jun 3, 2022
...-parquet/src/main/java/io/trino/parquet/writer/valuewriter/Int32ShortDecimalValueWriter.java
...-parquet/src/main/java/io/trino/parquet/writer/valuewriter/Int64ShortDecimalValueWriter.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/parquet/ParquetTester.java
756fc60 to 5ed0cb6
5ed0cb6 to d2b2ce0
skrzypo987
approved these changes
Jun 3, 2022
findepi
reviewed
Jun 6, 2022
...ing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveCompatibility.java
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetSchemaConverter.java
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetSchemaConverter.java
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetSchemaConverter.java
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetSchemaConverter.java
lib/trino-parquet/src/main/java/io/trino/parquet/writer/valuewriter/DecimalValueWriter.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetFileWriterFactory.java
Member
I skipped over the actual writers, assuming @skrzypo987 took a look at them. Thanks @skrzypo987. @alexjo2144 can you PTAL too?
d2b2ce0 to b9a05a2
b9a05a2 to 8b78956
findepi
reviewed
Jun 9, 2022
lib/trino-parquet/src/test/java/io/trino/parquet/writer/TestParquetSchemaConverter.java
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetSchemaConverter.java
...-parquet/src/main/java/io/trino/parquet/writer/valuewriter/BinaryLongDecimalValueWriter.java
...parquet/src/main/java/io/trino/parquet/writer/valuewriter/BinaryShortDecimalValueWriter.java
Comment on lines 58 to 61
Member
Maybe unfold the loop with a switch on numBytes
Contributor
@findepi what is the gain of doing this change?
Member
I expect them to perform differently.
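For readers following this thread, here is a minimal sketch of the two shapes being compared. It is not the code under review: the class name, the method names, and the assumption that `numBytes` is between 1 and 8 are all hypothetical. Both methods write the low `numBytes` bytes of a `long` in big-endian order; the second replaces the loop with a `switch` on `numBytes` that falls through on purpose. Whether the unrolled form is actually faster is something a benchmark would have to show.

```java
class ShortDecimalUnrollSketch {
    // Loop form: emit the low numBytes bytes of value, most significant byte first.
    static byte[] encodeWithLoop(long value, int numBytes) {
        if (numBytes < 1 || numBytes > 8) {
            throw new IllegalArgumentException("numBytes must be in [1, 8]: " + numBytes);
        }
        byte[] out = new byte[numBytes];
        for (int i = numBytes - 1; i >= 0; i--) {
            out[i] = (byte) value;
            value >>= 8;
        }
        return out;
    }

    // Unrolled form: enter the switch at numBytes and fall through to case 1,
    // so exactly numBytes assignments run and no loop counter is needed.
    static byte[] encodeWithSwitch(long value, int numBytes) {
        if (numBytes < 1 || numBytes > 8) {
            throw new IllegalArgumentException("numBytes must be in [1, 8]: " + numBytes);
        }
        byte[] out = new byte[numBytes];
        switch (numBytes) {
            case 8: out[numBytes - 8] = (byte) (value >> 56); // fall through
            case 7: out[numBytes - 7] = (byte) (value >> 48); // fall through
            case 6: out[numBytes - 6] = (byte) (value >> 40); // fall through
            case 5: out[numBytes - 5] = (byte) (value >> 32); // fall through
            case 4: out[numBytes - 4] = (byte) (value >> 24); // fall through
            case 3: out[numBytes - 3] = (byte) (value >> 16); // fall through
            case 2: out[numBytes - 2] = (byte) (value >> 8);  // fall through
            case 1: out[numBytes - 1] = (byte) value;
        }
        return out;
    }
}
```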
...-parquet/src/main/java/io/trino/parquet/writer/valuewriter/Int32ShortDecimalValueWriter.java
findinpath
reviewed
Jun 9, 2022
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetSchemaConverter.java
...-parquet/src/main/java/io/trino/parquet/writer/valuewriter/BinaryLongDecimalValueWriter.java
...-parquet/src/main/java/io/trino/parquet/writer/valuewriter/BinaryLongDecimalValueWriter.java
...ing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveCompatibility.java
Contributor
This code snippet appears to be repeated (with slight modifications).
Consider extracting a method for code reuse.
8b78956 to 62239be
Member
(rebased because of a conflict)
8aa5304 to 9ad449b
Avoids the space wasted by always using 16 bytes, even though lower-precision values can be stored in fewer bytes.
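As a rough illustration of "fewer bytes", the sketch below computes the smallest two's complement width that can hold every unscaled value of a given decimal precision. The class and method names are hypothetical, and this is not necessarily how the PR computes the length; it only shows the arithmetic.

```java
import java.math.BigInteger;

class DecimalByteLengthSketch {
    // Smallest n such that a signed n-byte two's complement integer can hold
    // every unscaled value in [-(10^precision - 1), 10^precision - 1].
    static int minBytesForPrecision(int precision) {
        if (precision < 1) {
            throw new IllegalArgumentException("precision must be positive: " + precision);
        }
        BigInteger maxUnscaled = BigInteger.TEN.pow(precision).subtract(BigInteger.ONE);
        // bitLength() excludes the sign bit, so add one bit, then round up to whole bytes.
        return (maxUnscaled.bitLength() + 1 + 7) / 8;
    }

    public static void main(String[] args) {
        for (int precision : new int[] {1, 9, 18, 19, 38}) {
            System.out.println("precision " + precision + " -> " + minBytesForPrecision(precision) + " bytes");
        }
    }
}
```

Under this calculation only the widest precisions (up to Trino's maximum of 38) need the full 16 bytes; precision 19, for example, fits in 9.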
9ad449b to 90d7979
findepi
reviewed
Jun 10, 2022
.../main/java/io/trino/parquet/writer/valuewriter/FixedLenByteArrayShortDecimalValueWriter.java
.../main/java/io/trino/parquet/writer/valuewriter/FixedLenByteArrayShortDecimalValueWriter.java
.../main/java/io/trino/parquet/writer/valuewriter/FixedLenByteArrayShortDecimalValueWriter.java
...c/main/java/io/trino/parquet/writer/valuewriter/FixedLenByteArrayLongDecimalValueWriter.java
findepi
reviewed
Jun 10, 2022
...c/main/java/io/trino/parquet/writer/valuewriter/FixedLenByteArrayLongDecimalValueWriter.java
findepi
reviewed
Jun 10, 2022
.../main/java/io/trino/parquet/writer/valuewriter/FixedLenByteArrayShortDecimalValueWriter.java
Member
Build is green (https://github.com/trinodb/trino/runs/6828591664?check_suite_focus=true). Squashing and merging.
Delta Lake will continue to use integer encoding for short decimals. This change fixes compatibility with Apache Hive, which expects decimals to be encoded as fixed-length byte arrays.
3f5ab6b to 6229d51
Closed
Description
Apache Hive fails to read short decimals that the optimized Parquet writer for Hive in Trino encodes as INT32/INT64.
This PR changes the native writer to encode short decimals as fixed-length byte arrays when writing Parquet files in the Hive connector, which makes the output readable by Apache Hive.
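To make the target representation concrete: in Parquet, a DECIMAL stored as FIXED_LEN_BYTE_ARRAY holds the unscaled value as a big-endian two's complement integer of the declared length. The sketch below round-trips a value through that layout using only the JDK. It is an illustration with hypothetical class and method names, not the writer added in this PR.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

class FixedLenDecimalRoundTripSketch {
    // Encode the unscaled value as big-endian two's complement, sign-extended to `length` bytes.
    static byte[] encode(BigDecimal value, int length) {
        byte[] minimal = value.unscaledValue().toByteArray(); // minimal-length big-endian two's complement
        if (minimal.length > length) {
            throw new IllegalArgumentException("value does not fit in " + length + " bytes");
        }
        byte[] out = new byte[length];
        byte pad = (byte) (value.signum() < 0 ? 0xFF : 0x00);
        Arrays.fill(out, 0, length - minimal.length, pad);     // sign extension
        System.arraycopy(minimal, 0, out, length - minimal.length, minimal.length);
        return out;
    }

    // Decode what a reader would see: BigInteger(byte[]) interprets big-endian two's complement.
    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("-123.45"); // e.g. a decimal(5, 2) value
        byte[] encoded = encode(original, 3);            // 3 bytes are enough for precision 5
        System.out.println(decode(encoded, original.scale()));
    }
}
```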
Fix
Parquet writer
Makes the output of the optimized Parquet writer for short decimals compatible with Apache Hive.
Related issues, pull requests, and links
#6377
#10486
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: