Change native parquet writer to write v1 parquet files#9611
Change native parquet writer to write v1 parquet files#9611findepi merged 1 commit intotrinodb:masterfrom
Conversation
testing/trino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveCompression.java
Outdated
Show resolved
Hide resolved
...rino-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveSparkCompatibility.java
Outdated
Show resolved
Hide resolved
|
i rerun CI with |
|
There are V2 headers and statistics written in https://github.com/trinodb/trino/blob/master/lib/trino-parquet/src/main/java/io/trino/parquet/writer/PrimitiveColumnWriter.java I'd assume those also need to be changed? |
|
@alexjo2144 If you mean specifically this, I mentioned in #9497 that Spark fails to read a V1 header. Statistics are the same between V1 and V2 according to https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift. |
|
Spark is probably failing to read the V1 header because something else needs to be updated. There's a comment here implying that this change is not enough from @anjalinorwood #7953 (comment) Maybe she or @rdblue can point out the spec differences? There's also the EncodingStats I added this week which uses V1 also included the |
|
Comparing the differences between |
65dbf87 to
212f297
Compare
losipiuk
left a comment
There was a problem hiding this comment.
Looks valid. Though I did not follow deeply discussion which led it this stage, so I not comfortable ✅ ing it
findepi
left a comment
There was a problem hiding this comment.
skimmed. lgtm but i'm not a Parquet expert.
lib/trino-parquet/src/main/java/io/trino/parquet/writer/PrimitiveColumnWriter.java
Outdated
Show resolved
Hide resolved
lib/trino-parquet/src/main/java/io/trino/parquet/writer/PrimitiveColumnWriter.java
Outdated
Show resolved
Hide resolved
|
@alexjo2144 @martint @rdblue would you like to take a look? |
212f297 to
df54132
Compare
|
I'm adding the release blocker label so that this gets actually merged prior to the release. |
|
Thanks @alexjo2144 @martint for the review. @joshthoward it cannot be considered a release blocker. |
Cherry-pick of trinodb/trino#9497 and trinodb/trino#9611 co-authored by Josh Howard <joshthoward@gmail.com>
Cherry-pick of trinodb/trino#9497 and trinodb/trino#9611 co-authored by Josh Howard <joshthoward@gmail.com>
Cherry-pick of trinodb/trino#9497 and trinodb/trino#9611 co-authored by Josh Howard <joshthoward@gmail.com>
This PR instead adding a toggle as in #9497, this PR has the native parquet writer write v1 files by default.
It's a longer discussion to determine if we want to support both v1 and v2; this just fixes the known bugs.
fixes #6377