Use REQUIRED repetition level for MAP keys in parquet writer #12808
findepi merged 2 commits into trinodb:master
22a9ce4 to 2e1d53f
lib/trino-parquet/src/main/java/io/trino/parquet/writer/ParquetSchemaConverter.java
As per parquet spec, MAP key should be REQUIRED
2e1d53f to 5155e86
Does this affect any connector? How?
It affects the Parquet schema produced by the Hive and Delta Lake connectors, making it compliant with the Parquet spec for the MAP type at https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps.
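For context, the MAP layout in the linked spec can be sketched as below. This is a minimal, self-contained illustration, not the code in `ParquetSchemaConverter.java`: the class `MapSchemaSketch`, its `renderMap` and `field` helpers, and the field name `attributes` are all hypothetical, and it only prints the schema in Parquet message-type notation rather than calling any Parquet library.

```java
import java.util.Locale;

// Hypothetical sketch: renders the three-level MAP layout from the Parquet
// LogicalTypes spec as text. It does not use parquet-mr or Trino classes.
final class MapSchemaSketch {
    // The three repetition levels defined by the Parquet format.
    enum Repetition { REQUIRED, OPTIONAL, REPEATED }

    // Renders one primitive field, e.g. "required binary key;".
    static String field(Repetition repetition, String type, String name) {
        return repetition.name().toLowerCase(Locale.ROOT) + " " + type + " " + name + ";";
    }

    // Renders a MAP group. Per the spec, the inner group is repeated and named
    // key_value, the key is always REQUIRED, and the value may be OPTIONAL.
    static String renderMap(String name, String keyType, String valueType) {
        return String.join("\n",
                "optional group " + name + " (MAP) {",
                "  repeated group key_value {",
                "    " + field(Repetition.REQUIRED, keyType, "key"),
                "    " + field(Repetition.OPTIONAL, valueType, "value"),
                "  }",
                "}");
    }

    public static void main(String[] args) {
        // "attributes" is an illustrative column name, not one from the PR.
        System.out.println(renderMap("attributes", "binary", "int32"));
    }
}
```

The point of the sketch is the `key` line: a spec-compliant writer declares it `required`, while the `value` field is still allowed to be `optional`.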
I agree it is better to adhere to the Parquet spec than not to. OTOH, in #12658 @raunaqmorarka explicitly departed from the Parquet spec to fix a compatibility bug that manifested in real-life scenarios. This suggests that real-life Parquet readers and writers are not always 100% spec compliant. So, what are the real-world consequences of this change?
They are not always compliant. BTW, what does OTOH stand for?
"On the other hand"
I need to know this before I can reasonably approve or disapprove the change.
I didn't see an issue with Hive or Spark compatibility when testing it.
Okay, so we want to do this to adhere better to the spec, which is a good thing. Thanks for clarifying. I will tag as no-rn then.
As per parquet spec, MAP key should be REQUIRED
Fix
Optimized parquet writer
Parquet files produced by the optimized writer will comply with the spec at https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps
Documentation
(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
(x) No release notes entries required.
( ) Release notes entries required with the following suggested text: