Core - Upgrade Parquet to 1.12.3 #4951
@kbendick, what's the status of this? Should Iceberg set |
Yes, I believe we should set that configuration by default, either by upgrading the patch version of Parquet or by setting it ourselves. If we set it ourselves without upgrading the Parquet library, I'll do a quick pass over the PRs merged between 1.12.2 and 1.12.3 to make sure it isn't somehow unsafe to add. I think upgrading the Parquet patch version would likely be better in the longer term.
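As a rough illustration of the "set it ourselves" option, here is a minimal sketch of applying the flag as a default that an explicit user setting can still override. The key is the one named in this PR's description; the class name, the helper `withDefault`, and the use of a plain `Map` in place of a real Hadoop or Iceberg configuration object are all assumptions made for illustration, not Iceberg's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class ZstdBufferPoolDefault {
  // Configuration key as named in this PR's description.
  static final String KEY = "parquet.compression.codec.zstd.bufferPool.enabled";

  // Hypothetical helper: returns a copy of the user's settings with the
  // buffer pool enabled, unless the user already set the key explicitly.
  static Map<String, String> withDefault(Map<String, String> userConf) {
    Map<String, String> merged = new HashMap<>(userConf);
    merged.putIfAbsent(KEY, "true");
    return merged;
  }

  public static void main(String[] args) {
    // No user setting: the default is applied.
    System.out.println(withDefault(new HashMap<>()).get(KEY));

    // Explicit user setting: left untouched.
    Map<String, String> explicit = new HashMap<>();
    explicit.put(KEY, "false");
    System.out.println(withDefault(explicit).get(KEY));
  }
}
```

Using `putIfAbsent` rather than an unconditional `put` keeps the change opt-out: anyone who has deliberately disabled the buffer pool keeps that choice.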
For reference, 1.12.3 is the default parquet version in Spark master, although Spark 3.3.0 still uses Parquet 1.12.2. |
.../v3.2/spark/src/jmh/java/org/apache/iceberg/spark/source/IcebergSourceFlatDataBenchmark.java
We might also want to bump the Avro version to match what’s used in parquet 1.12.3. |
Is that safe? What is the version change? |
2ae2c7c to 6896510
The version change is from 1.10.1 to 1.10.2. apache/parquet-java@d96b19b We are on 1.10.1, so we would be following the same upgrade path. I'm not sure which other dependencies rely on avro (I imagine many of them do). |
I have spoken to some users with very wide tables (potentially over 1000 columns) who told me that enabling the buffer pool via this configuration resolved OOMs for them, so I believe it will be beneficial to all users.
The Hive tests are failing due to the lack of a method. |
Looks like Hive might be bringing in a different copy of Parquet and that is conflicting in the test. We should be able to exclude Hive's Parquet version to work around this. |
I'll give that a try. |
Closed in favor of #5188, which gets this working. |
This patch upgrades Parquet from 1.12.2 to 1.12.3.
The changelog between the two versions can be found here: apache/parquet-java@apache-parquet-1.12.2...apache-parquet-1.12.3
A few notes of particular interest:
parquet.compression.codec.zstd.bufferPool.enabled is set to true by default