Move to Parquet 1.12.0 #2441
To add support for Parquet bloom filters (#2582), we need to bump the Parquet version.
We also need this for #2639.
@jshmchenxi - I mostly meant that it should be reviewed by some of the people who work most closely with Parquet, and that we should make sure it works with the more recent Spark 2.4.x and Spark 3.x versions we support. Fortunately, @ggershinsky is a very active Parquet committer (or possibly PMC member now?) and has been doing some of the very hard work on Parquet column-level encryption, so if Gideon says we need this, that's likely enough people to consult. @Fokko is also a Parquet committer. Additionally, @rdblue, one of the PMC members and the Vice President of this project, is also on the Parquet PMC (in addition to being a PMC member or committer on what feels like every project), so we're in good hands in my book, @jshmchenxi 👍. But thank you for considering my input. I'm just trying to make sure things stay sane, since it seems that some of the changes you've been putting up might not have been discussed on the dev mailing list (though I could easily have missed it). So I'm just making sure the right folks see potentially large changes.
@kbendick Sorry for the late reply, I've been busy with work. I'm new to Iceberg and still have a lot to learn. And thanks for pointing me to the right people to review these!
@ggershinsky @Fokko @rdblue Hi, should we move Iceberg to Parquet 1.12.0 now, or wait for a newer release such as Parquet 1.12.1?
This looks fine to me, but there's a conflict in versions.props, and I guess 1.12.1 is out. @Fokko, can you update this?
@rdblue Sure thing!
Looks like there is a problem finding … On Maven Central, 1.12.0 appears to be the latest Parquet version. I misread @jshmchenxi's comment and thought 1.12.1 had been released. Can you roll back to 1.12.0?
No worries. As I mentioned, this was more of a concern because of some other things I knew were going on, but it's not a concern now.
Thanks, @Fokko! I merged this. |
Merge remote-tracking branch 'upstream/merge-master-20210816' into master
## What does this MR mainly address?
Merge upstream/master to bring in recent bug fixes and optimizations.
## What does this MR change?
PRs of particular interest:
> Predicate pushdown support, https://github.com/apache/iceberg/pull/2358, https://github.com/apache/iceberg/pull/2926, https://github.com/apache/iceberg/pull/2777/files
> Spark: writing an empty dataset raised an error; it is now simply skipped, apache#2960
> Flink: add uidPrefix to operators in the UI to make it easier to track multiple Iceberg sink jobs, apache#288
> Spark: fix nested struct pruning, apache#2877
> Table properties can be used to create v2-format tables, apache#2887
> Add the SortRewriteStrategy framework, to gradually support different rewrite strategies, apache#2609 (WIP: apache#2829)
> Spark: support configuring Hadoop properties per catalog, apache#2792
> Spark: read/write support for timestamps without time zone, apache#2757
> Spark MicroBatch: support a configuration property to skip delete snapshots, apache#2752
> Spark V2 RewriteDatafilesAction support
> Core: Add validation for row-level deletes with rewrites, apache#2865
> Schema time travel: add schema-id, Core: add schema id to snapshot
> Spark extensions: support identifier-field operations, apache#2560
> Parquet: Update to 1.12.0, apache#2441
> Hive: Vectorized ORC reads for Hive, apache#2613
> Spark: Add an action to remove all referenced files, apache#2415
## How was this MR tested?
Unit tests
Changelog:
https://github.com/apache/parquet-mr/blob/master/CHANGES.md#version-1120