@Fokko
Contributor

@Fokko Fokko commented Apr 8, 2021

@github-actions github-actions bot added the build label Apr 8, 2021
@Fokko Fokko force-pushed the fd-bump-parquet-112 branch from 8b6e247 to 5117f87 Compare May 4, 2021 10:03
@jshmchenxi
Contributor

For #2582 (add support for Parquet bloom filters), we need to bump the Parquet version.

@jshmchenxi
Contributor

@kbendick Hi, you mentioned in the comment that Spark versions should be considered when upgrading the Parquet version. Who do you think would be appropriate to review this? The tests all passed, though.

@ggershinsky
Contributor

We also need this for #2639.

@kbendick
Contributor

> @kbendick Hi, you mentioned in the comment that Spark versions should be considered when upgrading the Parquet version. Who do you think would be appropriate to review this? The tests all passed, though.

@jshmchenxi - I mostly meant that it should be considered by some of the people who work closest with Parquet (and it should also be confirmed that it works with the more recent Spark 2.4.x and Spark 3.x versions we support).

Fortunately, @ggershinsky is a very active Parquet committer (or possibly PMC now?) and has been doing some of the very hard work in Parquet column-level encryption, so if Gideon says we need this, then that's likely enough people to speak to. @Fokko is also a Parquet committer.

Additionally, one of the PMCs / Vice President of this project, @rdblue, is also a Parquet PMC (in addition to being a PMC or committer on what feels like every project), so we're in good hands in my book @jshmchenxi 👍. But thank you for considering my input. Just trying to make sure things stay sane, since it seems that some of the changes you've been putting up might not have been discussed on the dev mailing list (though I could totally have missed it). So I'm just ensuring that the right folks see potentially large changes.

@jshmchenxi
Contributor

@kbendick Sorry for the late reply, I've been busy with work. I'm new to Iceberg and should learn more. And thanks for telling me the right persons to review these!

@jshmchenxi
Contributor

@ggershinsky @Fokko @rdblue Hi, should we move to Parquet 1.12.0 for Iceberg now, or wait for a newer version such as Parquet 1.12.1?

@rdblue
Contributor

rdblue commented Jun 12, 2021

This looks fine to me, but there's a conflict in versions.props and I guess 1.12.1 is out. @Fokko, can you update this?

@Fokko
Contributor Author

Fokko commented Jun 14, 2021

@rdblue Sure thing!

@rdblue
Contributor

rdblue commented Jun 15, 2021

Looks like there is a problem finding parquet-avro:

Execution failed for task ':iceberg-parquet:compileJava'.
> Could not resolve all files for configuration ':iceberg-parquet:compileClasspath'.
   > Could not find org.apache.parquet:parquet-avro:1.12.1.
     Searched in the following locations:
       - https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-avro/1.12.1/parquet-avro-1.12.1.pom
       - https://repo.maven.apache.org/maven2/org/apache/parquet/parquet-avro/1.12.1/parquet-avro-1.12.1.jar
       - file:/home/runner/.m2/repository/org/apache/parquet/parquet-avro/1.12.1/parquet-avro-1.12.1.pom
       - file:/home/runner/.m2/repository/org/apache/parquet/parquet-avro/1.12.1/parquet-avro-1.12.1.jar
     Required by:
         project :iceberg-parquet

On Maven Central, it looks like 1.12.0 is the latest Parquet version. I misinterpreted @jshmchenxi's comment and thought that 1.12.1 had been released. Can you roll back to 1.12.0?
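
For anyone who wants to reproduce the lookup, here is a minimal sketch of how the searched locations in the log above are derived from the artifact coordinates. It assumes the standard Maven repository layout; the helper name `artifact_urls` is hypothetical and is not part of Gradle or Iceberg.

```python
# Base URL copied from the build log above.
MAVEN_CENTRAL = "https://repo.maven.apache.org/maven2"


def artifact_urls(group_id, artifact_id, version, base=MAVEN_CENTRAL):
    """Return the .pom and .jar URLs for a coordinate in a Maven-layout repository.

    Hypothetical helper for illustration only.
    """
    # The group id maps to a directory path: dots become slashes.
    group_path = group_id.replace(".", "/")
    stem = f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}"
    return [f"{stem}.pom", f"{stem}.jar"]


urls = artifact_urls("org.apache.parquet", "parquet-avro", "1.12.1")
for url in urls:
    print(url)
```

Requesting either URL and getting a "not found" response would be consistent with the version not having been published to that repository, which is what the failure above suggests for 1.12.1.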

@kbendick
Contributor

> @kbendick Sorry for the late reply, I've been busy with work. I'm new to Iceberg and should learn more. And thanks for telling me the right persons to review these!

No worries. Like I mentioned, this was more of a concern due to some other things I knew were going on. But it's not a concern now.

@rdblue rdblue merged commit 2366154 into apache:master Jun 16, 2021
@rdblue
Contributor

rdblue commented Jun 16, 2021

Thanks, @Fokko! I merged this.

chenjunjiedada pushed a commit to chenjunjiedada/incubator-iceberg that referenced this pull request Oct 20, 2021
Merge remote-tracking branch 'upstream/merge-master-20210816' into master
## What does this MR mainly address?

Merge upstream/master to bring in recent bug fixes and optimizations.

## What changes does this MR make?

Key PRs of interest:
> Predicate PushDown support, https://github.com/apache/iceberg/pull/2358, https://github.com/apache/iceberg/pull/2926, https://github.com/apache/iceberg/pull/2777/files
> Spark: fix the error raised when writing an empty dataset by simply skipping the write, apache#2960
> Flink UI: add uidPrefix to operators to make it easier to track multiple Iceberg sink jobs, apache#288
> Spark: fix the nested struct pruning issue, apache#2877
> Allow creating v2 format tables via table properties, apache#2887
> Add the SortRewriteStrategy framework to gradually support different rewrite strategies, apache#2609 (WIP: apache#2829)
> Spark: support configuring Hadoop properties for a catalog, apache#2792
> Spark: support reading and writing timestamps without timezone, apache#2757
> Spark MicroBatch: support a configuration property to skip delete snapshots, apache#2752
> Spark V2 RewriteDatafilesAction support
> Core: Add validation for row-level deletes with rewrites, apache#2865
> Schema time travel related, adds schema-id, Core: add schema id to snapshot
> Spark Extension: support identifier fields operations, apache#2560
> Parquet: Update to 1.12.0, apache#2441
> Hive: Vectorized ORC reads for Hive, apache#2613
> Spark: Add an action to remove all referenced files, apache#2415

## How was this MR tested?

Unit tests (UT)
5 participants