feat: Support Iceberg clustered writer #14670
Conversation
Co-authored-by [email protected]
✅ Deploy Preview for meta-velox canceled.
|
65e8480 to
dc64f0a
Compare
dc64f0a to
298c0c4
Compare
Co-authored-by: Chengcheng Jin <[email protected]>
Add a Protobuf message IcebergPartitionField to carry the Iceberg field id information, and IcebergPartitionSpec to carry partition information. Build with tests and benchmarks in CI and fix the IcebergWriteTest build. Set the file format to ORC to bypass the native Parquet writer for the partitioned TPC-H Iceberg suite; once facebookincubator/velox#14670, which supports the fanout=false mode, is merged, we can relax this restriction. Relevant PR: facebookincubator/velox#13874
```cpp
std::unique_ptr<DataFileStatsCollector> icebergStatsCollector_;

// Below are structures for clustered mode writer.
const bool fanoutEnabled_;
```
Move the const member up front.
@PingLiuPing Can you add more details on the iceberg clustered writes?
In the current implementation, who handles the clustering and partitioning?
@majetideepak, thanks for the comment.
The caller is responsible for guaranteeing that the input data is partitioned first. For example, take the identity partition transform (the same as Hive's) and suppose the partition column type is integer. A valid input stream is 1,1,1,2,2,2,3,3,3. If the input stream is 1,1,1,2,2,1, clustered mode will report an error.
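The check described above can be sketched as follows. This is a minimal, hypothetical illustration of the clustered-input invariant (the names here are not the actual Velox API): once a partition key's run has been closed by the arrival of a different key, seeing that key again means the input was not clustered, and the writer must fail.

```cpp
#include <stdexcept>
#include <unordered_set>

// Hypothetical sketch of the clustered-mode input check: rows for each
// partition key must arrive as one contiguous run.
class ClusteredKeyChecker {
 public:
  // Accepts 'key' if the stream remains clustered; throws otherwise.
  void check(int key) {
    if (hasCurrent_ && key == currentKey_) {
      return; // Same run continues.
    }
    if (seenKeys_.count(key)) {
      // A previously closed run reappeared: the input is not clustered.
      throw std::runtime_error("Input is not clustered on partition key");
    }
    if (hasCurrent_) {
      seenKeys_.insert(currentKey_); // Close the previous run.
    }
    currentKey_ = key;
    hasCurrent_ = true;
  }

 private:
  bool hasCurrent_{false};
  int currentKey_{0};
  std::unordered_set<int> seenKeys_;
};
```

With this sketch, the stream 1,1,1,2,2,2,3,3,3 passes, while 1,1,1,2,2,1 throws on the final 1.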
I now see that this PR corresponds to the last commit and the other commits are part of other PRs.
```cpp
namespace facebook::velox::connector::hive::iceberg {
// ...
struct IcebergNestedField {
```
ColumnIdentity. Both Presto and Trino use this name.
@yingsu00 Thanks for the comments. This PR contains 7 commits, but only the last commit is for the clustered writer. I had to do this to pass CI because it depends on the previous commits, which are not merged yet.
Sorry for the confusion.
```cpp
std::function<IcebergDataFileStatsSettings(
    const IcebergNestedField&, const TypePtr&, bool)>
    buildNestedField = [&](const IcebergNestedField& f,
```
Would it be possible to make buildNestedField a private function instead of a lambda here? This is hard to read.
```cpp
                           bool skipBounds) -> IcebergDataFileStatsSettings {
  VELOX_CHECK_NOT_NULL(type, "Input column type cannot be null.");
  bool currentSkipBounds = skipBounds || type->isMap() || type->isArray();
  IcebergDataFileStatsSettings field(f.id, currentSkipBounds);
```
The name "field" is confusing. Can we just name it statsSettings?
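One possible shape for the refactor the reviewers suggest — lifting the lambda into a private member function and renaming `field` to `statsSettings` — is sketched below with simplified stand-in types. The real Velox types (`TypePtr`, `IcebergNestedField`, `IcebergDataFileStatsSettings`) are richer than these placeholders, so this is only an illustration of the structure, not the actual implementation.

```cpp
#include <memory>
#include <stdexcept>

// Simplified stand-ins for the real Velox/Iceberg types; names and shapes
// here are hypothetical and only illustrate the suggested refactor.
struct Type {
  enum Kind { kPrimitive, kMap, kArray };
  explicit Type(Kind k) : kind(k) {}
  Kind kind;
  bool isMap() const { return kind == kMap; }
  bool isArray() const { return kind == kArray; }
};
using TypePtr = std::shared_ptr<const Type>;

struct IcebergNestedField {
  int id;
};

struct IcebergDataFileStatsSettings {
  int id;
  bool skipBounds;
};

class IcebergStatsBuilder {
 public:
  IcebergDataFileStatsSettings build(
      const IcebergNestedField& f, const TypePtr& type) {
    return buildNestedField(f, type, /*skipBounds=*/false);
  }

 private:
  // The lambda from the diff, lifted into a private member function, with
  // the result variable renamed to 'statsSettings' as suggested.
  IcebergDataFileStatsSettings buildNestedField(
      const IcebergNestedField& f, const TypePtr& type, bool skipBounds) {
    // Stand-in for VELOX_CHECK_NOT_NULL in the real code.
    if (type == nullptr) {
      throw std::invalid_argument("Input column type cannot be null.");
    }
    // Maps and arrays carry no useful min/max bounds, so skip them.
    bool currentSkipBounds = skipBounds || type->isMap() || type->isArray();
    IcebergDataFileStatsSettings statsSettings{f.id, currentSkipBounds};
    return statsSettings;
  }
};
```

A member function also makes the recursion over nested children (elided here) easier to follow than a self-referential `std::function` lambda.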
@yingsu00 Thanks for the comments. I will address them when I split these huge PRs into smaller pieces.
Iceberg supports two writer modes: fanout (the default) and clustered.
This PR implements the clustered writer mode.
In clustered mode, the input data is assumed to be clustered/partitioned beforehand.
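The difference between the two modes can be sketched as follows. This is a hypothetical, toy-level contrast (not the Velox implementation): a fanout writer keeps one open writer per partition seen so far and accepts rows in any order, while a clustered writer keeps a single open writer and closes it each time the partition key changes.

```cpp
#include <map>
#include <vector>

struct PartitionRun {
  int key;
  int rows;
};

// Fanout mode: rows may arrive in any order; one bucket (open writer)
// per distinct partition key, all kept open until the end.
std::map<int, int> fanoutWrite(const std::vector<int>& keys) {
  std::map<int, int> rowsPerPartition;
  for (int k : keys) {
    ++rowsPerPartition[k]; // The writer for 'k' stays open throughout.
  }
  return rowsPerPartition;
}

// Clustered mode: only one writer open at a time; a key change closes
// the current run and opens a new one.
std::vector<PartitionRun> clusteredWrite(const std::vector<int>& keys) {
  std::vector<PartitionRun> runs;
  for (int k : keys) {
    if (runs.empty() || runs.back().key != k) {
      runs.push_back({k, 0}); // Close the previous writer, open a new one.
    }
    ++runs.back().rows;
  }
  return runs;
}
```

The trade-off this illustrates: fanout tolerates arbitrary input order at the cost of holding many writers open, while clustered mode holds at most one writer open but requires the pre-partitioned input described above.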