@PingLiuPing
Collaborator

Support Iceberg partition transforms.

Reviewers, please ignore the first commit; it comes from #10996, which is not merged yet.

@PingLiuPing PingLiuPing self-assigned this Jun 25, 2025
@netlify

netlify bot commented Jun 25, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit f5172db
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/68b6cd348c0b630008c23a56

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 25, 2025
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch from 7922c5e to 290f28c Compare June 25, 2025 01:23
@PingLiuPing
Collaborator Author

CC @zhouyuan

@PingLiuPing PingLiuPing removed the request for review from majetideepak June 25, 2025 06:15
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch 3 times, most recently from 4d8b073 to fdccd55 Compare June 27, 2025 15:20
@jinchengchenghh
Collaborator

Please copy this implementation for the bucket transform: #13174

@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch from fdccd55 to 0f52d96 Compare July 1, 2025 21:05
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch 2 times, most recently from 3b58d4b to 1e587b0 Compare July 2, 2025 21:49
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch from 1e587b0 to 9ec820f Compare July 3, 2025 22:10
Collaborator

@jinchengchenghh jinchengchenghh left a comment


Only a few small nits.

@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch 5 times, most recently from c6b2574 to 39a67d4 Compare August 6, 2025 10:16
facebook-github-bot pushed a commit that referenced this pull request Aug 7, 2025
Summary:
The Iceberg hash uses Murmur3, matching https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp: it first processes the input in 4-byte chunks, then folds the remaining bytes in with XOR. Spark SQL also uses this hash algorithm but processes the remaining bytes differently, combining them. Extract the common function hashInt64 to functions/lib.

This class will be used for the Iceberg bucket transform and bucket function.
The Iceberg Murmur3 hash must strictly match the Java implementation, so that data written by the Iceberg writer can be read by Iceberg Java and the function call also returns the correct result.
The Iceberg utility library `velox_functions_iceberg_hash` will be linked by the Iceberg connector writer to perform partition transforms. #13874

Pull Request resolved: #14025

Reviewed By: pedroerp

Differential Revision: D79732785

Pulled By: kgpai

fbshipit-source-id: 6122b94673f015dca5c8484722926709a30fe65e
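
To make the hashing scheme in the commit summary above concrete, here is a minimal C++ sketch of MurmurHash3 x86 32-bit as laid out in the linked smhasher reference: 4-byte chunks are mixed first, then the leftover 1 to 3 bytes are XORed into a single word and mixed once. This is not the Velox code. The `sketch` namespace and the names `murmur3_32`, `hashBytes`, and `bucket` are hypothetical, and this `hashInt64` only borrows the name mentioned in the summary. Two assumptions come from the public Iceberg spec rather than from this PR: integers are hashed as 8 little-endian bytes with seed 0, and the bucket is `(hash & Integer.MAX_VALUE) % N`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

namespace sketch {  // hypothetical namespace, not part of Velox

inline uint32_t rotl32(uint32_t x, int r) {
  return (x << r) | (x >> (32 - r));
}

// MurmurHash3 x86 32-bit over a byte range (smhasher layout).
inline int32_t murmur3_32(const uint8_t* data, size_t len, uint32_t seed = 0) {
  constexpr uint32_t c1 = 0xcc9e2d51;
  constexpr uint32_t c2 = 0x1b873593;
  uint32_t h1 = seed;

  // Body: mix every complete 4-byte chunk, read as little-endian.
  const size_t nblocks = len / 4;
  for (size_t i = 0; i < nblocks; ++i) {
    const uint8_t* p = data + i * 4;
    uint32_t k1 = uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
        (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
    k1 *= c1;
    k1 = rotl32(k1, 15);
    k1 *= c2;
    h1 ^= k1;
    h1 = rotl32(h1, 13);
    h1 = h1 * 5 + 0xe6546b64;
  }

  // Tail: XOR the 1-3 leftover bytes into one word, then mix it once.
  const uint8_t* tail = data + nblocks * 4;
  uint32_t k1 = 0;
  switch (len & 3) {
    case 3:
      k1 ^= uint32_t(tail[2]) << 16;
      [[fallthrough]];
    case 2:
      k1 ^= uint32_t(tail[1]) << 8;
      [[fallthrough]];
    case 1:
      k1 ^= tail[0];
      k1 *= c1;
      k1 = rotl32(k1, 15);
      k1 *= c2;
      h1 ^= k1;
      break;
    default:
      break;
  }

  // Finalization: fold in the length and avalanche the bits.
  h1 ^= static_cast<uint32_t>(len);
  h1 ^= h1 >> 16;
  h1 *= 0x85ebca6b;
  h1 ^= h1 >> 13;
  h1 *= 0xc2b2ae35;
  h1 ^= h1 >> 16;
  return static_cast<int32_t>(h1);
}

// Hash a string or binary value as raw bytes.
inline int32_t hashBytes(std::string_view s) {
  return murmur3_32(reinterpret_cast<const uint8_t*>(s.data()), s.size());
}

// Assumption (Iceberg spec): integers are hashed as 8 little-endian bytes.
inline int32_t hashInt64(int64_t value) {
  uint8_t bytes[8];
  const uint64_t v = static_cast<uint64_t>(value);
  for (int i = 0; i < 8; ++i) {
    bytes[i] = static_cast<uint8_t>(v >> (8 * i));
  }
  return murmur3_32(bytes, sizeof(bytes));
}

// Assumption (Iceberg spec): clear the sign bit, then take the modulus.
inline int32_t bucket(int32_t hash, int32_t numBuckets) {
  return (hash & INT32_MAX) % numBuckets;
}

}  // namespace sketch
```

A caller would compute a bucket as `sketch::bucket(sketch::hashInt64(value), numBuckets)`. Assembling each chunk byte by byte keeps the sketch independent of host endianness, which matters because the result has to match the Java implementation bit for bit.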
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch 2 times, most recently from ebf1437 to 7b14ccc Compare August 14, 2025 10:07
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch from 7b14ccc to ff677ae Compare August 18, 2025 12:28
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch 4 times, most recently from 39310bd to 9dfefb7 Compare August 21, 2025 10:38
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch 2 times, most recently from a7af822 to a04c29e Compare August 22, 2025 13:28
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch from a04c29e to e0ae168 Compare September 1, 2025 19:41
@PingLiuPing PingLiuPing force-pushed the lp_iceberg_partition_transforms branch from e0ae168 to f5172db Compare September 2, 2025 10:55
wypb pushed a commit to wypb/velox that referenced this pull request Sep 3, 2025
jinchengchenghh added a commit to apache/incubator-gluten that referenced this pull request Sep 3, 2025
Add the Protobuf struct IcebergPartitionField to transfer the Iceberg ID information, and add IcebergPartitionSpec to transfer partition information.
Build with tests and benchmarks in CI and fix the IcebergWriteTest build.
Set the file format to ORC to bypass native Parquet write for the partitioned TPC-H Iceberg suite. Once facebookincubator/velox#14670, which supports fanout false mode, is merged, we can relax this restriction.

Relevant PR: facebookincubator/velox#13874

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. iceberg
