Skip to content

Conversation

@ajantha-bhat
Copy link
Member

@ajantha-bhat ajantha-bhat commented May 1, 2025

a) Deprecate PartitionStatsHandler in iceberg-data
b) Copy PartitionStatsHandler from iceberg-data to iceberg-core and Use InternalData( also package name had to change to use PartitionStatsUtill logic as private code). So, users can add dependency on core and parquet instead of whole data module to use partition stats feature.
c) Deprecate PartitionStatsUtil as no need of too many public interface and we can use that logic as private inside PartitionStatsHandler (comment during #12629)
d) Deprecate TestPartitionStatsUtil as the functionality test is covered from TestPartitionStatsHandler
e) Move benchmarks to iceberg-data as InternalData needs parquet dependency.

* Computes, writes and reads the {@link PartitionStatisticsFile}. Uses generic readers and writers
* to support writing and reading of the stats in table default format.
*/
public class PartitionStatsHandler {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class was just copied from iceberg-data module (deprecated now) and replace datawriter and reader function with InternalData

return stats;
}

private static Collection<PartitionStats> computeStats(Table table, Snapshot snapshot) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied the methods from partitionStatsUtil (perviously public, now deprecated) as got a comment that no need to keep too many public interfaces and compute logic can be internal to compute and write.

@deniskuzZ
Copy link
Member

hi @ajantha-bhat, do you think it would make sense to rename PartitionStatsHandler to PartitionStatsUtil since it's a util class with only static methods?

@ajantha-bhat
Copy link
Member Author

ajantha-bhat commented May 1, 2025

hi @ajantha-bhat, do you think it would make sense to rename PartitionStatsHandler to PartitionStatsUtil since it's a util class with only static methods?

@deniskuzZ: Thanks for the suggestion. Since we already released the public interface as PartitionStatsHandler, I would like to continue this naming. Plus in the talks and blogs we are already using this name.

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1
thanks @pvary for the proposal and @ajantha-bhat for refactoring and merging the partition stats functionality. Looking forward for the increment partition stats compute support.

@ajantha-bhat
Copy link
Member Author

@pvary: PR is ready for review.

Copy link
Contributor

@advancedxy advancedxy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on my side, looking forward to the incremental support.

@github-actions github-actions bot added the ORC label May 6, 2025
@ajantha-bhat
Copy link
Member Author

Rebased the PR due to spark flaky test.

@ajantha-bhat
Copy link
Member Author

@pvary and @gaborkaszab: Anymore comments for this? Thanks for the review.

Copy link
Collaborator

@gaborkaszab gaborkaszab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes @ajantha-bhat !
I left one minor comment, but other than that the PR LGTM!

@pvary pvary merged commit bed88a3 into apache:main May 8, 2025
42 checks passed
@pvary
Copy link
Contributor

pvary commented May 8, 2025

Merged to main.
Thanks for the PR @ajantha-bhat and @deniskuzZ, @advancedxy, @gaborkaszab for the reviews!

anuragmantri added a commit to anuragmantri/iceberg that referenced this pull request Jul 25, 2025
devendra-nr pushed a commit to devendra-nr/iceberg that referenced this pull request Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants