@aokolnychyi aokolnychyi commented Sep 19, 2023

This PR drops count metrics for the file and position columns in position delete files. This reduces the size of persisted delete metadata and speeds up distributed planning, where the stats are serialized. Such stats are redundant: the spec does not permit null values in those columns, and recordCount is available if the number of records must be known.

We still keep counts for the data columns if deleted rows are persisted. We also keep column sizes for the file and position columns.
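To illustrate the behavior described above, here is a minimal, self-contained sketch. It is not the code from this PR: `SimpleMetrics` is a hypothetical, simplified stand-in for `Metrics` (bounds are omitted), `dropKeys` is a hypothetical helper, and it assumes "counts" covers the value, null value, and NaN value count maps. The two field IDs are the ones the spec reserves for `file_path` and `pos`, written out as literals so the example has no Iceberg dependency.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class MetricsCopySketch {

  // Reserved field IDs of the position delete columns (assumed from the spec).
  static final int DELETE_FILE_PATH_ID = 2147483546; // file_path
  static final int DELETE_FILE_POS_ID = 2147483545; // pos

  /** Hypothetical, simplified stand-in for Metrics; lower/upper bounds are omitted. */
  static final class SimpleMetrics {
    final long rowCount;
    final Map<Integer, Long> columnSizes;
    final Map<Integer, Long> valueCounts;
    final Map<Integer, Long> nullValueCounts;
    final Map<Integer, Long> nanValueCounts;

    SimpleMetrics(
        long rowCount,
        Map<Integer, Long> columnSizes,
        Map<Integer, Long> valueCounts,
        Map<Integer, Long> nullValueCounts,
        Map<Integer, Long> nanValueCounts) {
      this.rowCount = rowCount;
      this.columnSizes = columnSizes;
      this.valueCounts = valueCounts;
      this.nullValueCounts = nullValueCounts;
      this.nanValueCounts = nanValueCounts;
    }
  }

  /** Returns a copy whose count maps no longer contain the excluded field IDs. */
  static SimpleMetrics copyWithoutFieldCounts(
      SimpleMetrics metrics, Set<Integer> excludedFieldIds) {
    return new SimpleMetrics(
        metrics.rowCount, // the record count is always kept
        metrics.columnSizes, // column sizes are kept for all columns
        dropKeys(metrics.valueCounts, excludedFieldIds),
        dropKeys(metrics.nullValueCounts, excludedFieldIds),
        dropKeys(metrics.nanValueCounts, excludedFieldIds));
  }

  // Copies the map and removes the given keys; a null map stays null.
  private static Map<Integer, Long> dropKeys(Map<Integer, Long> map, Set<Integer> keys) {
    if (map == null) {
      return null;
    }
    Map<Integer, Long> copy = new HashMap<>(map);
    copy.keySet().removeAll(keys);
    return copy;
  }
}
```

Calling `copyWithoutFieldCounts(metrics, Set.of(DELETE_FILE_PATH_ID, DELETE_FILE_POS_ID))` in this sketch leaves the counts for data columns and all column sizes untouched, and the input object is not modified.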

* @param excludedFieldIds field IDs for which the counts must be dropped
* @return a new metrics object without counts for given fields
*/
public static Metrics copyWithoutFieldCounts(Metrics metrics, Set<Integer> excludedFieldIds) {

I considered creating a builder for copying metrics, but I am not sure it would be worth the extra complexity, and it is not clear whether future use cases would benefit from it. For now, I simply added another util method.

@aokolnychyi aokolnychyi added this to the Iceberg 1.4.0 milestone Sep 19, 2023

@aokolnychyi aokolnychyi merged commit da2ad38 into apache:master Sep 20, 2023
@aokolnychyi

Thanks, @Fokko @RussellSpitzer!
