Add function names to ColumnStatisticType #20871
Closed
ZacBlanco wants to merge 1 commit into prestodb:master
Closing this in favor of #20993.
Description
This change adds function name parameters to the enum values of ColumnStatisticType. The function names are used when generating ColumnStatisticMetadata for the ANALYZE query.
This change prepares for letting connectors override the function used to execute the statistics aggregations, mainly to support connectors whose underlying histogram or NDV (number of distinct values) representations differ. For example, Hive natively generates KLL sketches from Apache DataSketches for histograms, while Spark has its own custom format. Some other connectors, such as MySQL, use a JSON format. Iceberg tables can store NDV estimates in Apache DataSketches Theta sketches. If we intend to support a variety of statistics from other connectors, this change is necessary.
Allowing the connector to override the function for each statistic type lets us generate statistics in the format the connector needs, so that they can be stored in the connector-specific catalog metadata.
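To make the idea concrete, here is a minimal sketch of an enum whose values carry a default aggregation function name, plus a per-connector override table. It is not the code in this PR: `SketchStatisticType`, `SketchFunctionResolver`, the subset of constants, the default function names, and the `hypothetical_theta_sketch` override are all assumptions chosen for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical stand-in for ColumnStatisticType. The constants and the
// default function names below are illustrative assumptions.
enum SketchStatisticType {
    MIN_VALUE("min"),
    MAX_VALUE("max"),
    NUMBER_OF_DISTINCT_VALUES("approx_distinct"),
    NUMBER_OF_NON_NULL_VALUES("count");

    private final String functionName;

    SketchStatisticType(String functionName) {
        this.functionName = functionName;
    }

    // Default aggregation function used when no override is registered.
    String getFunctionName() {
        return functionName;
    }
}

// Hypothetical per-connector table of function overrides.
final class SketchFunctionResolver {
    private final Map<SketchStatisticType, String> overrides =
            new EnumMap<>(SketchStatisticType.class);

    SketchFunctionResolver override(SketchStatisticType type, String functionName) {
        overrides.put(type, functionName);
        return this;
    }

    // Returns the connector's override if present, else the enum default.
    String resolve(SketchStatisticType type) {
        return overrides.getOrDefault(type, type.getFunctionName());
    }
}

class FunctionNameSketchDemo {
    public static void main(String[] args) {
        // A connector that stores NDV estimates in its own sketch format
        // swaps in a different function (placeholder name) for that statistic.
        SketchFunctionResolver resolver = new SketchFunctionResolver()
                .override(SketchStatisticType.NUMBER_OF_DISTINCT_VALUES,
                        "hypothetical_theta_sketch");
        for (SketchStatisticType type : SketchStatisticType.values()) {
            System.out.println(type + " -> " + resolver.resolve(type));
        }
    }
}
```

Keeping the default on the enum value means connectors that do not care about the storage format need no extra configuration, and only the overriding connector has to know about its own function.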
Motivation and Context
Eventual implementation of more column statistics
Impact
No user-facing impact
Test Plan
No additional features need to be tested