Support arbitrary aggregation functions during ANALYZE (v2) #14233
findepi merged 5 commits into trinodb:master
Branch updated from b8908f1 to 40b246d.
`ColumnStatisticMetadata` is used as a map key in `StatisticAggregationsDescriptor`. Before this change, the map was serialized with hand-written code. After this change, the map is converted to a list of key/value pairs for serialization.
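A minimal sketch of the idea, assuming a JSON-style format in which object keys must be strings: a map keyed by a structured value is turned into a list of key/value pairs before serialization and rebuilt afterwards. `StatisticKey`, `KeyValue`, `toEntries` and `fromEntries` are hypothetical names for illustration, not Trino's actual classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapAsEntries {
    // Hypothetical structured key, standing in for something like ColumnStatisticMetadata.
    record StatisticKey(String columnName, String statistic) {}

    // One key/value pair; a list of these replaces the map in the serialized form.
    record KeyValue<K, V>(K key, V value) {}

    // Flatten the map into a list of pairs, preserving iteration order.
    static <K, V> List<KeyValue<K, V>> toEntries(Map<K, V> map) {
        return map.entrySet().stream()
                .map(entry -> new KeyValue<K, V>(entry.getKey(), entry.getValue()))
                .toList();
    }

    // Rebuild the map, rejecting duplicate keys because a list cannot enforce uniqueness.
    static <K, V> Map<K, V> fromEntries(List<KeyValue<K, V>> entries) {
        Map<K, V> map = new LinkedHashMap<>();
        for (KeyValue<K, V> entry : entries) {
            if (map.containsKey(entry.key())) {
                throw new IllegalArgumentException("Duplicate key: " + entry.key());
            }
            map.put(entry.key(), entry.value());
        }
        return map;
    }

    public static void main(String[] args) {
        Map<StatisticKey, String> descriptor = new LinkedHashMap<>();
        descriptor.put(new StatisticKey("orderkey", "max_value"), "symbol_1");
        descriptor.put(new StatisticKey("comment", "approx_distinct"), "symbol_2");

        List<KeyValue<StatisticKey, String>> serializable = toEntries(descriptor);
        System.out.println(serializable.size());
        System.out.println(fromEntries(serializable).equals(descriptor));
    }
}
```

The trade-off is that the serialized form no longer enforces key uniqueness by itself, so the sketch checks for duplicates when rebuilding the map.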
The aggregation function's result type is already known, so it does not need to be passed explicitly.
Branch updated from 40b246d to 63a179f.
losipiuk left a comment:
This looks nice. Should I look at the other one?
I don't think you need to, currently.
CI #14239
Can all the existing …
@alexjo2144 This is exactly what I did initially, in #14220.
However, …
Thus, …
This seems to be a bug:
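```java
import java.util.Optional;

Optional<String> s1 = Optional.of("hello ");
String s2 = "world!";
// String concatenation uses Optional.toString(), not the wrapped value
System.out.println(s1 + s2);
```
outputs
```
Optional[hello ]world!
```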
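If the intent is to concatenate the wrapped value, it has to be unwrapped first. A minimal sketch using `Optional.orElse`; the class and method names are illustrative:

```java
import java.util.Optional;

public class OptionalConcat {
    static String greeting(Optional<String> prefix, String suffix) {
        // Unwrap explicitly; an empty Optional contributes an empty string.
        return prefix.orElse("") + suffix;
    }

    public static void main(String[] args) {
        System.out.println(greeting(Optional.of("hello "), "world!"));
        System.out.println(greeting(Optional.empty(), "world!"));
    }
}
```

Whether an empty `Optional` should contribute an empty string, a default value, or an error (`orElseThrow()`) depends on the call site.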
Is this functionality covered by any tests?
No, there are no fine-grained tests for EXPLAIN.
nit: rename the constant so that it does not collide with `ColumnStatisticType.NUMBER_OF_DISTINCT_VALUES`.
The `ColumnStatisticType` enum defined what could be collected during statistics collection. Although it looked generic, its options matched exactly the statistics that the Hive metastore collects. Other metadata stores may require different statistics, for example data sketches with a specific configuration. This change allows a connector to pick any existing aggregation function.
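A minimal sketch of the difference, assuming a simplified model of the SPI: instead of choosing only from a fixed enum, a connector can also name an aggregation function. `BuiltinStatistic`, `StatisticRequest`, `Builtin` and `Aggregation` are hypothetical types, not Trino's SPI, and `approx_distinct` is used only as an example function name.

```java
import java.util.List;

public class StatisticRequests {
    // Hypothetical fixed set of statistics, modeled on an enum such as ColumnStatisticType.
    enum BuiltinStatistic { MIN_VALUE, MAX_VALUE, NUMBER_OF_NON_NULL_VALUES }

    // A requested statistic: either a predefined one or a connector-chosen aggregation function.
    interface StatisticRequest {}

    record Builtin(String column, BuiltinStatistic type) implements StatisticRequest {}

    record Aggregation(String column, String functionName) implements StatisticRequest {}

    static String describe(StatisticRequest request) {
        if (request instanceof Builtin b) {
            return b.type() + "(" + b.column() + ")";
        }
        if (request instanceof Aggregation a) {
            return a.functionName() + "(" + a.column() + ")";
        }
        throw new IllegalArgumentException("Unknown request: " + request);
    }

    public static void main(String[] args) {
        List<StatisticRequest> requests = List.of(
                new Builtin("orderkey", BuiltinStatistic.MAX_VALUE),
                // Any aggregation function known to the engine could be named here.
                new Aggregation("comment", "approx_distinct"));
        requests.forEach(request -> System.out.println(describe(request)));
    }
}
```

Keeping the enum-based variant next to the function-based one lets existing connectors continue to work unchanged while new ones request arbitrary functions.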
Branch updated from bf4d570 to 30b39e1.
(an alternative to #14220, maintaining backwards compatibility)
A connector may ask the engine to collect any statistic defined by the `ColumnStatisticType` SPI enum. This is convenient, but sometimes a connector needs to provide its own way of calculating statistics. For example, Iceberg statistics include …
This has two components that are not supported today:
This PR addresses the first limitation. It allows the connector to pick an aggregation function of its choice for statistics collection.
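On the engine side, a connector-chosen function has to be resolved by name and evaluated over the column values. The sketch assumes a toy in-memory registry; `AggregationRegistry`, `AggregationFunction`, `collect` and the function name `distinct_count` are hypothetical, and a real engine resolves functions through its own function catalog. Because each function carries its own result type, a request only needs to name the function.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class AggregationRegistry {
    // Each function knows its result type, so callers only supply the name.
    record AggregationFunction(String resultType, Function<List<Long>, Object> evaluate) {}

    // Toy registry; names are illustrative, not a real function catalog.
    static final Map<String, AggregationFunction> FUNCTIONS = Map.of(
            "count", new AggregationFunction("bigint", values -> (long) values.size()),
            "max", new AggregationFunction("bigint", values -> values.stream().max(Long::compare).orElse(null)),
            // Exact distinct count, standing in for a sketch-based aggregation.
            "distinct_count", new AggregationFunction("bigint", values -> values.stream().distinct().count()));

    // Resolve the requested function by name and apply it to the column's values.
    static Object collect(String functionName, List<Long> columnValues) {
        AggregationFunction function = FUNCTIONS.get(functionName);
        if (function == null) {
            throw new IllegalArgumentException("Unknown aggregation function: " + functionName);
        }
        return function.evaluate().apply(columnValues);
    }

    public static void main(String[] args) {
        List<Long> column = List.of(3L, 7L, 7L, 1L);
        for (String name : List.of("count", "max", "distinct_count")) {
            System.out.println(name + " -> " + collect(name, column) + " : " + FUNCTIONS.get(name).resultType());
        }
    }
}
```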