Support data_size when analyzing in Delta Lake #12814
f6a66fa to 64ff015
CI hit #12818
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
It doesn't make sense to calculate TOTAL_SIZE_IN_BYTES for e.g. numbers, since the engine doesn't use this value for fixed-width data types.
Also, if we decide we cannot merge existing stats with no data size (see other comment), we should ask engine to collect them.
"if we decide we cannot merge existing stats with no data size (see other comment), we should ask engine to collect them"
I don't understand how to achieve this. Could you share the details?
In io.trino.plugin.deltalake.DeltaLakeMetadata#getStatisticsCollectionMetadata we already read the ExtendedStatistics, so we can check whether we have data size for the selected columns.
If we don't, we simply don't create the ColumnStatisticMetadata asking to collect TOTAL_SIZE_IN_BYTES.
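The check suggested above could look roughly like the following. This is only a sketch under stated assumptions: the class, method, and parameter names are hypothetical, plain Java collections stand in for ExtendedStatistics and ColumnStatisticMetadata, and the "variable-width" set is an assumed way to express the earlier remark that the engine does not use TOTAL_SIZE_IN_BYTES for fixed-width types. It is not the code from this PR.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model; these are NOT Trino SPI types.
final class DataSizeCollectionSketch
{
    private DataSizeCollectionSketch() {}

    /**
     * Returns the analyzed columns for which TOTAL_SIZE_IN_BYTES would be requested.
     *
     * @param previousDataSizes empty if no extended statistics exist yet; otherwise the
     *        data sizes stored earlier (a column missing from the map has no data size)
     */
    static Set<String> columnsToCollectDataSizeFor(
            List<String> analyzedColumns,
            Set<String> variableWidthColumns,
            Optional<Map<String, Long>> previousDataSizes)
    {
        Set<String> result = new LinkedHashSet<>();
        for (String column : analyzedColumns) {
            // Fixed-width columns: skip, the engine does not use the value
            if (!variableWidthColumns.contains(column)) {
                continue;
            }
            // With no earlier statistics there is nothing to merge with, so collect.
            // With earlier statistics, collect only if a data size was stored for the column.
            boolean canMerge = previousDataSizes
                    .map(sizes -> sizes.containsKey(column))
                    .orElse(true);
            if (canMerge) {
                result.add(column);
            }
        }
        return result;
    }
}
```

In this model, a table analyzed before data size was recorded yields a map without the column, so no TOTAL_SIZE_IN_BYTES request is produced for it.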
Don't invoke this when computedStatistics.containsKey(NUMBER_OF_DISTINCT_VALUES_SUMMARY), since this may be a somewhat expensive allocation.
BTW what is the case when !computedStatistics.containsKey(NUMBER_OF_DISTINCT_VALUES_SUMMARY)?
We ask the engine to calculate the HLL, so we may expect it to be present, right?
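The conversation does not show the call being discussed, so the following is only an illustration of the general point, assuming the concern is a default value that gets built eagerly even when the key is present. All names here are hypothetical, and a long[] stands in for the empty summary object.

```java
import java.util.Map;

// Hypothetical illustration; not the code under review.
final class LazySummarySketch
{
    // Counts how many times the stand-in "expensive" allocation ran
    static int allocations;

    private LazySummarySketch() {}

    static long[] newEmptySummary()
    {
        allocations++;
        return new long[2048];
    }

    // Eager: the default argument is evaluated before getOrDefault runs,
    // so the allocation happens even when the key is present
    static long[] eager(Map<String, long[]> computedStatistics, String key)
    {
        return computedStatistics.getOrDefault(key, newEmptySummary());
    }

    // Lazy: allocate the empty summary only when the key is missing
    static long[] lazy(Map<String, long[]> computedStatistics, String key)
    {
        long[] summary = computedStatistics.get(key);
        return summary != null ? summary : newEmptySummary();
    }
}
```

If the summary is in fact always present because the engine was asked to compute it, as the follow-up question suggests, the fallback branch may be unnecessary altogether.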
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
...delta-lake/src/main/java/io/trino/plugin/deltalake/statistics/DeltaLakeColumnStatistics.java
...lake/src/test/java/io/trino/plugin/deltalake/metastore/TestDeltaLakeMetastoreStatistics.java
@losipiuk PTAL
There's a version number in
We could, but we would need to update the logic here (since the change is quite backwards compatible, I thought we were going to leave the current number as is, but maybe I under-appreciate some consequences of doing so)
I didn't increase the number since I thought this is backward compatible, as @findepi already said. Let me know if we should increment the number.
a24426a to 4cd8683
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
4cd8683 to df757b8
CI hit #12858
Description
Add support for data_size when analyzing in Delta Lake
Documentation
(x) Sufficient documentation is included in this PR.
Release notes
(x) Release notes entries required with the following suggested text: