Expose nan_count in $partitions metadata table #10709
Merged
losipiuk merged 3 commits into trinodb:master on Apr 4, 2022
Contributor
Author
Rebased on
NaN values relate solely to double statistics. When a column contains NaN values, no range statistics can be delivered for the data. Therefore, a new field `nanValueCount` has been introduced in `ColumnStatistics` to handle this situation.
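To illustrate why NaN values defeat range statistics, here is a minimal sketch in plain Java. It relies only on standard floating-point semantics (every ordered comparison involving NaN is false); the class `NanRangeDemo` and the method `naiveMin` are hypothetical names for illustration and are not part of `trino-orc`.

```java
public class NanRangeDemo {
    // Naive minimum tracker based on '<'. Hypothetical helper, not Trino code.
    // Because 'x < NaN' and 'NaN < x' are both false, the result depends on
    // where the NaN appears in the input.
    static double naiveMin(double[] values) {
        double min = values[0];
        for (double v : values) {
            if (v < min) {
                min = v;
            }
        }
        return min;
    }

    public static void main(String[] args) {
        // NaN first: nothing ever compares less than NaN, so NaN "wins".
        double nanFirst = naiveMin(new double[] {Double.NaN, 1.0, 2.0});
        // NaN later: NaN never compares less than 1.0, so it is silently skipped.
        double nanLater = naiveMin(new double[] {1.0, Double.NaN, 2.0});

        System.out.println(Double.isNaN(nanFirst)); // true
        System.out.println(nanLater);               // 1.0
    }
}
```

The same three values yield two different "minimums" depending on order, which is why a separate NaN count is more trustworthy than a min/max range once NaNs are present.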
losipiuk reviewed Mar 25, 2022
    BooleanStatistics booleanStatistics,
    IntegerStatistics integerStatistics,
    DoubleStatistics doubleStatistics,
    Long numberOfNanValues,
Member
Why not put it inside DoubleStatistics?
Member
It feels like you could keep min/max as null if there are NaNs, but keep the NaN count inside the object. Would that not work?
Contributor
Author
That was my initial intention as well.
However, the DoubleStatisticsBuilder doesn't allow dealing with NaNs.
cc @dain
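The alternative raised in this thread (keep min/max as null when NaNs are present, but carry the NaN count inside the double statistics object) could look roughly like the sketch below. This is only an illustration under assumptions: `DoubleStatsSketch` and its methods are hypothetical and do not reflect the actual `DoubleStatistics` or `DoubleStatisticsBuilder` classes in `trino-orc`.

```java
public class DoubleStatsSketch {
    // Hypothetical accumulator, not the trino-orc implementation.
    private long nanCount;
    private boolean hasValue;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    void add(double value) {
        if (Double.isNaN(value)) {
            // Count NaNs separately instead of letting them corrupt the range.
            nanCount++;
            return;
        }
        hasValue = true;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    long getNanCount() {
        return nanCount;
    }

    // Returns null when no trustworthy range exists (NaNs seen, or no values).
    Double getMin() {
        if (nanCount > 0 || !hasValue) {
            return null;
        }
        return min;
    }

    Double getMax() {
        if (nanCount > 0 || !hasValue) {
            return null;
        }
        return max;
    }

    public static void main(String[] args) {
        DoubleStatsSketch stats = new DoubleStatsSketch();
        for (double v : new double[] {1.5, Double.NaN, -3.0, Double.NaN}) {
            stats.add(v);
        }
        System.out.println(stats.getNanCount()); // 2
        System.out.println(stats.getMin());      // null
        System.out.println(stats.getMax());      // null
    }
}
```

Whether the count lives inside the double statistics object or as a separate field next to it, as in the diff under review, is the design question being discussed here; the sketch only shows that the two pieces of information can coexist.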
Contributor
Author
@losipiuk CPTAL?
Description
Expose `nan_count` partition metadata information in the `$partitions` metadata table.
This is a new feature, added to be consistent with the information exposed by the Iceberg partitions table.
This change is targeted primarily at the Iceberg connector.
However, because it requires exposing new information from the `trino-orc` module, it may (although it shouldn't) affect other Trino functionality that reads or writes ORC files.
This change adds new metadata information about REAL and DOUBLE columns to the Iceberg `$partitions` metadata table.
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
(x) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
(x) Release notes entries required with the following suggested text: