Don't truncate min/max for Iceberg delete files #12671
I used FULL_METRIC_CONFIG for Parquet so that full paths can be stored in min/max. That approach was not possible for ORC, so I decided to set a hard limit there instead.
We never want to truncate the stats path, right? So could we do DataSize.ofBytes(Long.MAX_VALUE) here?
hmm that makes sense, I will change it
Actually it has to be Integer.MAX_VALUE
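The thread does not show why Long.MAX_VALUE is rejected. One plausible reason, stated here as an assumption, is that the ORC writer narrows the byte limit to an int at some point. The sketch below uses a hypothetical helper, toWriterLimit, rather than any real Trino or ORC API, and only illustrates how such a narrowing would behave:

```java
class StatsLimitSketch {
    // Hypothetical stand-in for a writer that stores the statistics limit as an int.
    // Math.toIntExact throws ArithmeticException when the value does not fit in an int.
    static int toWriterLimit(long limitBytes) {
        return Math.toIntExact(limitBytes);
    }

    // Returns true when the limit survives the narrowing conversion.
    static boolean fitsInInt(long limitBytes) {
        try {
            toWriterLimit(limitBytes);
            return true;
        }
        catch (ArithmeticException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(fitsInInt(Integer.MAX_VALUE)); // prints true
        System.out.println(fitsInInt(Long.MAX_VALUE));    // prints false
    }
}
```

Under that assumption, Integer.MAX_VALUE is the largest limit that can be passed safely, which still means "never truncate" for any realistic file path.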
Maybe cleaner to not have the stats limit be optional in createOrcWriter and instead get it from the session here, so we always have a value
Is there a GitHub issue to support metrics mode for Parquet? If not, can you create one? Either way, link it here.
6168f68 to c242315
c242315 to 582a1b1
Description
Previously, the min/max values written for Iceberg delete files were truncated to very short prefixes, which could severely hurt performance.
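To illustrate why truncation matters, the sketch below assumes the min/max values in question are file paths that a reader compares against a data file's path to decide whether a delete file can be skipped. The paths, the truncation length of 16, and all method names are hypothetical and chosen only for illustration. It is not the Trino or Iceberg implementation.

```java
class TruncatedBoundsSketch {
    // Truncated lower bound: a prefix always sorts at or before the original value.
    static String truncateMin(String value, int length) {
        return value.length() <= length ? value : value.substring(0, length);
    }

    // Truncated upper bound: take the prefix and bump its last character so the
    // result sorts after the original. Simplified: ignores overflow of that character.
    static String truncateMax(String value, int length) {
        if (value.length() <= length) {
            return value;
        }
        char[] prefix = value.substring(0, length).toCharArray();
        prefix[length - 1]++;
        return new String(prefix);
    }

    // A delete file can be skipped for a data file only when the path is outside [min, max].
    static boolean mayContain(String min, String max, String path) {
        return min.compareTo(path) <= 0 && path.compareTo(max) <= 0;
    }

    public static void main(String[] args) {
        String referenced = "s3://bucket/warehouse/db/tbl/data/file-0001.parquet";
        String unrelated = "s3://bucket/warehouse/db/tbl/data/file-0999.parquet";

        // Full bounds: the delete file is correctly skipped for the unrelated data file.
        System.out.println(mayContain(referenced, referenced, unrelated)); // prints false

        // Truncated bounds: every path under the shared prefix appears to match.
        String min = truncateMin(referenced, 16);
        String max = truncateMax(referenced, 16);
        System.out.println(mayContain(min, max, unrelated)); // prints true
    }
}
```

Because paths in one table usually share a long common prefix, short truncated bounds cover nearly every data file, so no delete file can be ruled out from statistics alone.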
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: