Use only latest Puffin files for Iceberg stats #16745
Conversation
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java
Why create this here, if it's not needed until the loop below?
Why remove here? The map isn't used if the value is found, so I don't see the point in modifying it.
Seems this is leftover from before, and we can change to the three-argument version of toMap() that returns an unmodifiable map.
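For illustration, a minimal sketch of that suggestion (the StatsFile record is a hypothetical stand-in for Iceberg's StatisticsFile; Trino itself would likely use Guava's toImmutableMap, while the JDK equivalent shown here is the three-argument toUnmodifiableMap):

```java
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.toUnmodifiableMap;

public class ToMapExample
{
    // Hypothetical stand-in for Iceberg's StatisticsFile: just a snapshot id and a path
    record StatsFile(long snapshotId, String path) {}

    static Map<Long, StatsFile> bySnapshot(List<StatsFile> files)
    {
        // Three-argument collector: key mapper, value mapper, merge function.
        // The returned map throws UnsupportedOperationException on mutation attempts,
        // so there is no point in (and no possibility of) modifying it later.
        return files.stream()
                .collect(toUnmodifiableMap(
                        StatsFile::snapshotId,
                        file -> file,
                        (first, second) -> second)); // on duplicate snapshot ids, keep the later entry
    }
}
```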
This is no longer the result of the previous analyze, right?
ANALYZE " + tableName + " WITH (columns = ARRAY['nationkey', 'regionkey']) has replaced the stats for comment and name as well.
Does the comment still make sense?
force-pushed from 58cad8f to b174d9a
Could you please also add to the commit message an explanation of why this change is being done?
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java
I think we should skip this from release notes.
getOnlyElement(statsFileBySnapshot.values())
more importantly, this special-case path can return a stats file that the longer version below wouldn't, when there is a stats file for a newer snapshot
let's remove the == 1 special case
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/TableStatisticsReader.java
reanalyze with a subset of columns should carry on unchanged stats to the new stats file
that's the basic idea behind this whole change
force-pushed from f3df999 to 42545df
what would happen if we used INCREMENTAL_UPDATE always?
You would unnecessarily call code that merges changes instead of just replacing them; I think nothing else.
INCREMENTAL_UPDATE facilitates stats update during inserts
we cannot update stats during deletions though. That's why we need REPLACE mode. ANALYZE uses REPLACE mode to "erase" values that are no longer in the table. (this is a challenge for incremental analyze, but that's another story).
now, ANALYZE wants to use REPLACE mode regardless of whether all data columns got analyzed, or only a subset. Otherwise it wouldn't be able to fix stats drifting over time. See #16889
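The REPLACE vs. INCREMENTAL_UPDATE distinction described above can be sketched with a toy model (plain value sets standing in for the theta sketches that are actually unioned; all names here are illustrative, not Trino's API):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class UpdateModeExample
{
    enum UpdateMode { REPLACE, INCREMENTAL_UPDATE }

    // Toy model: column id -> set of observed values, standing in for an NDV sketch per column
    static Map<Integer, Set<String>> applyUpdate(
            UpdateMode mode,
            Map<Integer, Set<String>> existing,
            Map<Integer, Set<String>> collected)
    {
        Map<Integer, Set<String>> result = new HashMap<>();
        // carry forward existing per-column stats (mutable copies)
        existing.forEach((column, values) -> result.put(column, new HashSet<>(values)));
        if (mode == UpdateMode.INCREMENTAL_UPDATE) {
            // inserts only add rows, so unioning new observations into the old sketch is safe
            collected.forEach((column, values) ->
                    result.merge(column, new HashSet<>(values), (old, fresh) -> {
                        old.addAll(fresh);
                        return old;
                    }));
        }
        else {
            // REPLACE: analyzed columns are overwritten, so values from deleted rows drop out;
            // columns that were not re-analyzed keep their carried-forward stats
            result.putAll(collected);
        }
        return result;
    }
}
```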
I cherry-picked your PR and the tests passed, so I'd assume we are fine. Or what am I missing?
That test doesn't cover the case where a subset of columns is analyzed. Can you modify the test accordingly?
I've updated #16889 to include analyze with a subset of columns.
The current version of this PR will break that test.
I am not sure we need to capture that in a handle, as the finish* method can determine the analyzed vs. all column names.
So I did it for 2 reasons, and probably both are invalid :|
Initially I wanted to do that based on column IDs, but that is not feasible because they come as Strings in the ANALYZE properties.
When I realised that, I didn't revert this because I knew I would need it for INCREMENTAL ANALYZE anyway. But as it is premature here, I will do what you suggest.
force-pushed from 4f04088 to 1670550
alexjo2144 left a comment
Overall a good change, just a couple style nitpicks and things
This is the same as columns.size() right?
You can use Stream.distinct() rather than collecting to a set.
yes, but then I would have to collect to a List and get the size of that list, so I thought this was simpler ;)
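As a side note, the reviewer's suggestion need not collect at all; a minimal sketch (the method name is illustrative) showing that the distinct count can be taken directly from the stream:

```java
import java.util.List;

public class DistinctCountExample
{
    // Counting distinct column names without materializing an intermediate Set or List:
    // distinct() deduplicates lazily and count() terminates the stream in one pass
    static long distinctColumnCount(List<String> columnNames)
    {
        return columnNames.stream().distinct().count();
    }
}
```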
Reminder to clean this up, why are they commented out?
oh, silly, I just commented them out for debugging purposes.
force-pushed from 1670550 to 4bbc03e
force-pushed from bed55cc to 4736938
I don't see the need to distinguish between the two cases.
mergeStatisticsIfNecessary gets some statistics to write.
in REPLACE mode, it should write those new statistics, and carry forward all statistics that haven't been replaced (other columns, or other stats).
Good point, we can remove the extra case and always run what's in the REPLACE_PARTIALLY block now, right? It may be worth refactoring the updateStats method slightly to make reading the previous Puffin file lazy.
force-pushed from 5f62ccf to d3315e9
force-pushed from 92fe842 to 3aa690f
remove, the class is used
the right way to deal with error-prone reporting problems is to add a record compact constructor making a defensive copy and null checks on the provided parameters. Thank you, error-prone, for reminding us about that :)
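A minimal sketch of what such a compact constructor can look like (the record name and field here are hypothetical, not Trino's actual class):

```java
import java.util.List;
import static java.util.Objects.requireNonNull;

public class CompactConstructorExample
{
    // Hypothetical record shape: the compact constructor null-checks the parameter
    // and takes a defensive, immutable copy, so callers cannot mutate the record's
    // state through the list they passed in
    record CollectedColumns(List<String> analyzedColumns)
    {
        CollectedColumns
        {
            analyzedColumns = List.copyOf(requireNonNull(analyzedColumns, "analyzedColumns is null"));
        }
    }
}
```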
that looks like validation belonging in the (compact) constructor?
what is .map(entry -> Map.entry(entry.getKey(), entry.getValue())) for?
I didn't come up with a better way to make one stream of type Stream<Map.Entry<Integer, Object>> out of these two...
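For illustration, one way this re-wrapping trick typically works (the method and maps here are hypothetical): Map.entry re-creates each entry with the value widened to Object, so two entry streams with different value types can be concatenated into one Stream<Map.Entry<Integer, Object>>:

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EntryStreamExample
{
    // Re-wrapping each entry with Map.entry widens the value type, letting the two
    // differently-typed entry streams unify under Stream.concat
    static Map<Integer, Object> combine(Map<Integer, Long> ndvs, Map<Integer, String> other)
    {
        return Stream.concat(
                        ndvs.entrySet().stream().map(entry -> Map.entry(entry.getKey(), (Object) entry.getValue())),
                        other.entrySet().stream().map(entry -> Map.entry(entry.getKey(), (Object) entry.getValue())))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```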
what is oldSketches for?
why isn't io.trino.plugin.iceberg.TableStatisticsWriter#copyRetainedStatistics covering it?
(I think I found it -- the writer should use getLatestStatisticsFile, and then reading oldSketches isn't needed here)
This should use getLatestStatisticsFile
getLatestStatisticsFile(table, snapshotId)
.ifPresent(previousStatisticsFile -> copyRetainedStatistics
My IDE shows a warning here. Is it because of my specific configuration, or something to address?
readability: an orElse for an optional created over the previous 37 lines isn't readable to me
correctness: in INCREMENTAL_UPDATE, the collectedNdvSketches need to be merged with the existing stats, otherwise they are incorrect.
I'd expect #17227 to fail (but it doesn't). Can you explain where the logic is that ensures we don't write NDV(1) after inserting one row into a table that already has millions of rows but doesn't have stats?
found the code, it's here
trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Lines 1791 to 1795 in 3db96bb
still, the stats writer needs to be smarter, as if that code didn't exist.
- I added that test.
- I will refactor that orElse
- It is merged here
if (finalUpdateMode == INCREMENTAL_UPDATE) {
newSketch = SetOperation.builder().buildUnion().union(readPreviousSketch(read), newSketch);
}
per the comment elsewhere, the else { shouldn't be needed; the old values should be taken care of by copyRetainedStatistics
force-pushed from 3aa690f to 734bc7f
findepi left a comment
Make sure TestIcebergStatistics passes also when
- you include test added in #17227
- apply the below change
Index: plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
diff --git a/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java b/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
--- a/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java (revision 734bc7f5a61c503044fac6ce73aa432751a3550b)
+++ b/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java (date 1682514123649)
@@ -1789,7 +1789,7 @@
return getStatisticsCollectionMetadata(tableMetadata, Optional.empty(), availableColumnNames -> {});
}
Set<String> columnsWithExtendedStatistics = tableStatistics.getColumnStatistics().entrySet().stream()
- .filter(entry -> !entry.getValue().getDistinctValuesCount().isUnknown())
+ // .filter(entry -> !entry.getValue().getDistinctValuesCount().isUnknown()) -- this is for optimization only, not correctness, so let's see what happens without it
.map(entry -> ((IcebergColumnHandle) entry.getKey()).getName())
.collect(toImmutableSet());
return getStatisticsCollectionMetadata(tableMetadata, Optional.of(columnsWithExtendedStatistics), availableColumnNames -> {});
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergStatistics.java
force-pushed from 02c3db4 to 5181c1f
Why read anything in REPLACE mode? I think we're not going to use the read results anyway.
what if updateMode is something other than REPLACE or INCREMENTAL_UPDATE? (future-proofing the code)
I am looking at this diff and I don't understand why this is being removed.
Wasn't the previous structure OK?
Same for the mergeStatisticsIfNecessary return type. What was wrong with it?
If, in the future, we collect anything else besides NDV (CollectedStatistics changes), we want mergeStatisticsIfNecessary to be updated appropriately, so having the method return CollectedStatistics looked right to me.
Why do this only if there is no previous statistics file? Shouldn't REPLACE mean always replace?
force-pushed from 973b638 to fbe72e0
force-pushed from fbe72e0 to c8b9e0f
rebasing
force-pushed from c8b9e0f to 024cca9
is needed only in one case, so it should be inside the switch below.
I pushed a fixup with this (and some other small changes).
this should be same as hasUsefulData above, i.e.
.filter(blobMetadata -> columnsWithRecentlyComputedStats.contains(getOnlyElement(blobMetadata.fields())))
I fail to understand why this .filter(blobMetadata -> pendingPreviousNdvSketches.contains(getOnlyElement(blobMetadata.inputFields()))) is removed.
Can you please explain?
I probably lost it while changing the implementation a few times. Thanks for catching it.
remove if (columnsWithRecentlyComputedStats.remove(fieldId)), here we shouldn't be reading anything that we're not joining with computed NDVs
everything else should go with copyRetainedStatistics path
columnsWithRecentlyComputedStats will become immutable, so no need for initializing it with new HashSet<>(
This means "for any column where we have only partial data (previously had no NDV information), treat this partial data as complete information".
So if I insert 1 row into a table with 1M rows, I can have an NDV of 1 for such a column.
Am I reading this right?
force-pushed from eeb1af4 to f524e09
style
CompactSketch union = SetOperation.builder().buildUnion().union(
readPreviousSketch(read),
requireNonNull(collectedNdvSketches.get(fieldId), "collectedNdvSketches.get(fieldId) is null"));
however, if you look at the PR diff and want to avoid unnecessary changes, the code should look like this
Memory memory = Memory.wrap(ByteBuffers.getBytes(read.second())); // Memory.wrap(ByteBuffer) results in a different deserialized state
CompactSketch previousSketch = CompactSketch.wrap(memory);
CompactSketch newSketch = requireNonNull(collectedNdvSketches.get(fieldId), "collectedNdvSketches.get(fieldId) is null");
ndvSketches.put(fieldId, SetOperation.builder().buildUnion().union(previousSketch, newSketch));
You can view the PR diff at https://github.com/trinodb/trino/pull/16745/files
force-pushed from f524e09 to d8eebd1
force-pushed from d8eebd1 to ce4cd66
pushed a small change to the writer & commit message
We should require that one Puffin file has all the stats that are still relevant; thus we should not need to traverse stats files.
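A minimal, hypothetical sketch of that idea (names are illustrative, not Trino's actual TableStatisticsReader API): find the latest stats file by walking the snapshot ancestor chain, and rely on that single file alone:

```java
import java.util.Map;
import java.util.Optional;

public class LatestStatsFileExample
{
    // Walk the snapshot ancestor chain from the queried snapshot and return the first
    // snapshot that has a stats file. Since ANALYZE carries still-relevant stats forward
    // into each new Puffin file, that one file suffices and no deeper traversal of
    // older stats files is needed.
    static Optional<Long> latestStatsSnapshot(
            long snapshotId,
            Map<Long, Long> parentBySnapshot,      // child snapshot id -> parent snapshot id
            Map<Long, String> statsFileBySnapshot) // snapshot id -> Puffin file path
    {
        Long current = snapshotId;
        while (current != null) {
            if (statsFileBySnapshot.containsKey(current)) {
                return Optional.of(current);
            }
            current = parentBySnapshot.get(current);
        }
        return Optional.empty();
    }
}
```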
force-pushed from ce4cd66 to 21d80fd
thank you
Description
The goal of this change is to improve the performance of reading stats; also, older stats files may have skewed data, which is not optimal for the CBO.
Additional context and related issues
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: