Construct AddFileEntry instance only if necessary while reading the Delta Lake checkpoint#19795
Conversation
...in/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/AddFileEntry.java
Outdated
Show resolved
Hide resolved
1a41795 to
a8e51ec
Compare
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...in/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/AddFileEntry.java
Outdated
Show resolved
Hide resolved
a8e51ec to
efd2637
Compare
There was a problem hiding this comment.
That's not enough - I'm seeing while debugging buildAddEntry that the blocks are all not lazy.
There was a problem hiding this comment.
Even if we use lazy blocks, the block behind the lazy block is the monolithic structure corresponding to the add entry.
If we want to actually avoid reading from parquet add related fields for the entries which are not relevant, we need to refactor the way we are reading the checkpoint.
There was a problem hiding this comment.
Even if we use lazy blocks, the block behind the lazy block is the monolithic structure
yes
If we want to actually avoid reading from parquet add related fields for the entries which are not relevant,
i don't think the parquet reader supports that, or can support that, given how values are encoded in Parquet.
There was a problem hiding this comment.
i don't think the parquet reader supports that,
I'm pointing here towards using a similar method as for dereference pushdown.
efd2637 to
e1ce7b8
Compare
ee830b6 to
17fac3f
Compare
|
Does this technically conflict with #19848 ? |
findepi
left a comment
There was a problem hiding this comment.
"Avoid early to construct the AddFileEntry"
...in/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/AddFileEntry.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
findepi
left a comment
There was a problem hiding this comment.
"Create CheckPointFieldExtractor instance only if necessary"
There was a problem hiding this comment.
requireNonNull becomes redundant
17fac3f to
b52e93f
Compare
There was a problem hiding this comment.
|| addPartitionValuesBlock.isNull(pagePosition) wasn't here, right?
why is it being added?
There was a problem hiding this comment.
The isNull check for addPartitionValuesBlock is being added because there are now 2 blocks (instead of initially 1) from which we build up the add entry and they need to be consistent.
Changing though slightly the logic. Thank you for raising this.
There was a problem hiding this comment.
Is this useful, especually considering Block.toString?
or to be removed?
There was a problem hiding this comment.
Are you implying here to eventually remove all the debug statements from the CheckpointEntryIterator class?
Potential follow-up?
There was a problem hiding this comment.
We could log the field count of RowBlock once per Page somewhere, it looks unnecessary to log it for every position in a Page
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointSchemaManager.java
Outdated
Show resolved
Hide resolved
9dda8f5 to
1e3db18
Compare
...in/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/AddFileEntry.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
1e3db18 to
8dea018
Compare
8dea018 to
29fab3e
Compare
|
Rebased on master to handle conflicts with #19848 |
6f080ce to
206ad69
Compare
that's why i asked #19795 (comment) :) |
206ad69 to
ef818db
Compare
ef818db to
f1f645e
Compare
|
Rebased on |
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Please reword the commit message to
Construct AddFileEntry lazily
When checkpoint filtering is applied
and there are partition constraints which do not
match the partition values of the entry, avoid
eagerly constructing `AddFileEntry`.
There was a problem hiding this comment.
Modified to
Construct AddFileEntry instance only if necessary
When checkpoint filtering is applied and there
are partition constraints which do not match the
partition values of the entry, avoid eagerly to
construct `AddFileEntry` instances.
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
When checkpoint filtering is applied and there are partition constraints which do not match the partition values of the entry, avoid eagerly to construct `AddFileEntry` instances.
In case of performing checkpoint filtering in Delta Lake, avoid reading from Parquet pages loaded in memory the `add` entries which don't match the partition predicate.
f1f645e to
2039b5c
Compare
Description
In case that there is checkpoint filtering applied and there are partition constraints which do not
match the partition values of the entry, avoid early to create the
AddFileEntryinstance.Split the loading for the
addentries from the Parquet checkpoint in two channels:partitionValuesinformationaddWhen building the
addentry, check first the partition constraint to match against the partition values and only then load theaddblock to avoid unecessary resources spent on deserialization.Used for testing a multi-part checkpoint file (25 parts , each around 12MB ~ 300MB in total) for testing this feature while storing the checkpoint in local MinIO and came up with the following results:
As can be seen from the analysis above, there can't be spotted any relevant improvement in terms of IO with this change.
That's because the Parquet page is already loaded because it contains at least one entry matching the partition predicate.
This is why the checkpoint iterator still does read the same amount of bytes as the baseline in case of applying the partition pruning.
However, it can be seen that the elapsed number of milliseconds decreases in case of using this change because there is less deserialization performed.
Tested as well with a more permissive filter and haven't actually spotted bigger improvements than ~ 0.5s in terms of elapsed time between the base line and the current changes.
Additional context and related issues
This change builds on top of #19588
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: