Support OR-ed condition in Delta checkpoint iterator#19439
Support OR-ed condition in Delta checkpoint iterator#19439findepi merged 2 commits intotrinodb:masterfrom
Conversation
findepi
left a comment
There was a problem hiding this comment.
"Accept FilterPredicate in ParquetReader"
lib/trino-parquet/src/main/java/io/trino/parquet/predicate/TupleDomainParquetPredicate.java
Outdated
Show resolved
Hide resolved
jkylling
left a comment
There was a problem hiding this comment.
Thank you for fixing this!
...st/java/io/trino/plugin/deltalake/transactionlog/checkpoint/TestCheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
...ake/src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeTransactionLogEntry.java
Outdated
Show resolved
Hide resolved
...ake/src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeTransactionLogEntry.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
6d8e652 to
b00803c
Compare
in both cases we read ~50k of 300MB file |
The number of bytes read should be approximately the same, but the number of completed positions is very different (1235156 vs. 470000). The bottle neck is that we are iterating over all the rows of the row groups we read, and checking if they contain non-null data for the metadata or protocol entries, but this is only the case for two rows. On master, since the columns we select are all null except in two rows, it's cheap in terms of data, but it's expensive in terms of compute (relative to what this operation is supposed to do), as we iterate over 1235156 rows. |
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSourceFactory.java
Outdated
Show resolved
Hide resolved
...c/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Outdated
Show resolved
Hide resolved
...ake/src/main/java/io/trino/plugin/deltalake/transactionlog/DeltaLakeTransactionLogEntry.java
Outdated
Show resolved
Hide resolved
b00803c to
e134ec5
Compare
e134ec5 to
000da7e
Compare
I went back and tested this PR with similar checkpoint part files to the ones reported in #17405 (comment) When fetching metadata and protocol entries on master: ~950 ms This is a bit at odds with the reported speed up from ~40 seconds to ~1 second when retrieving add entries in the issue. I tested again on master with a Still, the speed up from 950 ms to 150 ms should be enough for doing this PR. This delay is during the analysis phase, so is usually very noticeable when querying. |
|
thanks @jkylling for the feedback and measuring the times
spoke with @ebyhr and realized we measured only the amount of data loaded, but not the number of round trips to the storage. @findinpath please measure that (eg with MinIO notifications / request log) As @findinpath spoke earlier, Open question remains: could we achieve same improvement within ParquetReader, based on column select and all-null information for row groups. This would mean same improvement for chckpoint reading without any checkpoint reader changes, and also for other cases. |
000da7e to
d03c3fe
Compare
@findepi updated the description
|
We do already have the null/empty check here: |
|
Rebasing on |
Co-authored-by: Yuya Ebihara <ebyhry@gmail.com>
d03c3fe to
8c9986f
Compare
|
per offline discussion with @raunaqmorarka |
Description
Relates to #19156
Very much inspired from #19240
The problem this PR is solving is that it avoids working with
TupleDomain.all()as tuple domain when there are more predicates specified for theCheckpointEntryIterator:trino/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/transactionlog/checkpoint/CheckpointEntryIterator.java
Lines 186 to 187 in d23921d
What this means is that in such cases, all the entries from the Delta Lake Parquet checkpoint file will be returned as results even though the code flow is interested in a small subset of the entries.
A common use-case of specifying more predicates (entry types) for the
CheckpointEntryIteratorisDeltaLakeMetadata.getTableHandle():trino/plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
Lines 495 to 498 in d23921d
Additional context and related issues
The
ParquetPageSourceFactoryhas been slightly refactored to expose a method handling multiple tuple domains.The results returned in the page source are corresponding to any of the specified tuple domains.
To put things in perspective, the changes from this PR have been tested against a 300MB checkpoint file to see how it performs.
The example showcased below depends on a local file, but the changes have been tested as well against a S3 bucket and for retrieving
metadata&protocoltogether there is no relevant performance gain.tldr; performance wise the changes coming with this PR are rather modest because the amount of bytes read from the Parquet columnar file for
metadataandprotocolis pretty small compared to the bytes needed to encode theaddentries.The results on testing with
masterThe results on testing with the changes from this PR:
I slightly modified the example above to test against MinIO and check what kind of conversations are made with the object storage:
Relevant output while running with this PR
Relevant output while running on
masterFor
METADATA&PROTOCOLwhen specified together, for the file I am using for testing, theCheckpointEntryIterator makes currently (inmaster`) the extra call:Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: