
Add support for partition pruning in Delta checkpoint iterator #19588

Merged
ebyhr merged 1 commit into trinodb:master from ebyhr:ebi/delta-part-values-parsed
Nov 16, 2023

Conversation

@ebyhr
Member

@ebyhr ebyhr commented Oct 31, 2023

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Improve performance when reading large checkpoint files on partitioned tables. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Oct 31, 2023
@github-actions github-actions bot added docs delta-lake Delta Lake connector labels Oct 31, 2023
@ebyhr ebyhr self-assigned this Oct 31, 2023
@ebyhr ebyhr force-pushed the ebi/delta-part-values-parsed branch 3 times, most recently from a4106b2 to 7c9ac69 Compare October 31, 2023 21:39
Contributor

@findinpath findinpath Nov 6, 2023

Shouldn't we have only one entry here?
This probably relates to https://github.com/trinodb/trino/pull/19588/files/7c9ac692875bdb08827aa1dc9f7beac63a9874d4#r1383331077
We should also check that fewer entries were actually read from the Parquet file:

        assertThat(checkpointEntryIterator.getCompletedPositions().orElseThrow()).isEqualTo(....);

Contributor

In buildAddEntry, check whether partitionValues / partitionValues_parsed match the partitionConstraint, and return null if they don't.

Map<DeltaLakeColumnHandle, Domain> enforcedDomains = enforcedPartitionConstraint.getDomains().orElseThrow();
if (!partitionMatchesPredicate(addAction.getCanonicalPartitionValues(), enforcedDomains)) {
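
The check suggested above can be illustrated with a minimal, self-contained sketch. It does not use Trino's classes: `PartitionFilterSketch`, `partitionMatches`, and the `Map<String, Set<String>>` representation of the enforced domains are hypothetical stand-ins, and treating a null partition value as non-matching is an assumption of this sketch only.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for the check discussed above; not Trino's actual API.
final class PartitionFilterSketch
{
    private PartitionFilterSketch() {}

    // allowedValues plays the role of the enforced domains: column name -> accepted values.
    // A column that is absent from allowedValues is unconstrained.
    static boolean partitionMatches(
            Map<String, Optional<String>> partitionValues,
            Map<String, Set<String>> allowedValues)
    {
        for (Map.Entry<String, Set<String>> constraint : allowedValues.entrySet()) {
            Optional<String> value = partitionValues.getOrDefault(constraint.getKey(), Optional.empty());
            // Assumption of this sketch: a null (empty) partition value never matches a constraint.
            if (value.isEmpty() || !constraint.getValue().contains(value.get())) {
                return false;
            }
        }
        return true;
    }

    // Returns the file path for a matching entry, or null so the caller can skip the entry.
    static String buildAddEntry(
            String path,
            Map<String, Optional<String>> partitionValues,
            Map<String, Set<String>> allowedValues)
    {
        if (!partitionMatches(partitionValues, allowedValues)) {
            return null;
        }
        return path;
    }

    public static void main(String[] args)
    {
        Map<String, Set<String>> constraint = Map.of("part", Set.of("a"));
        System.out.println(buildAddEntry("f1.parquet", Map.of("part", Optional.of("a")), constraint));
        System.out.println(buildAddEntry("f2.parquet", Map.of("part", Optional.of("b")), constraint));
    }
}
```

Returning null for a non-matching entry keeps the decision inside the entry builder, so the caller only has to skip nulls.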

@ebyhr
Member Author

ebyhr commented Nov 8, 2023

Just rebased on master.

Member

Perhaps remove && !partitionConstraint.isAll()

I think the new code path should eventually replace the old cache-based approach, so we can use isCheckpointPartitionFilterEnabled as an algorithm-selecting toggle.

@ebyhr
Member Author

ebyhr commented Nov 9, 2023

CI hit #19602

@ebyhr ebyhr marked this pull request as ready for review November 9, 2023 08:11
Member

Do we need to check this for every position? It seems we should know this per file, based on the Parquet file metadata (maybe it's possible to use io.trino.plugin.hive.ReaderPageSource#getReaderColumns).

Member Author

@ebyhr ebyhr Nov 13, 2023

I agree with using the Parquet metadata, though getReaderColumns returns an empty list in this case. Sent another PR: #19727.

Member

While this may help in reducing the number of DeltaLakeTransactionLogEntry, doing the filtering after materialising all channels on each position of a page means that we can't benefit from lazy loading of blocks.
Ideally we should filter directly on the relevant block channels and skip to next position without decoding the remaining channels when the predicate does not match. But this can be looked at as a follow-up.
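
The follow-up idea above, testing the predicate on the partition channel first and decoding the remaining channels only for matching positions, can be sketched roughly as follows. This is an illustration under stated assumptions, not Trino's page or block API: `LazyChannelFilterSketch`, `readMatching`, and the `decodeCount` counter are hypothetical names, and the "expensive decode" is simulated by a callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;
import java.util.function.IntPredicate;

// Hypothetical illustration of filter-before-decode; not Trino's Page/Block API.
final class LazyChannelFilterSketch
{
    private LazyChannelFilterSketch() {}

    // Counts how many times the (simulated) expensive channels were decoded.
    static int decodeCount;

    // partitionFilter inspects only the cheap partition channel of a position;
    // decodeRemaining stands in for decoding all other channels of that position.
    static <T> List<T> readMatching(int positionCount, IntPredicate partitionFilter, IntFunction<T> decodeRemaining)
    {
        List<T> entries = new ArrayList<>();
        for (int position = 0; position < positionCount; position++) {
            if (!partitionFilter.test(position)) {
                continue; // skip without touching the remaining channels
            }
            entries.add(decodeRemaining.apply(position));
        }
        return entries;
    }

    public static void main(String[] args)
    {
        String[] partitions = {"a", "b", "a", "c"};
        List<String> entries = readMatching(
                partitions.length,
                position -> partitions[position].equals("a"),
                position -> {
                    decodeCount++;
                    return "file-" + position;
                });
        System.out.println(entries + " decoded=" + decodeCount);
    }
}
```

In this sketch the decode callback runs only for positions that pass the partition filter, which is the property that filtering after full materialisation loses.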

Contributor

Correct.
The partition matching check should be done directly in io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointEntryIterator#buildAddEntry

If we know that we have the field partitionValues_parsed (see https://github.com/trinodb/trino/pull/19588/files#r1389691135), maybe we should do this check right away after doing

        log.debug("Building add entry from %s pagePosition %d", block, pagePosition);
        if (block.isNull(pagePosition)) {
            return null;
        }

optional: One word concerning the use of entry.getAdd().getCanonicalPartitionValues().
We have the partitionValues_parsed at hand. We could avoid deserializing the stringified partition values and use the "parsed" values directly. OTOH, we don't actually use the parsed partition values anywhere else. Did you intentionally refrain from reading the parsed partition values in favor of the stringified partition values?

@ebyhr ebyhr force-pushed the ebi/delta-part-values-parsed branch from e73edbc to c0494e2 Compare November 13, 2023 02:05
Member

Why TODO? why not do it right away?

Member Author

I just wanted to focus on the SELECT path in this PR. I'm going to handle it in this PR.

Member

The callers (e.g. the split source) will likely repeat this work, so it's partially wasted.
Still useful because this allows us to materialize a shorter list.

I think this wouldn't be needed here if we could return a Stream/Iterator instead of a List.
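
As a rough illustration of the Stream/Iterator idea: if the producer returns a lazy stream, the caller can apply its own partition filter while iterating, and no intermediate list has to be pre-filtered. The names below (`StreamingEntriesSketch`, `activeFiles`, `splitsFor`) and the use of path prefixes as a stand-in for partition matching are assumptions made for this sketch.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: the producer hands out a lazy Stream instead of a materialized List.
final class StreamingEntriesSketch
{
    private StreamingEntriesSketch() {}

    // Producer side: no filtering and no intermediate list.
    static Stream<String> activeFiles(List<String> checkpointPaths)
    {
        return checkpointPaths.stream();
    }

    // Caller side (e.g. a split source): the partition filter runs exactly once, here.
    static List<String> splitsFor(Stream<String> files, String partitionPrefix)
    {
        return files
                .filter(path -> path.startsWith(partitionPrefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> checkpoint = List.of("part=a/f1", "part=b/f2", "part=a/f3");
        System.out.println(splitsFor(activeFiles(checkpoint), "part=a/"));
    }
}
```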

Member

require... or use... ?

We want to use the partitionValues_parsed field if it is present, but we don't require that it exists (we don't fail when it doesn't), right?
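
The "use if present, don't require" behaviour can be sketched as an optional input with a fallback. The sketch assumes, purely for illustration, that all partition columns are integers; `PartitionValuesSourceSketch` and `partitionValues` are hypothetical names, not part of the connector.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of "use the parsed values when available, otherwise fall back".
final class PartitionValuesSourceSketch
{
    private PartitionValuesSourceSketch() {}

    // parsed is empty when the checkpoint has no parsed partition values.
    // Assumption of this sketch: every partition column is an integer column.
    static Map<String, Long> partitionValues(Optional<Map<String, Long>> parsed, Map<String, String> stringified)
    {
        if (parsed.isPresent()) {
            return parsed.get(); // already typed, no deserialization needed
        }
        // Fallback: deserialize the stringified values; a missing parsed field is not an error.
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, String> entry : stringified.entrySet()) {
            result.put(entry.getKey(), Long.parseLong(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args)
    {
        System.out.println(partitionValues(Optional.empty(), Map.of("year", "2023")));
        System.out.println(partitionValues(Optional.of(Map.of("year", 2023L)), Map.of("year", "2023")));
    }
}
```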

Member

The set of partitioning columns may change in the meantime, probably only through a CREATE OR REPLACE TABLE operation. In that case we shouldn't need to read the old checkpoint file at all, but I don't know whether this is the case.

Member

( #19586 )

Contributor

Could you please add test coverage to TestDeltaLakeFileOperations with the checkpoint_filtering_enabled session property enabled, to make the consequences of this change more transparent?


4 participants