@jkylling
Contributor

Description

Currently, when a new commit is made to a Delta table, the cached metadata entry of the most up-to-date TableSnapshot is invalidated. The metadata entry must then be re-read from the checkpoint and from the commits made after the checkpoint (please see c0d0937). This is unnecessary work: we always read the new commits anyway, so we could instead reconcile the cached metadata entry with any metadata entries loaded from the new commits.

This PR modifies the TransactionLogTail to keep track of any metadata entries it may contain. In the TableSnapshot the cached metadata entry is reconciled with any metadata entry of the TransactionLogTail.
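
The reconciliation described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `MetadataEntry`, `LogCommit`, `LogTail`, and `SnapshotMetadataCache` are hypothetical, simplified stand-ins for the real classes, and it assumes the tail's commits are ordered by ascending version.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-ins; these are NOT Trino's actual classes.
final class MetadataEntry {
    final String schemaString;

    MetadataEntry(String schemaString) {
        this.schemaString = schemaString;
    }
}

final class LogCommit {
    final long version;
    // Present only when this commit changed the table metadata.
    final Optional<MetadataEntry> metadata;

    LogCommit(long version, Optional<MetadataEntry> metadata) {
        this.version = version;
        this.metadata = metadata;
    }
}

final class LogTail {
    // Assumption: commits are ordered by ascending version.
    private final List<LogCommit> commits;

    LogTail(List<LogCommit> commits) {
        this.commits = List.copyOf(commits);
    }

    // Returns the metadata entry of the newest commit that carries one, if any.
    Optional<MetadataEntry> latestMetadata() {
        Optional<MetadataEntry> latest = Optional.empty();
        for (LogCommit commit : commits) {
            if (commit.metadata.isPresent()) {
                latest = commit.metadata;
            }
        }
        return latest;
    }
}

final class SnapshotMetadataCache {
    // Prefer metadata found in the freshly read tail; otherwise keep the
    // cached entry instead of invalidating it and re-reading the checkpoint.
    static Optional<MetadataEntry> reconcile(Optional<MetadataEntry> cached, LogTail tail) {
        Optional<MetadataEntry> fromTail = tail.latestMetadata();
        return fromTail.isPresent() ? fromTail : cached;
    }
}
```

In this sketch, a tail whose commits carry no metadata entry leaves the cached entry untouched, so only a commit that actually changes the metadata replaces it.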

This fixes the second part of #17406.

Similar code can be added to cache the protocol entry, to avoid any reads from the checkpoint files except when new checkpoints are made. I'd be happy to contribute this as well.

Additional context and related issues

Please see #17406.

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Section
* Improve caching of metadata entries from Delta logs. ({issue}`17406`)

@github-actions

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jan 15, 2024
@mosabua
Member

mosabua commented Jan 15, 2024

@jkylling @kasiafi and @findepi .. could you work together on this PR and get it towards merge?

@Pluies
Contributor

Pluies commented Jan 22, 2024

@mosabua 👋 that has been on my plate for a while, I opened #20437 as a rebased + improved version of this PR 👍

@ebyhr
Member

ebyhr commented Jan 22, 2024

Superseded by #20437

@ebyhr ebyhr closed this Jan 22, 2024
@mosabua
Member

mosabua commented Jan 22, 2024

Great work @Pluies .. thank you.


Labels

cla-signed delta-lake Delta Lake connector


5 participants