Skip to content

Conversation

@belen-pruvost
Copy link
Member

@belen-pruvost belen-pruvost commented Apr 18, 2024

Description

When running CREATE [OR REPLACE] TABLE statements for an existing table, the finishCreate method is currently not writing a checkpoint to the Delta Log. This PR makes it possible to write those checkpoints. It results in faster analysis times for queries that read from those tables.

Additional context and related issues

We have noticed that tables recreated through this flow (CREATE [OR REPLACE] TABLE) have more than 2000 commits and no checkpoint, which leads to a very bad analysis time

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Apr 18, 2024
@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch from 346ea1c to 40df1d3 Compare April 18, 2024 10:43
@jkylling jkylling requested a review from Praveen2112 April 18, 2024 10:43
@github-actions github-actions bot added the delta-lake Delta Lake connector label Apr 18, 2024
@wendigo wendigo requested a review from findinpath April 18, 2024 12:33
@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch from 40df1d3 to c100560 Compare April 19, 2024 15:13
@belen-pruvost belen-pruvost changed the title Add checkpoint to finishCreate for create or replace statements for existing tables Write checkpoint if necessary for CREATE OR REPLACE statements Apr 19, 2024
@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch from c100560 to 0878304 Compare April 24, 2024 19:41
@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch 3 times, most recently from 4283234 to 4210391 Compare April 25, 2024 07:19
@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch from 4210391 to 0d6124a Compare April 25, 2024 09:37
@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch 4 times, most recently from 63197e9 to 6bc3c31 Compare April 26, 2024 07:17
@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch 2 times, most recently from 5b97d81 to e704950 Compare April 26, 2024 09:21
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need here as well the check to see whether the table existed before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be redundant considering that on line 968
boolean replaceExistingTable = tableHandle != null && saveMode == SaveMode.REPLACE;
and that we know it exists in that scenario, as we then do this on line 970
ConnectorTableMetadata existingTableMetadata = getTableMetadata(session, tableHandle);

@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch from e704950 to ea53815 Compare April 26, 2024 09:50
@belen-pruvost
Copy link
Member Author

We implemented a similar change to this on top of our Trino fork on April 19th, as we have multiple tables being queried that are created via CREATE OR REPLACE statements.
This is how the average analysis time has evolved:
Screenshot 2024-04-26 at 11 57 49

@belen-pruvost belen-pruvost force-pushed the write-checkpoint-on-replace branch from ea53815 to b43e4ae Compare April 26, 2024 10:20
@findinpath findinpath requested a review from ebyhr May 3, 2024 09:07
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr ebyhr force-pushed the write-checkpoint-on-replace branch from b43e4ae to 5382bab Compare May 28, 2024 08:15
@ebyhr ebyhr merged commit d8f7efa into trinodb:master May 28, 2024
@github-actions github-actions bot added this to the 449 milestone May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed delta-lake Delta Lake connector

Development

Successfully merging this pull request may close these issues.

3 participants