Skip to content

Avoid no-op Iceberg table snapshot if DML doesn't change table state#12412

Merged
findepi merged 1 commit intotrinodb:masterfrom
findinpath:iceberg-avoid-recording-empty-dml-statements
Jun 9, 2022
Merged

Avoid no-op Iceberg table snapshot if DML doesn't change table state#12412
findepi merged 1 commit intotrinodb:masterfrom
findinpath:iceberg-avoid-recording-empty-dml-statements

Conversation

@findinpath
Copy link
Copy Markdown
Contributor

@findinpath findinpath commented May 16, 2022

Description

This PR avoids creating an Iceberg snapshot when dealing with a DML statement that does not affect the contents of the Iceberg table.

e.g. :

INSERT INTO iceberg.default.iceberg_table SELECT * FROM source_table WHERE false

NOTE however that a CTAS statement that corresponds to an empty SELECT will be obviously recorded as a table snapshot because it creates the table.

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Avoid creating a snapshot for a table when dealing with an empty DML (INSERT, UPDATE, DELETE) statement.

Related issues, pull requests, and links

Fixes #12319

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Iceberg
* Avoid creating a table snapshot for DML statements that do not change the table state

@cla-bot cla-bot bot added the cla-signed label May 16, 2022
@findinpath findinpath requested review from alexjo2144, ebyhr, findepi and homar and removed request for alexjo2144 and findepi May 16, 2022 11:14
@findepi findepi changed the title Avoid recording empty DML statements Avoid no-op Iceberg table snapshot if DML doesn't change table state May 16, 2022
@findinpath findinpath force-pushed the iceberg-avoid-recording-empty-dml-statements branch from 4ffb3f8 to 02c4940 Compare May 16, 2022 18:49
@findinpath findinpath requested a review from alexjo2144 May 16, 2022 18:49
@findinpath findinpath force-pushed the iceberg-avoid-recording-empty-dml-statements branch from 02c4940 to fa4a6d5 Compare May 17, 2022 10:53
@findinpath findinpath requested a review from ebyhr May 17, 2022 10:53
@findinpath findinpath force-pushed the iceberg-avoid-recording-empty-dml-statements branch from fa4a6d5 to 31f0630 Compare May 18, 2022 04:54
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could change this to

// avoid empty snapshot for existing tables
if (commitTasks.isEmpty() && (table.currentSnapshot() != null)) {

Then we don't need the special case in finishCreateTable().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could be done indeed as you mentioned.
This has been actually the initial version of the code in the PR.

However, the code would be a bit more difficult to understand with the handling for this special case in finishInsert so we've decided to go further with adding the handling for empty CTAS into finishCreateTable.

Related discussion:

#12412 (comment)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this check happen before instantiating Type[] partitionColumnTypes ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the check at the beginning of the method. Thanks for bringing this up.

@findepi
Copy link
Copy Markdown
Member

findepi commented Jun 7, 2022

@findinpath please update

@alexjo2144 please review after the update

@findinpath findinpath requested a review from electrum June 8, 2022 04:10
@findinpath findinpath force-pushed the iceberg-avoid-recording-empty-dml-statements branch from 31f0630 to 306f730 Compare June 8, 2022 16:03
@findinpath findinpath requested a review from homar June 8, 2022 16:05
Comment on lines 569 to 571
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why these two lines?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see now, please add a comment

@findinpath findinpath force-pushed the iceberg-avoid-recording-empty-dml-statements branch from 306f730 to 1f676b1 Compare June 9, 2022 08:28
@findepi
Copy link
Copy Markdown
Member

findepi commented Jun 9, 2022

@findinpath do you want to suggest RN entry?

@findinpath
Copy link
Copy Markdown
Contributor Author

do you want to suggest RN entry?

I think that the change in this PR is rather minor for Iceberg functionality and probably is not worth mentioning in the release notes.

I added however the following RN line the PR description.

Avoid creating a table snapshot for DML statements that do not change the table state

Up to you @findepi whether this should get included in the RN.

@findepi findepi merged commit e75ba00 into trinodb:master Jun 9, 2022
@findepi findepi mentioned this pull request Jun 9, 2022
@github-actions github-actions bot added this to the 386 milestone Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Avoid creating snapshots on Iceberg for empty operations

6 participants