Skip to content

Only remove Iceberg delete files when safe to do so#13343

Merged
ebyhr merged 2 commits intotrinodb:masterfrom
alexjo2144:iceberg/optimize-deletes
Jul 29, 2022
Merged

Only remove Iceberg delete files when safe to do so#13343
ebyhr merged 2 commits intotrinodb:masterfrom
alexjo2144:iceberg/optimize-deletes

Conversation

@alexjo2144
Copy link
Copy Markdown
Member

The initial implementation removed delete files from a partition
even if the whole table was not scanned. This was fine, but assumes
the enforced predicate describes entire partitions. This
assumption will not be true after #13012

Description

Is this change a fix, improvement, new feature, refactoring, or other?

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

How would you describe this change to a non-technical end user or system administrator?

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you wanted if the whole table ?

@homar
Copy link
Copy Markdown
Member

homar commented Jul 26, 2022

@alexjo2144 so the changes made in #13012 in IcebergMetadata.getTableHandleForOptimize are not safe? or we just want to be safe for the future ?

@alexjo2144
Copy link
Copy Markdown
Member Author

@alexjo2144 so the changes made in #13012 in IcebergMetadata.getTableHandleForOptimize are not safe? or we just want to be safe for the future ?

So I noticed one major problem with the original approach we had, which was that we assumed getTableHandleForExecute was called after applyFilter so we could know if it's a whole table scan or not. That's not the case.

Overall, in hindsight I think this logic is more error prone than it's worth

The initial implementation removed delete files from a partition
even if the whole table was not scanned. This was fine, but assumes
the enforced predicate describes entire partitions. This
assumption will not be true after trinodb#13012
@alexjo2144 alexjo2144 force-pushed the iceberg/optimize-deletes branch from 3b361bd to cf73dc9 Compare July 26, 2022 21:33
@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Jul 28, 2022

@homar Could you please confirm the above comment by @alexjo2144?

@homar
Copy link
Copy Markdown
Member

homar commented Jul 28, 2022

@homar Could you please confirm the above comment by @alexjo2144?

yes I think we should remove that logic :/
tough I am not sure what was @findepi's opinion, I remember it was discussed recently

@findepi
Copy link
Copy Markdown
Member

findepi commented Jul 28, 2022

👍

@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Jul 29, 2022

Confirmed that Iceberg tests passed in #13400

@ebyhr ebyhr merged commit e210738 into trinodb:master Jul 29, 2022
@ebyhr
Copy link
Copy Markdown
Member

ebyhr commented Jul 29, 2022

Merged, thanks!

@github-actions github-actions bot added this to the 392 milestone Jul 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants