Avoid reading Iceberg delete files when not needed #13395

Merged
findepi merged 2 commits into trinodb:master from alexjo2144:iceberg/filter-delete-files
Aug 8, 2022

Conversation

@alexjo2144
Member

Description

Parquet only.

Skip reading the delete files associated with a data file when the deletes
cannot affect the split. This happens when the statistics from the data file
already show that the split can be skipped. It also happens when the row
positions (line numbers) read by the split are known, since they can be used
to filter out positional deletes that do not apply to the split.
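
To illustrate the second case, here is a minimal sketch of filtering positional deletes against the row range of a split. It is not the code from this PR: the class name, method names, and the half-open range convention are all assumptions made for the example, as is the idea that the minimum and maximum deleted positions of a delete file are known up front.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not the Trino implementation.
final class PositionDeleteRangeFilter
{
    private PositionDeleteRangeFilter() {}

    // Keep only the deleted row positions that fall inside the half-open
    // range [startRowPosition, endRowPosition) read by the split.
    static List<Long> relevantDeletes(List<Long> deletedPositions, long startRowPosition, long endRowPosition)
    {
        return deletedPositions.stream()
                .filter(position -> position >= startRowPosition && position < endRowPosition)
                .collect(Collectors.toList());
    }

    // Assumption: the smallest and largest deleted positions of a delete file
    // are known without reading it. If that interval does not overlap the
    // split's row range, the delete file does not need to be opened.
    static boolean canSkipDeleteFile(long minDeletedPosition, long maxDeletedPosition, long startRowPosition, long endRowPosition)
    {
        return maxDeletedPosition < startRowPosition || minDeletedPosition >= endRowPosition;
    }
}
```

With a split covering rows 5 (inclusive) to 20 (exclusive), a delete file whose positions all lie in 0 to 4 would be skipped entirely, while one covering 0 to 5 would still have to be read.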

Is this change a fix, improvement, new feature, refactoring, or other?

Performance improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

Minimize I/O operations

Related issues, pull requests, and links

#13219

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

@cla-bot added the cla-signed label on Jul 28, 2022
@alexjo2144 requested review from ebyhr, electrum and findepi on July 28, 2022 21:20
Member

Why? add a comment

Member Author

Switched the approach here to just wrap the DeleteFilter reading in a Supplier. I think that reads better
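
For readers unfamiliar with the pattern, a minimal sketch of deferring an expensive read behind a `Supplier` follows. It is an illustration only, not the code from this PR: `LazyDeleteFilter` and `memoize` are hypothetical names, and the sketch assumes the read should run at most once and only if something actually asks for the result.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not the Trino implementation.
final class LazyDeleteFilter
{
    private LazyDeleteFilter() {}

    // Wrap an expensive computation so that it runs on the first get() call
    // and the result is reused afterwards. If get() is never called, for
    // example because the split was skipped, the computation never runs.
    static <T> Supplier<T> memoize(Supplier<T> delegate)
    {
        return new Supplier<T>()
        {
            private T value;
            private boolean loaded;

            @Override
            public synchronized T get()
            {
                if (!loaded) {
                    value = delegate.get();
                    loaded = true;
                }
                return value;
            }
        };
    }
}
```

The caller passes the reading logic as a lambda and hands the resulting `Supplier` to whatever may or may not need the deletes.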

Member

.array()

do we need to make a defensive copy of these?

Member Author

Probably. Added a call to clone
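
As a sketch of why the copy matters, assuming the `.array()` call in question was on a `java.nio.ByteBuffer` (the thread does not show the surrounding code): `ByteBuffer.array()` returns the backing array itself, so a later write to that array would change the stored value. `CopiedBound` is a hypothetical class used only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the Trino implementation.
final class CopiedBound
{
    private final byte[] bytes;

    CopiedBound(ByteBuffer buffer)
    {
        // array() exposes the buffer's backing array, not a copy, so clone it.
        // Assumption: the buffer is array-backed and wraps the whole array;
        // array() ignores the buffer's position, limit, and array offset.
        this.bytes = buffer.array().clone();
    }

    byte[] getBytes()
    {
        // Copy on the way out as well, so callers cannot mutate internal state.
        return bytes.clone();
    }
}
```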

@alexjo2144 force-pushed the iceberg/filter-delete-files branch from d3b7369 to 43369ad on August 2, 2022 16:12
@alexjo2144
Member Author

alexjo2144 commented Aug 2, 2022

Applied comments in fixup commit, thanks @findepi

@findepi force-pushed the iceberg/filter-delete-files branch from b676542 to 716b527 on August 2, 2022 17:59
@findepi
Member

findepi commented Aug 2, 2022

squashed

@findepi
Member

findepi commented Aug 3, 2022

@alexjo2144 can you please rebase?

@alexjo2144 force-pushed the iceberg/filter-delete-files branch from 716b527 to 6df8a8b on August 3, 2022 16:04

3 participants