Skip to content

Use native implementation of Iceberg delete filters#13219

Merged
electrum merged 2 commits intotrinodb:masterfrom
electrum:deletefilter
Jul 26, 2022
Merged

Use native implementation of Iceberg delete filters#13219
electrum merged 2 commits intotrinodb:masterfrom
electrum:deletefilter

Conversation

@electrum
Copy link
Copy Markdown
Member

@electrum electrum commented Jul 19, 2022

Description

Improve performance of Iceberg queries for tables with updated or deleted rows.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.

Release notes

(x) Release notes entries required with the following suggested text:

# Iceberg connector
* Improve performance of Iceberg queries for tables with updated or deleted rows. ({issue}`13092`)

@lhofhansl
Copy link
Copy Markdown
Member

lhofhansl commented Jul 19, 2022

Works well on my scenario. And faster than #13112.
(Only tested postional deletes.)

@findepi
Copy link
Copy Markdown
Member

findepi commented Jul 19, 2022

@electrum can you please extract some of the prep easy commits to a separate PR to have them merged quickly?

@electrum electrum mentioned this pull request Jul 19, 2022
@electrum
Copy link
Copy Markdown
Member Author

@findepi extracted to #13236

@electrum
Copy link
Copy Markdown
Member Author

@lhofhansl thanks for the feedback! That's great to hear.

Copy link
Copy Markdown
Member

@djsagain djsagain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of very nice cleanups in this PR!

I didn't take the time to figure out Iceberg's StructLikeSet and StructProjection, but the rest looked right to me.

@findepi
Copy link
Copy Markdown
Member

findepi commented Jul 20, 2022

@findepi extracted to #13236

thanks, approved that one

@electrum electrum force-pushed the deletefilter branch 2 times, most recently from f8db58e to 2ed07b9 Compare July 21, 2022 03:32
@electrum electrum requested review from ebyhr and findepi July 21, 2022 03:44
@findepi findepi requested a review from alexjo2144 July 22, 2022 14:26
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates intermediate nodes. It would be more efficient to have a composite flat "and" predicate with a list of sub-predicates

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that originally, but I'm not sure it would be more efficient. We expect to only have one or a few filters, so with inlining, the JVM might be able to de-virtualize and flatten the calls into a single expression. The flat list seems to have more overhead for the common cases.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optimize this later if it becomes an issue. An obvious one is to combine all the deletes into a single bitmap. For equality, we could use method handle combinators to combine the predicates, and possibly also to "compile" the equality comparison expression.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I combined position deletes into a single bitmap.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to track memory contained in this field to prevent worker node crashes.
See #9914
cc @losipiuk

Copy link
Copy Markdown
Member Author

@electrum electrum Jul 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, though we aren't doing that today, so this isn't a regression. We can do that as a follow up. I didn't see an easy way to get memory usage from RoaringBitmap. We could serialize and use that as an estimate, or do a simple estimate based on the cardinality (multiply by some constant factor).

Copy link
Copy Markdown
Member

@alexjo2144 alexjo2144 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Iceberg scanning with Delete Files is extremely/unusably slow

6 participants