rdblue commented Aug 7, 2020

This adds set-based filter implementations for equality and position deletes. Equality deletes use the StructLikeSet added in #1307.

JingsongLi left a comment

Thanks @rdblue, this looks very nice.

public static <T> CloseableIterable<T> positionSetFilter(CloseableIterable<T> rows,
                                                         Function<T, Long> rowToPosition,
                                                         CloseableIterable<Long> posDeletes) {
  try (CloseableIterable<Long> deletes = posDeletes) {
Contributor

Can we have a static method toLongSet in CloseableIterable (or somewhere else)? We could optimize it to a primitive long set in the future.

Contributor Author

I added the set methods as you suggested, which should make this a bit cleaner. Thanks!
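
To illustrate the idea discussed here, the following is a minimal sketch of a position filter backed by a set. It is not the code merged in this PR: it uses plain JDK collections instead of Iceberg's CloseableIterable, returns an eager List rather than a lazy iterable, and the class name PositionSetFilterSketch and both method bodies are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class PositionSetFilterSketch {
  // Hypothetical helper: collects all deleted positions into a set.
  // A primitive long set could replace HashSet<Long> later without changing callers.
  static Set<Long> toLongSet(Iterable<Long> positions) {
    Set<Long> set = new HashSet<>();
    for (Long pos : positions) {
      set.add(pos);
    }
    return set;
  }

  // Keeps only the rows whose position is not in deletedPositions.
  static <T> List<T> positionSetFilter(Iterable<T> rows,
                                       Function<T, Long> rowToPosition,
                                       Set<Long> deletedPositions) {
    List<T> kept = new ArrayList<>();
    for (T row : rows) {
      if (!deletedPositions.contains(rowToPosition.apply(row))) {
        kept.add(row);
      }
    }
    return kept;
  }
}
```

Building the set in a separate helper keeps the filter itself free of any resource handling, which is roughly the cleanup the suggestion was aiming for.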

try (CloseableIterable<StructLike> deletes = eqDeletes) {
  CloseableIterator<StructLike> eqDeleteIterator = deletes.iterator();
  if (eqDeleteIterator.hasNext()) {
    StructLikeSet deleteSet = StructLikeSet.create(eqType);
Contributor

Maybe this set can be reused, since a set gets bigger and bigger while merging?
Could we have a method like addAll(CloseableIterable<StructLike>) in StructLikeSet?

Contributor Author

I refactored this so that the filter method accepts a set. That will make them reusable.
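
As a rough sketch of what "the filter method accepts a set" can look like, the example below takes the delete set as a parameter so that one set can serve several filter calls. It is an illustration under assumptions, not the refactored Iceberg code: a plain Set of keys stands in for StructLikeSet, and the Row class, the rowToKey function, and the name EqualitySetFilterSketch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class EqualitySetFilterSketch {
  // Hypothetical row type: id is the equality field, data is the payload.
  static final class Row {
    final int id;
    final String data;

    Row(int id, String data) {
      this.id = id;
      this.data = data;
    }
  }

  // Keeps rows whose key is absent from deleteSet. The set is only read here,
  // so the caller can build it once and reuse it for any number of inputs.
  static <T, K> List<T> equalitySetFilter(Iterable<T> rows,
                                          Function<T, K> rowToKey,
                                          Set<K> deleteSet) {
    List<T> kept = new ArrayList<>();
    for (T row : rows) {
      if (!deleteSet.contains(rowToKey.apply(row))) {
        kept.add(row);
      }
    }
    return kept;
  }
}
```

Because the filter never mutates the set, the caller decides when to grow it, for example by adding deletes from several delete files before filtering.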

CloseableIterator<StructLike> eqDeleteIterator = deletes.iterator();
if (eqDeleteIterator.hasNext()) {
  StructLikeSet deleteSet = StructLikeSet.create(eqType);
  Iterators.addAll(deleteSet, eqDeleteIterator);
Contributor

I am not sure I get this part. Do we create a StructLikeSet for every entry in eqDeletes?

Contributor Author

No. The outer hasNext check is used to see whether we need to filter at all. If there are no equality deletes then we just return rows in the else case. This could happen if we filter the deletes using the scan predicates. If you're looking for a specific ID that is also used for a delete, then you only need to merge in the deletes with that ID.

To fill the delete set, we call Iterators.addAll, which adds all of the remaining items from an iterator to a collection.
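
The control flow described above can be sketched with plain JDK types. This is only an illustration under assumptions: the local addAll method is a stand-in written for this example that mirrors the described behavior, rows are compared with ordinary equals/hashCode instead of a StructLikeSet, and the names DrainIteratorSketch and filterDeleted are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class DrainIteratorSketch {
  // Stand-in helper: adds every remaining element of the iterator to the collection.
  static <T> void addAll(Collection<T> target, Iterator<T> source) {
    while (source.hasNext()) {
      target.add(source.next());
    }
  }

  static <T> Iterable<T> filterDeleted(Iterable<T> rows, Iterable<T> eqDeletes) {
    Iterator<T> deleteIterator = eqDeletes.iterator();
    if (deleteIterator.hasNext()) {
      // hasNext() does not consume an element, so the single set created here
      // still receives every delete, including the first one.
      Set<T> deleteSet = new HashSet<>();
      addAll(deleteSet, deleteIterator);

      List<T> kept = new ArrayList<>();
      for (T row : rows) {
        if (!deleteSet.contains(row)) {
          kept.add(row);
        }
      }
      return kept;
    } else {
      // No deletes at all: return the rows unchanged and build no set.
      return rows;
    }
  }
}
```

The set is created once per call, inside the branch that runs only when at least one delete exists, not once per delete entry.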

Contributor

Got it now.

rdblue commented Aug 12, 2020

Thanks for reviewing, @JingsongLi and @aokolnychyi!

rdblue merged commit 914ea8e into apache:master Aug 12, 2020
cmathiesen pushed a commit to ExpediaGroup/iceberg that referenced this pull request Aug 19, 2020

Development

Successfully merging this pull request may close these issues.

Add a row filter implementation for equality deletes
