-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Core: Replace Set with Bitmap to make delete filtering simpler and faster #3535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| } | ||
|
|
||
| @Override | ||
| public boolean deleted(long position) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we didn't use isDeleted? That is the normal convention for boolean methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change in the new commit.
| return positionDeleteIndex; | ||
| } catch (IOException e) { | ||
| throw new UncheckedIOException("Failed to close position delete source", e); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this method be named toPositionDeleteIndex? The way PositionDeleteIndex is modeled, it could be implemented by a set.
|
@flyrain, the data on #3287 supports the idea of using a bitmap instead of a set of long values. But this PR actually removes the streaming delete filter, which doesn't load all of the deletes into memory at the same time. Instead, it streams through the delete file to avoid keeping deletes in memory. Can you provide data comparing the memory consumption and performance of streaming deletes? |
rdblue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs further discussion. Although there is evidence that a bitmap is better for memory consumption than set of positions, the effect of this PR is actually to remove the more memory efficient streaming delete filter.
I think that this needs to either only replace set with bitmap (as in the summary) or we need more justification that removing the streaming version is also a good idea.
|
Thanks @rdblue for the review. Sorry for the delayed update. I agree we need more justification to remove streaming version. I will try to create another one with perf test for that. I'll update this PR to replace set with bitmap, and the refactor you mentioned for the class |
87bd108 to
9edb689
Compare
|
It'd be nice to get into 0.13 release since it has the change of API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly look good to me, while waiting for the performance test results.
There are quite a few interface changes for the public Deletes class, ideally I think we should just preserve the names or at least mark the old one as deprecated for one release, but I don't know if there is really any place that is depending on this. I feel it's public more for the purpose of the ease of use.
|
|
||
| @Override | ||
| public boolean deleted(long position) { | ||
| public boolean isDeleted(long position) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why changing the name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is suggested by @rdblue. Make sense to me as well. It is less ambiguous.
|
Thanks @jackye1995 for the review. Made the change per your suggestion. To be clear, there is no need of perf test in this PR. I will open another PR for replacing the streaming part, which needs the perf test. I agreed with you about the interface. To add to it, the interface |
|
Hi @rdblue and @jackye1995, Happy new yea! Do you get a chance to take a look this PR? |
|
Thanks, @flyrain! |
|
Thanks @rdblue and @jackye1995 for the review! |
This PR replaces
Set<Long>withRoaring64Bitmapto store the deleted positions for filtering. Bitmap use significant less memory than a set, here is a benchmark. With that, we don't have to use streaming filter in case of large amount of positions, we can keep all deleted positions in memory.cc @aokolnychyi @rdblue @RussellSpitzer @karuppayya @szehon-ho