[fix] Make operations on individualDeletedMessages in lock scope #22966
I think that we need to find another solution. ReadWriteLock adds a lot more overhead than StampedLock.
I wonder if it would be a viable option to catch exceptions and retry with a read lock if that happens?
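A minimal sketch of the retry idea, assuming a plain `java.util.BitSet` behind a `StampedLock` (the class `OptimisticBitSet` is hypothetical, not Pulsar's `ConcurrentRoaringBitSet`): the lookup first runs as an optimistic read, and if the stamp fails validation or the read throws because it raced with a writer, the lookup is repeated under a real read lock.

```java
import java.util.BitSet;
import java.util.concurrent.locks.StampedLock;

// Hypothetical class: "optimistic read, catch exceptions, retry with a read lock"
// over java.util.BitSet. A sketch for discussion, not Pulsar's implementation.
public class OptimisticBitSet {
    private final BitSet bits = new BitSet();
    private final StampedLock lock = new StampedLock();

    public void set(int index) {
        long stamp = lock.writeLock();
        try {
            bits.set(index);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public boolean get(int index) {
        long stamp = lock.tryOptimisticRead(); // 0 if a writer currently holds the lock
        if (stamp != 0L) {
            try {
                boolean value = bits.get(index);
                if (lock.validate(stamp)) {
                    return value; // no write happened during the read
                }
            } catch (RuntimeException e) {
                // The read may have raced with a writer and seen inconsistent
                // internal state; fall through to the pessimistic path.
            }
        }
        // Pessimistic path: a real read lock excludes writers for the duration.
        stamp = lock.readLock();
        try {
            return bits.get(index);
        } finally {
            lock.unlockRead(stamp);
        }
    }

    public static void main(String[] args) {
        OptimisticBitSet bitSet = new OptimisticBitSet();
        bitSet.set(42);
        System.out.println(bitSet.get(42) + " " + bitSet.get(7)); // prints "true false"
    }
}
```

An exception that is not caused by a race (for example a negative index) is simply thrown again from the pessimistic path. The open question from the thread remains: under high write throughput, the exceptions themselves may cost more than always taking the lock.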
Yes, but RoaringBitmap is not designed for concurrency at all, and this PR is a quick fix; we can make further improvements in the future.
Then we may catch a lot of exceptions when a broker is under high throughput; I'm not sure whether that costs less than an RWLock.
That's a valid concern; we should investigate the different choices and experiment.
I think we should revert the migration to RoaringBitSet in branch-3.0, branch-3.2 and branch-3.3 so that we don't need to rush the solution.
I reverted the changes in branch-3.0, branch-3.2 and branch-3.3. Here's the PR to revert the change in the master branch: #22968. It's better to have a fresh start with a proper fix that is validated, so that it doesn't cause performance regressions and also addresses the concurrency issues. The concern about switching to ReadWriteLock is that it could cause a performance regression. The concern may not be valid, but let's validate that before applying the solution.
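I did a less rigorous test:

```java
import java.util.concurrent.CountDownLatch;
// Package assumed: ConcurrentRoaringBitSet was introduced in #22908.
import org.apache.pulsar.common.util.collections.ConcurrentRoaringBitSet;
import org.testng.annotations.Test;

public class ConcurrentRoaringBitSetTimingTest {
    @Test
    public void test() throws InterruptedException {
        long start = System.currentTimeMillis();
        CountDownLatch latch = new CountDownLatch(2);
        ConcurrentRoaringBitSet bitSet = new ConcurrentRoaringBitSet();
        // Writer thread: repeatedly sets the same bit.
        new Thread(() -> {
            try {
                for (int i = 0; i < 100_000_000; i++) {
                    bitSet.set(1);
                }
            } finally {
                latch.countDown(); // count down even if set() throws, so await() cannot hang
            }
        }).start();
        // Reader thread: concurrently reads the same bit.
        new Thread(() -> {
            try {
                for (int i = 0; i < 100_000_000; i++) {
                    bitSet.get(1);
                }
            } finally {
                latch.countDown();
            }
        }).start();
        latch.await(); // wait for both threads to finish
        System.out.println("Time: " + (System.currentTimeMillis() - start));
    }
}
```

I started 2 threads to call get/set methods on … Maybe we don't need to worry about the performance regression?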
For read-only operations, the StampedLock-based ConcurrentRoaringBitSet is faster than a ReadWriteLock (about 5 times faster), but in the case where we use …
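For comparison with the StampedLock-based version, a minimal sketch of a `ReadWriteLock`-guarded bit set, assuming a plain `java.util.BitSet` underneath (the class `RwLockBitSet` is hypothetical). Every read acquires the shared read lock, which updates the lock's internal state on each call; a StampedLock optimistic read only reads the stamp. That per-read cost is the likely source of the gap on read-only workloads, while writes are exclusive in both designs.

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical ReadWriteLock-guarded bit set over java.util.BitSet, for comparison only.
public class RwLockBitSet {
    private final BitSet bits = new BitSet();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void set(int index) {
        lock.writeLock().lock(); // exclusive: blocks readers and other writers
        try {
            bits.set(index);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean get(int index) {
        lock.readLock().lock(); // shared, but every acquisition still updates the lock state
        try {
            return bits.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    public int cardinality() {
        lock.readLock().lock();
        try {
            return bits.cardinality();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RwLockBitSet bitSet = new RwLockBitSet();
        Thread evens = new Thread(() -> {
            for (int i = 0; i < 10_000; i += 2) {
                bitSet.set(i);
            }
        });
        Thread odds = new Thread(() -> {
            for (int i = 1; i < 10_000; i += 2) {
                bitSet.set(i);
                bitSet.get(i - 1); // reads concurrent with writes never see a half-applied set()
            }
        });
        evens.start();
        odds.start();
        evens.join();
        odds.join();
        System.out.println(bitSet.cardinality()); // prints 10000
    }
}
```

Timing this against the StampedLock version is better done with JMH than with `System.currentTimeMillis()`, since JMH handles warm-up and dead-code elimination.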
In Pulsar, we have the https://github.com/apache/pulsar/tree/master/microbench module with JMH. I think JMH is better for such comparisons. For Pulsar, efficiency also matters, so the comparison might not be that simple. By the way, in Pulsar …
That makes sense; I addressed this. PTAL.
@dao-jun Looks good, I'll soon review in more detail. Please update the PR title and description so that it describes the motivation and modifications of this PR more accurately. |
Please use the write lock for the `individualDeletedMessages.resetDirtyKeys();` call in the `buildIndividualDeletedMessageRanges` method.
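The reasoning behind this request: `resetDirtyKeys()` mutates the range set, so a read lock (which other readers share) is not enough. A minimal sketch under stated assumptions, using a hypothetical `DirtyKeysSketch` class in which a `TreeSet` plus a dirty-key set stand in for the non-thread-safe range set, and `buildDirtySnapshot()` plays the role of the build method. It takes the write lock up front because `ReentrantReadWriteLock` cannot upgrade a held read lock to the write lock.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a NotThreadSafe range set guarded by an external lock.
public class DirtyKeysSketch {
    private final TreeSet<Long> deleted = new TreeSet<>();
    private final Set<Long> dirtyKeys = new HashSet<>(); // keys changed since the last build
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void markDeleted(long entryId) {
        lock.writeLock().lock();
        try {
            deleted.add(entryId);
            dirtyKeys.add(entryId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean isDeleted(long entryId) {
        lock.readLock().lock(); // pure read: the shared lock is sufficient
        try {
            return deleted.contains(entryId);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Reads the dirty keys and then clears them. Clearing is a mutation, so the
    // whole method runs under the write lock (a read lock cannot be upgraded).
    public List<Long> buildDirtySnapshot() {
        lock.writeLock().lock();
        try {
            List<Long> snapshot = new ArrayList<>(new TreeSet<>(dirtyKeys));
            dirtyKeys.clear(); // analogous to resetDirtyKeys()
            return snapshot;
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        DirtyKeysSketch sketch = new DirtyKeysSketch();
        sketch.markDeleted(10);
        sketch.markDeleted(3);
        System.out.println(sketch.buildDirtySnapshot()); // prints [3, 10]
        System.out.println(sketch.buildDirtySnapshot()); // prints []
    }
}
```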
Rename ConcurrentOpenLongPairRangeSet to OpenLongPairRangeSet and mark it as NotThreadSafe.
I guess this change and the switch to RoaringBitSet (in version 1.1.0) were lost in rebasing?
This is actually a real bug in the current implementation, and it needs to be fixed even if we don't switch to RoaringBitmap's RoaringBitSet.
One possibility would be to complete this PR by switching to a non-thread-safe version of ConcurrentOpenLongPairRangeSet that uses an ordinary BitSet, and then switch to RoaringBitSet in a follow-up PR. It's possible that using StampedLock in ConcurrentBitSet results in problems similar to those we had with StampedLock in ConcurrentRoaringBitSet. Looking at the code of BitSet, it seems that the assertions in this method could fail in ConcurrentBitSet:

```java
// From java.util.BitSet (JDK source)
private void checkInvariants() {
    assert(wordsInUse == 0 || words[wordsInUse - 1] != 0);
    assert(wordsInUse >= 0 && wordsInUse <= words.length);
    assert(wordsInUse == words.length || words[wordsInUse] == 0);
}
```

However, the problems are hidden since assertions aren't commonly enabled in production.
Yes, …
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

```diff
@@              Coverage Diff              @@
##             master   #22966      +/-   ##
============================================
- Coverage     73.57%   73.43%   -0.15%
- Complexity    32624    33219     +595
============================================
  Files          1877     1903      +26
  Lines        139502   142680    +3178
  Branches      15299    15574     +275
============================================
+ Hits         102638   104771    +2133
- Misses        28908    29891     +983
- Partials       7956     8018      +62
```

Flags with carried forward coverage won't be shown.
LGTM, good work @dao-jun
pulsar-common/src/main/java/org/apache/pulsar/common/util/collections/OpenLongPairRangeSet.java
…pache#22966) (cherry picked from commit dbbb6b6) (cherry picked from commit e01e90f)
Motivation
In #22908 we introduced `ConcurrentRoaringBitSet`, which is based on `StampedLock` and `RoaringBitmap`, to reduce the memory usage and GC pauses caused by `BitSet`.

However, `ConcurrentRoaringBitSet` has a concurrency issue: it can throw an NPE when `ConcurrentRoaringBitSet#get` and `ConcurrentRoaringBitSet#set` are called from multiple threads, because `RoaringBitmap` is not safe to use this way under `StampedLock`. The situation is somewhat similar to #18388. See:
- `RoaringBitmap#add`
- `RoaringBitmap#get`
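The failure mode is easiest to see by contrast with the canonical optimistic-read pattern from the `StampedLock` Javadoc, sketched below with a hypothetical `OptimisticPoint` class: everything between `tryOptimisticRead()` and `validate()` must be harmless on inconsistent data, which holds when the section only copies two `double` fields. A `RoaringBitmap` lookup instead follows internal arrays that a concurrent `add` may be resizing, so, presumably, it can dereference a null container and throw an NPE before `validate()` is ever reached.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical class adapted from the optimistic-read pattern in the StampedLock Javadoc.
public class OptimisticPoint {
    private double x;
    private double y;
    private final StampedLock lock = new StampedLock();

    public void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();
        // Safe section: copying two primitive fields cannot throw, even if a
        // concurrent move() leaves them mutually inconsistent.
        double currentX = x;
        double currentY = y;
        if (!lock.validate(stamp)) {
            // A write happened or was in progress; re-read under a real read lock.
            stamp = lock.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        // Compute only with copies that are known to be consistent.
        return Math.sqrt(currentX * currentX + currentY * currentY);
    }

    public static void main(String[] args) {
        OptimisticPoint p = new OptimisticPoint();
        p.move(3, 4);
        System.out.println(p.distanceFromOrigin()); // prints 5.0
    }
}
```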
Modifications
- … `ConcurrentBitSet`
- Rename `ConcurrentOpenLongPairRangeSet` to `OpenLongPairRangeSet` and mark it as `NotThreadSafe`.
- Make operations on `ManagedCursorImpl#individualDeletedMessages` run within the `ReadWriteLock` scope.

Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
- `doc`
- `doc-required`
- `doc-not-needed`
- `doc-complete`
Matching PR in forked repository
PR in forked repository: