Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[fix][broker] Fix deadlock while skip non-recoverable ledgers. (apach…
…e#21915) ### Motivation The sequence of events leading to the deadlock when methods from org.apache.bookkeeper.mledger.impl.ManagedCursorImpl are invoked concurrently is as follows: 1. Thread A calls asyncDelete, which then goes on to internally call internalAsyncMarkDelete. This results in acquiring a lock on pendingMarkDeleteOps through synchronized (pendingMarkDeleteOps). 2. Inside internalAsyncMarkDelete, internalMarkDelete is called which subsequently calls persistPositionToLedger. At the start of persistPositionToLedger, buildIndividualDeletedMessageRanges is invoked, where it tries to acquire a read lock using lock.readLock().lock(). At this point, if the write lock is being held by another thread, Thread A will block waiting for the read lock. 3. Concurrently, Thread B executes skipNonRecoverableLedger which first obtains a write lock using lock.writeLock().lock() and then proceeds to call asyncDelete. 4. At this moment, Thread B already holds the write lock and is attempting to acquire the synchronized lock on pendingMarkDeleteOps that Thread A already holds, while Thread A is waiting for the read lock that Thread B needs to release. In code, the deadlock appears as follows: Thread A: synchronized (pendingMarkDeleteOps) -> lock.readLock().lock() (waiting) Thread B: lock.writeLock().lock() -> synchronized (pendingMarkDeleteOps) (waiting) ### Modifications Avoid using a long-range lock. Co-authored-by: ruihongzhou <[email protected]> Co-authored-by: Jiwe Guo <[email protected]> Co-authored-by: Lari Hotari <[email protected]>
- Loading branch information