[Bug] Broker became unresponsive due to a deadlock caused by a race condition in the metadata store callback #22840
Closed
Labels
type/bug
Search before asking
Read release policy
Version
Minimal reproduce step
What did you expect to see?
Fixing this issue requires addressing three separate problems in the execution path:
1. Bottleneck at the single metadata store callback thread
Having a single metadata store callback thread can easily lead to a deadlock whenever a regression causes a callback to block. We should therefore use multiple callback threads, for example as many as there are IO threads, so that the callback thread does not become a bottleneck and cannot deadlock the system.
PR: #22841
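To illustrate the failure mode, here is a minimal sketch using plain `java.util.concurrent` executors rather than Pulsar's actual metadata store code. The class and method names (`CallbackPoolSketch`, `nestedCallbackCompletes`) are hypothetical. A callback blocks on a second result that itself needs a callback thread to be delivered; with one thread this never finishes, with several it does.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CallbackPoolSketch {
    // Returns true if a callback that waits on a nested callback finishes in time.
    // The executor stands in for the metadata store callback thread(s) (assumption).
    static boolean nestedCallbackCompletes(int callbackThreads) {
        ExecutorService callbacks = Executors.newFixedThreadPool(callbackThreads);
        try {
            Future<String> outer = callbacks.submit(() -> {
                // The nested task is queued on the same executor...
                Future<String> inner = callbacks.submit(() -> "metadata");
                // ...and the outer callback blocks until it has run.
                return inner.get();
            });
            outer.get(500, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The only callback thread is stuck waiting on work queued behind it.
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Interrupts the blocked worker so the demo can exit.
            callbacks.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 callback thread:  " + nestedCallbackCompletes(1));
        System.out.println("4 callback threads: " + nestedCallbackCompletes(4));
    }
}
```

Note that a larger pool only makes this kind of deadlock less likely: if every callback thread blocks in the same way, the pool stalls again, which is why the other two fixes are still needed.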
2. Make BookKeeper client creation async
The BookKeeper client creation path contains a blocking call that blocks managed-ledger creation and eventually causes the deadlock. Therefore, managed-ledger creation must create the BookKeeper client asynchronously.
PR: #22842
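As a rough sketch of the idea (not the actual change in the PR), blocking construction can be moved onto a separate executor and exposed as a `CompletableFuture`, so the calling thread is never parked inside the constructor. `FakeClient`, `createBlocking` and `createAsync` are hypothetical stand-ins and not BookKeeper or Pulsar APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncClientSketch {
    // Hypothetical stand-in for a client whose construction blocks on metadata reads.
    static final class FakeClient {
        final String createdOn = Thread.currentThread().getName();
    }

    // Simulates the blocking creation path with a short sleep.
    static FakeClient createBlocking() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new FakeClient();
    }

    // Runs the blocking creation on the given executor and returns immediately.
    static CompletableFuture<FakeClient> createAsync(ExecutorService executor) {
        return CompletableFuture.supplyAsync(AsyncClientSketch::createBlocking, executor);
    }

    // Returns true if the client was built on a thread other than the caller's.
    static boolean createdOffCallerThread() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            String caller = Thread.currentThread().getName();
            // join() is used only to keep the demo short; real callers would
            // chain thenCompose/thenAccept instead of blocking here.
            FakeClient client = createAsync(executor).join();
            return !caller.equals(client.createdOn);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("created off caller thread: " + createdOffCallerThread());
    }
}
```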
3. Prevent waiting forever on blocking bk-client creation
Currently, the BookKeeper client makes multiple blocking calls to the metadata store, none of which has a timeout. Introducing a timeout and retry helps the client break the deadlock and complete the execution path.
PR: #22843
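A bounded wait with retry could look roughly like the sketch below. It is an illustration under assumptions, not the code from the PR: `getWithRetry` and `flakyRead` are hypothetical names, and the timeout and attempt count are arbitrary.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class TimeoutRetrySketch {
    // Waits for each attempt at most timeoutMs, retrying up to maxAttempts times.
    static <T> T getWithRetry(Supplier<CompletableFuture<T>> op, long timeoutMs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            CompletableFuture<T> pending = op.get();
            try {
                return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Give up on this attempt instead of waiting forever, then retry.
                pending.cancel(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        throw new IllegalStateException("no response after " + maxAttempts + " attempts");
    }

    // Hypothetical metadata read: the first call never completes, later calls succeed.
    static CompletableFuture<String> flakyRead(AtomicInteger calls) {
        if (calls.incrementAndGet() == 1) {
            return new CompletableFuture<>();
        }
        return CompletableFuture.completedFuture("ok");
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        String value = getWithRetry(() -> flakyRead(calls), 50, 3);
        System.out.println(value + " after " + calls.get() + " attempts");
    }
}
```

The caller still blocks in this sketch, but only for a bounded time per attempt, so a lost callback results in a retry or an error instead of a thread that is stuck for good.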
What did you see instead?
deadlock
Anything else?
No response
Are you willing to submit a PR?