Retry on busy and locktimeout failures for rocksdb transactions #5976
garyschulte wants to merge 13 commits into hyperledger:main from …
Conversation
| "RocksDB Busy exception caught on attempt {} of {}, retrying", | ||
| attemptNumber, | ||
| retryLimit); | ||
| commit(attemptNumber + 1, retryLimit); |
Do you think a direct retry without any delay will fix the problem? Maybe set the level to info, so we can debug and see whether this fix can unblock this kind of situation?
I haven't yet been able to reproduce a busy exception, but I bumped the logging to warn. It seems reasonable to leave it at that level since it likely will provide good signal and shouldn't spam the logs due to the transaction commit timeout.
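For context, here is a minimal self-contained sketch of the retry shape being discussed, using the stock RocksJava API. The class, the `innerTx` field, and the `RuntimeException` fallback are assumptions for illustration (Besu wraps failures in its own storage exception); this is not the actual PR diff.

```java
import org.rocksdb.RocksDBException;
import org.rocksdb.Status;
import org.rocksdb.Transaction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch only: names here are assumptions, not the PR's actual code.
class RetryingCommit {
  private static final Logger LOG = LoggerFactory.getLogger(RetryingCommit.class);
  private final Transaction innerTx;

  RetryingCommit(final Transaction innerTx) {
    this.innerTx = innerTx;
  }

  void commit(final int attemptNumber, final int retryLimit) {
    try {
      innerTx.commit();
    } catch (final RocksDBException e) {
      // RocksDB documents Status::Busy as transient, so retry up to the limit.
      if (e.getStatus() != null
          && e.getStatus().getCode() == Status.Code.Busy
          && attemptNumber < retryLimit) {
        LOG.warn(
            "RocksDB Busy exception caught on attempt {} of {}, retrying",
            attemptNumber,
            retryLimit);
        commit(attemptNumber + 1, retryLimit);
      } else {
        throw new RuntimeException(e); // Besu would wrap in its storage exception
      }
    }
  }
}
```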
Force-pushed from c3e7d7c to 5d6e4f5
I remember that @gfukushima tried a similar approach in https://github.com/hyperledger/besu/pull/5287/files but from what I recall the retry didn't fix the busy error.
Thanks, I didn't know he had already tried this. Was he able to correlate this to rocksdb activity? The rocks code indicates that 'Status::Busy' should be transient and the remedy is to retry. I have some instances trying to reproduce this now, hoping to get some activity metrics and rocksdb logs to see what might be causing the contention.
Not sure about correlating it to rocksdb activity. Too long ago. He is on holiday right now. Have asked around, hopefully someone remembers more than I do. Agree that doing a retry definitely seems like the right thing to do.
I have been able to reproduce this somewhat readily by snap syncing a forest node with slow storage. I think Forest is particularly vulnerable to these transient locking/busy issues because it uses TransactionDB mode, which locks keys during transaction write rather than during commit. So far I have seen it exclusively on .put() rather than .commit(), so I tweaked the implementation to retry all RocksDBTransaction actions that encounter a non-fatal exception and am re-testing.
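For background, this is the behavioral difference in question, shown as a hedged, self-contained RocksJava sketch (placeholder path and options, not Besu code): with `TransactionDB` the key lock is taken at `put()` time, while `OptimisticTransactionDB` defers conflict detection to `commit()`.

```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Transaction;
import org.rocksdb.TransactionDB;
import org.rocksdb.TransactionDBOptions;
import org.rocksdb.WriteOptions;

public class PessimisticLocking {
  public static void main(final String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (final Options opts = new Options().setCreateIfMissing(true);
        final TransactionDBOptions txDbOpts = new TransactionDBOptions();
        // Placeholder path; pessimistic (locking) transaction mode.
        final TransactionDB db = TransactionDB.open(opts, txDbOpts, "/tmp/tx-demo");
        final WriteOptions writeOpts = new WriteOptions();
        final Transaction tx = db.beginTransaction(writeOpts)) {
      // The key lock is taken HERE: a second transaction writing the same
      // key would block and eventually get Status.TimedOut(LockTimeout).
      tx.put("key".getBytes(), "value".getBytes());
      tx.commit(); // lock released after commit
    }
    // With OptimisticTransactionDB.open(opts, path) instead, put() always
    // succeeds locally and a conflicting commit() fails with Status.Busy.
  }
}
```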
Force-pushed from c2c0cde to faa5dd6
Interestingly, this did fix one of the TimedOut(LockTimeout) exceptions on one thread on scheduler-8. But 60 seconds later the same thing happened to both scheduler-8 and scheduler-13. It looks like they were both trying to obtain locks and ended up in a deadlock-like situation. It is interesting that the retry worked for the first lock timeout, but 8 and 13 together seemed to be deadlocked in the same time frame, for the same timeout duration. Perhaps what we need to do is introduce a static timeout supplier that forces an incremental backoff wait; that way at least one of the threads might be able to get the lock and progress while the other one waits. I will try that on this node, since it seems to readily reproduce the problem within an hour or so of starting a mainnet sync.

edit: also, I would swear we have a write timeout that is longer than 1 second??

edit-edit: Looking at the RocksDB LOG file, there is a suspicious GAP in the otherwise incredibly chatty rocksdb log when the LockTimeouts occurred. rocksdb usually writes tens of log lines per second, so it is quite conspicuous for there to be a 5 second gap. This might be mere correlation, but it seems odd to have no activity logged at all, regardless of 2 (of many) transactions getting lock timeouts 🤔
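To make the backoff idea concrete, here is a rough sketch (the class name, counter, and constants are all assumptions, not PR code): each concurrent retrier waits a different, growing amount of time, so one thread can acquire the lock and make progress while the others hold back.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only; names and constants are assumptions, not PR code.
final class IncrementalBackoff {
  private static final AtomicInteger CONCURRENT_WAITERS = new AtomicInteger(0);
  private static final long BASE_WAIT_MILLIS = 100L;

  /** Later arrivals wait longer, staggering retries so one thread can win the lock. */
  static void awaitBeforeRetry(final int attemptNumber) throws InterruptedException {
    final int position = CONCURRENT_WAITERS.incrementAndGet();
    try {
      Thread.sleep(BASE_WAIT_MILLIS * position * attemptNumber);
    } finally {
      CONCURRENT_WAITERS.decrementAndGet();
    }
  }
}
```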
If the bug arises largely during sync, I can try to add retry logic at the pipeline level in addition to this PR: instead of failing the pipeline on a lock, I would restart the step later. Your PR is trying to manage the commit in general, but I think we can also try to have better management of rocksdb errors on the pipeline side.
There seems to be a variety of places this can occur. I encountered LockTimeout most often during sync, in state trie commit visitors that are just doing a transaction write. I think it is better to address this systemically rather than having individual processes duplicate the retry logic. If we find that some code paths are particularly prone to this or need special care, we can always catch the storage exception that is eventually thrown and apply process-specific logic there.
Ok, I will also try to start some syncs next week, hoping to reproduce it with Bonsai.
@matkt - perhaps I spoke too soon about not trying to handle this in the sync process. With an incremental backoff retry, I am still seeing LockTimeouts. Sometimes a single thread will get lock timeouts that resolve on retry, but inevitably two separate threads end up trying to write to overlapping key ranges and a deadlock occurs. It doesn't matter how long we wait, since each process/transaction is trying to get a lock on the keys held by the other. This doesn't happen with OptimisticTransactionDB because conflicts are resolved at commit time rather than at transaction write. We will have to do some instrumenting/logging to test this theory, but I think we have 2 separate worldstate download threads that are trying to write to some of the same trie keys during the commit phase. However... I suspect the better option is going to be to ditch TransactionDB for forest and go back to OptimisticTransactionDB with transaction retries. IIRC, we reverted forest to TransactionDB to alleviate "Busy" exceptions. The current retry strategy in this PR might make TransactionDB unnecessary for forest, and we can avoid the concurrency/deadlock issue.
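A hedged sketch of what "OptimisticTransactionDB with transaction retries" could look like: replay the whole write closure when `commit()` reports `Busy`. Everything here (class names, the retry limit) is an assumption for illustration, not the eventual implementation.

```java
import org.rocksdb.OptimisticTransactionDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.Status;
import org.rocksdb.Transaction;
import org.rocksdb.WriteOptions;

// Sketch only; names and the retry limit are assumptions.
final class OptimisticRetry {
  private static final int MAX_ATTEMPTS = 5; // assumed limit

  @FunctionalInterface
  interface TxBody {
    void apply(Transaction tx) throws RocksDBException;
  }

  static void runWithRetry(final OptimisticTransactionDB db, final TxBody body)
      throws RocksDBException {
    try (final WriteOptions writeOpts = new WriteOptions()) {
      for (int attempt = 1; ; attempt++) {
        try (final Transaction tx = db.beginTransaction(writeOpts)) {
          body.apply(tx); // puts never lock; conflicts are detected later...
          tx.commit();    // ...HERE, surfacing as Status.Busy
          return;
        } catch (final RocksDBException e) {
          final boolean busy =
              e.getStatus() != null && e.getStatus().getCode() == Status.Code.Busy;
          if (!busy || attempt >= MAX_ATTEMPTS) {
            throw e;
          }
          // Busy means the transaction's reads went stale: rebuild and
          // replay the whole transaction against the committed state.
        }
      }
    }
  }
}
```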
For the sync part, the fix can be very simple, which is why I propose it. It is just a matter of catching the error during a persist and marking the task as failed; the pipeline will then process that task again later. Apart from that, I think the probability of hitting this error outside of sync should be limited, because we do not have many other cases where several threads persist to the worldstate at the same time.
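As a loose sketch of that idea (every name below is a hypothetical stand-in, not Besu's actual pipeline API): catch the storage failure at the persist step and requeue the task instead of failing the whole pipeline.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// All names are hypothetical stand-ins for the sync pipeline types.
final class RequeueOnStorageFailure {
  interface Task {
    void persist() throws Exception; // worldstate write that may hit LockTimeout
  }

  private final BlockingQueue<Task> retryQueue = new LinkedBlockingQueue<>();

  void process(final Task task) {
    try {
      task.persist();
    } catch (final Exception e) {
      // Mark the task failed and let the pipeline pick it up again later,
      // instead of aborting the whole sync on a lock conflict.
      retryQueue.add(task);
    }
  }
}
```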
I think TimedOut(LockTimeout) is the non-optimistic version of Busy
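For reference, a small illustrative helper (not from this PR) showing how the two statuses could be treated as one retryable-contention signal via the codes RocksJava exposes:

```java
import org.rocksdb.RocksDBException;
import org.rocksdb.Status;

final class RetryableStatus {
  // Busy (optimistic commit conflict) and TimedOut(LockTimeout) (pessimistic
  // lock wait) are two flavors of the same write-write contention.
  static boolean isRetryableContention(final RocksDBException e) {
    final Status status = e.getStatus();
    if (status == null) {
      return false;
    }
    return status.getCode() == Status.Code.Busy
        || (status.getCode() == Status.Code.TimedOut
            && status.getSubCode() == Status.SubCode.LockTimeout);
  }
}
```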
Seems reasonable. It isn't obvious that allowing both threads to write would result in a correct trie either, so allowing one or both to fail and retrying the task later is probably the better strategy. Looking back over all of the "Busy" issues, it seems all of them have been related to worldstate sync. If we handle the sync failure gracefully in the pipeline, there really isn't any need for TransactionDB anymore afaict. We know that Optimistic is more performant, so if TransactionDB isn't 'fixing' anything, we should remove it as well.
Force-pushed from 9a7f850 to 5e7c92e
Closing this in favor of #6004.
PR description
Simple default retry policy for rocksdb busy and locktimeout exceptions.
This may or may not remedy the busy exceptions encountered in the listed issues, but it will at a minimum provide better signal on whether these are transient issues due to compaction or more systemic problems.
Fixed Issue(s)
related to #5911
related to #5549
related to #5807