
Retry on busy and locktimeout failures for rocksdb transactions#5976

Closed
garyschulte wants to merge 13 commits into hyperledger:main from garyschulte:feature/rocks-tx-retry

Conversation

@garyschulte
Contributor

@garyschulte garyschulte commented Oct 2, 2023

PR description

Simple default retry policy for rocksdb busy and locktimeout exceptions.

This may or may not remedy the busy exceptions encountered in the listed issues, but at a minimum it will provide better signal as to whether these are transient issues due to compaction or more systemic problems.

Fixed Issue(s)

related to #5911
related to #5549
related to #5807

@github-actions

github-actions bot commented Oct 2, 2023

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.
  • I thought about the changelog and included a changelog update if required.
  • If my PR includes database changes (e.g. KeyValueSegmentIdentifier) I have thought about compatibility and performed forwards and backwards compatibility tests

```java
LOG.warn(
    "RocksDB Busy exception caught on attempt {} of {}, retrying",
    attemptNumber,
    retryLimit);
commit(attemptNumber + 1, retryLimit);
```
Contributor


do you think a direct retry without any delay will fix the problem ? maybe setting the level to info like that we can debug and see if this fix can unblock this kind of situation ?

Contributor Author


I haven't yet been able to reproduce a busy exception, but I bumped the logging to warn. It seems reasonable to leave it at that level since it likely will provide good signal and shouldn't spam the logs due to the transaction commit timeout.

@garyschulte garyschulte force-pushed the feature/rocks-tx-retry branch from c3e7d7c to 5d6e4f5 on October 4, 2023 19:06
@jframe
Contributor

jframe commented Oct 5, 2023

I remember that @gfukushima tried a similar approach in https://github.com/hyperledger/besu/pull/5287/files but from what I recall the retry didn't fix the busy error.

@garyschulte
Contributor Author

I remember that @gfukushima tried a similar approach in https://github.com/hyperledger/besu/pull/5287/files but from what I recall the retry didn't fix the busy error.

Thanks I didn't know he already tried this.

Was he able to correlate this to rocksdb activity? The rocks code indicates that 'Status::Busy' should be transient and the remedy is to retry. I have some instances trying to reproduce this now, hoping to get some activity metrics and rocksdb logs to see what might be causing the contention.

@garyschulte garyschulte closed this Oct 5, 2023
@jframe
Contributor

jframe commented Oct 5, 2023

I remember that @gfukushima tried a similar approach in https://github.com/hyperledger/besu/pull/5287/files but from what I recall the retry didn't fix the busy error.

Thanks I didn't know he already tried this.

Was he able to correlate this to rocksdb activity? The rocks code indicates that 'Status::Busy' should be transient and the remedy is to retry. I have some instances trying to reproduce this now, hoping to get some activity metrics and rocksdb logs to see what might be causing the contention.

Not sure about correlating it to rocksdb activity. Too long ago. He is on holiday right now. Have asked around, hopefully someone remembers more than I do.

Agree that doing a retry definitely seems like the right thing to do.

@siladu
Contributor

siladu commented Oct 5, 2023

hopefully someone remembers more than I do.
Maybe @matkt ?

…tions

Signed-off-by: garyschulte <garyschulte@gmail.com>
…age parsing

Signed-off-by: garyschulte <garyschulte@gmail.com>
@garyschulte
Contributor Author

I have been able to reproduce this somewhat readily by snap syncing a forest node with slow storage.

I think Forest is particularly vulnerable to these transient locking/busy issues, because it is using TransactionDB mode, which locks keys during transaction write rather than during commit.

So far, I have seen it exclusively on .put() rather than .commit(), so I tweaked the implementation to retry all RocksDBTransaction actions that encounter a non-fatal exception, and am re-testing.
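A rough sketch of that shape (the exception type, retry limit, and `retriable` helper below are illustrative stand-ins, not Besu's actual `RocksDBTransaction` code):

```java
import java.util.function.Supplier;

// Illustrative stand-in for a transient RocksDB failure (Busy / TimedOut(LockTimeout)).
class TransientDbException extends RuntimeException {
  TransientDbException(final String status) {
    super(status);
  }
}

final class RetryingTransaction {
  private static final int RETRY_LIMIT = 3; // assumed limit, for illustration

  // Every transaction action (put, remove, commit, ...) funnels through this guard,
  // so any of them can be retried on a non-fatal exception, not just commit().
  static <T> T retriable(final Supplier<T> op) {
    for (int attempt = 0; ; attempt++) {
      try {
        return op.get();
      } catch (TransientDbException e) {
        if (attempt >= RETRY_LIMIT) {
          throw e; // retries exhausted: surface the failure to the caller
        }
        // the real code would log here: "caught on attempt {} of {}, retrying"
      }
    }
  }
}
```

The point of the shared guard is that individual operations stay oblivious to the retry policy; they only declare which failures are transient.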

@garyschulte garyschulte reopened this Oct 5, 2023
@garyschulte garyschulte force-pushed the feature/rocks-tx-retry branch from c2c0cde to faa5dd6 on October 5, 2023 21:24
@garyschulte
Contributor Author

garyschulte commented Oct 6, 2023

Interestingly, this did fix one of the TimedOut(LockTimeout) exceptions on one thread on scheduler-8. But 60 seconds later the same thing happened to both scheduler-8 and scheduler-13. It looks like they were both trying to obtain locks and ended up in a deadlock kind of situation:

{"@timestamp":"2023-10-05T23:50:50,129","level":"WARN","thread":"EthScheduler-Services-8 (batchPersistStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 0 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}
{"@timestamp":"2023-10-05T23:51:45,556","level":"WARN","thread":"EthScheduler-Services-8 (batchPersistStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 0 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}
{"@timestamp":"2023-10-05T23:51:45,627","level":"WARN","thread":"EthScheduler-Services-13 (batchPersistLargeStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 0 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}
{"@timestamp":"2023-10-05T23:51:46,557","level":"WARN","thread":"EthScheduler-Services-8 (batchPersistStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 1 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}
{"@timestamp":"2023-10-05T23:51:46,630","level":"WARN","thread":"EthScheduler-Services-13 (batchPersistLargeStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 1 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}
{"@timestamp":"2023-10-05T23:51:47,558","level":"WARN","thread":"EthScheduler-Services-8 (batchPersistStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 2 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}
{"@timestamp":"2023-10-05T23:51:47,630","level":"WARN","thread":"EthScheduler-Services-13 (batchPersistLargeStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 2 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}
{"@timestamp":"2023-10-05T23:51:48,558","level":"WARN","thread":"EthScheduler-Services-8 (batchPersistStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 3 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}
{"@timestamp":"2023-10-05T23:51:48,631","level":"WARN","thread":"EthScheduler-Services-13 (batchPersistLargeStorageData)","class":"RocksDBTransaction","message":"RocksDB Transient exception caught on attempt 3 of 3, status: TimedOut(LockTimeout), retrying.","throwable":""}

It is interesting that the retry worked for the first lock timeout, but 8 and 13 together seemed to be deadlocked in the same time frame, for the same timeout duration. Perhaps what we might need to do is introduce a static timeout supplier that forces an incremental backoff wait. That way at least one of the threads might be able to get the lock and progress while the other one waits.

I will try that on this node since it seems to readily reproduce it within an hour or so of starting a mainnet sync.

edit: also, I would swear we have a write timeout that is longer than 1 second??

edit-edit: Looking at the RocksDB LOG file, there is a suspicious GAP in the otherwise incredibly chatty rocksdb log when the LockTimeouts occurred:

2023/10/05-23:51:43.520518 139977706960448 EVENT_LOG_v1 {"time_micros": 1696549903520516, "job": 789, "event": "table_file_deletion", "file_number": 2509}
2023/10/05-23:51:43.521454 139977706960448 [le/delete_scheduler.cc:77] Deleted file /data/besu/database/002508.sst immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
2023/10/05-23:51:43.521469 139977706960448 EVENT_LOG_v1 {"time_micros": 1696549903521467, "job": 789, "event": "table_file_deletion", "file_number": 2508}
2023/10/05-23:51:48.571692 139977323705920 [/db_impl/db_impl_write.cc:2155] [] New memtable created with log file: #2852. Immutable memtables: 0.
2023/10/05-23:51:48.571768 139977673389632 [/db_impl/db_impl_compaction_flush.cc:105] [JOB 793] Syncing log #2848
2023/10/05-23:51:48.873910 139977673389632 (Original Log Time 2023/10/05-23:51:48.571755) [/db_impl/db_impl_compaction_flush.cc:2960] Calling FlushMemTableToOutputFile with column family [], flush slots ava

rocksdb is usually writing tens of log lines per second, so it is quite conspicuous for there to be a 5 second gap. This might be mere correlation, but it seems odd to have no activity logged at all, regardless of 2 (of many) transactions getting lock timeouts 🤔
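The incremental backoff idea above can be sketched as a static delay supplier. The base delay and exponential growth are assumptions for illustration; the jitter term is what gives two simultaneously timed-out threads a chance to desynchronize:

```java
import java.util.concurrent.ThreadLocalRandom;

// Illustrative incremental backoff: the delay grows with each attempt, and random
// jitter is added so threads that timed out together do not retry in lockstep.
final class Backoff {
  static long delayMillis(final int attempt, final long baseMillis) {
    final long delay = baseMillis << attempt; // 100ms, 200ms, 400ms, ... for base = 100
    final long jitter = ThreadLocalRandom.current().nextLong(baseMillis);
    return delay + jitter;
  }
}
```

A retry loop would then call `Thread.sleep(Backoff.delayMillis(attempt, base))` before the next attempt, giving one of the contending threads room to acquire its locks and progress.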

…onDB

Signed-off-by: garyschulte <garyschulte@gmail.com>
@matkt
Contributor

matkt commented Oct 6, 2023

If the bug happens largely during sync, I can try to add retry logic at the pipeline level in addition to this PR: instead of failing the pipeline on a lock, I would restart the step later. Your PR tries to manage the commit in general, but I think we can also have better handling of RocksDB errors on the pipeline side.

Signed-off-by: garyschulte <garyschulte@gmail.com>
@garyschulte
Contributor Author

garyschulte commented Oct 6, 2023

If the bug happens largely during sync, I can try to add retry logic at the pipeline level in addition to this PR: instead of failing the pipeline on a lock, I would restart the step later. Your PR tries to manage the commit in general, but I think we can also have better handling of RocksDB errors on the pipeline side.

There seems to be a variety of places this can occur. I encountered LockTimeout most often during sync in state trie commit visitors, when they are just doing a transaction put(). I haven't been able to reproduce the Busy exception during a bonsai sync, but Optimistic transaction db seems to have a different behavior/pattern since it resolves locks at transaction .commit() time.

I think it is better to address it systemically rather than having individual processes duplicate the retry logic. If we find that some code paths are particularly prone to this or need special care, we can always catch the storage exception that is eventually thrown and apply process-specific logic there.

@matkt
Contributor

matkt commented Oct 6, 2023

If the bug happens largely during sync, I can try to add retry logic at the pipeline level in addition to this PR: instead of failing the pipeline on a lock, I would restart the step later. Your PR tries to manage the commit in general, but I think we can also have better handling of RocksDB errors on the pipeline side.

There seems to be a variety of places this can occur. I encountered LockTimeout most often during sync in state trie commit visitors, when they are just doing a transaction put(). I haven't been able to reproduce the Busy exception during a bonsai sync, but Optimistic transaction db seems to have a different behavior/pattern since it resolves locks at transaction .commit() time.

I think it is better to address it systemically rather than having individual processes duplicate the retry logic. If we find that some code paths are particularly prone to this or need special care, we can always catch the storage exception that is eventually thrown and apply process-specific logic there.

Ok, I will also try to start some syncs next week, hoping to reproduce it with bonsai.

@garyschulte
Contributor Author

garyschulte commented Oct 6, 2023

@matkt - perhaps I spoke too soon about not trying to handle this in the sync process. With an incremental backoff retry, I am still seeing LockTimeouts. Sometimes a single thread will get locktimeouts and will resolve, but inevitably two separate threads are trying to write to overlapping key ranges and a deadlock occurs. It doesn't matter how long we wait since each process/transaction is trying to get a lock on the keys of the other. This doesn't happen with OptimisticTransactionDB because conflicts are resolved at commit time rather than transaction write.

We will have to do some instrumenting/logging to test this theory, but I think we have 2 separate worldstate download threads that are trying to write to some of the same trie keys during the commit phase.

However... I suspect the better option is going to be to ditch TransactionDB for forest and go back to OptimisticTransactionDB with transaction retries. IIRC, we reverted forest to TransactionDB to alleviate "Busy" exceptions. The current retry strategy in this PR might make TransactionDB unnecessary for forest, and we can avoid the concurrency/deadlock issue.
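The difference in locking behavior can be illustrated with a plain compare-and-swap loop, which has the same shape as optimistic commit-time conflict resolution (this is a conceptual analogy in plain Java, not the RocksDB API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Conceptual analogy for OptimisticTransactionDB: writes take no locks; a conflict
// is only detected at "commit" time, and the losing transaction simply retries.
final class OptimisticCounter {
  private final AtomicInteger value = new AtomicInteger(0);

  int increment() {
    while (true) {
      final int snapshot = value.get();   // read without locking anything
      final int updated = snapshot + 1;   // do the work against the snapshot
      if (value.compareAndSet(snapshot, updated)) {
        return updated;                   // "commit" succeeded: no conflict
      }
      // conflict detected at commit: retry the whole operation; no deadlock is
      // possible because no thread ever holds a lock while waiting on another
    }
  }
}
```

By contrast, the pessimistic `TransactionDB` mode acquires key locks at write time, so two transactions writing overlapping key ranges in opposite orders can block each other until one times out.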

@matkt
Contributor

matkt commented Oct 6, 2023

@matkt - perhaps I spoke too soon about not trying to handle this in the sync process. With an incremental backoff retry, I am still seeing LockTimeouts. Sometimes a single thread will get locktimeouts and will resolve, but inevitably two separate threads are trying to write to overlapping key ranges and a deadlock occurs. It doesn't matter how long we wait since each process/transaction is trying to get a lock on the keys of the other. This doesn't happen with OptimisticTransactionDB because conflicts are resolved at commit time rather than write.

We will have to do some instrumenting/logging to test this theory, but I think we have 2 separate worldstate download threads that are trying to write to some of the same trie keys during the commit phase.

However... I suspect the better option is going to be to ditch TransactionDB for forest and go back to OptimisticTransactionDB with transaction retries. IIRC, we reverted forest to TransactionDB to alleviate "Busy" exceptions. The current retry strategy in this PR might make TransactionDB unnecessary for forest, and we can avoid the concurrency/deadlock issue.

For the sync part the fix can be very simple, which is why I propose this: just catch the error during a persist and mark the task as failed; the pipeline will then process this task again later. Apart from that, I think the probability of hitting this error outside of sync should be limited, because we do not have many other cases where several threads persist to the worldstate at the same time.
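A minimal sketch of that pipeline-side handling (the `Task` interface and queue here are hypothetical; Besu's sync pipeline has its own task abstraction):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative pipeline step: a persist failure marks the task as failed and
// re-enqueues it, so the pipeline processes it again later instead of aborting.
final class PersistPipeline {
  interface Task {
    void persist() throws Exception;
  }

  private final Deque<Task> queue = new ArrayDeque<>();

  void submit(final Task task) {
    queue.add(task);
  }

  // Drains the queue, re-enqueueing failed tasks; returns how many completed.
  int drain() {
    int completed = 0;
    while (!queue.isEmpty()) {
      final Task task = queue.poll();
      try {
        task.persist();
        completed++;
      } catch (Exception e) {
        queue.add(task); // failed (e.g. TimedOut(LockTimeout)): retry on a later pass
      }
    }
    return completed;
  }
}
```

A production version would cap retries per task and apply backoff, but the essential move is the same: a lock timeout fails one task, not the whole pipeline.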

@matkt
Contributor

matkt commented Oct 6, 2023

@matkt - perhaps I spoke too soon about not trying to handle this in the sync process. With an incremental backoff retry, I am still seeing LockTimeouts. Sometimes a single thread will get locktimeouts and will resolve, but inevitably two separate threads are trying to write to overlapping key ranges and a deadlock occurs. It doesn't matter how long we wait since each process/transaction is trying to get a lock on the keys of the other. This doesn't happen with OptimisticTransactionDB because conflicts are resolved at commit time rather than write.
We will have to do some instrumenting/logging to test this theory, but I think we have 2 separate worldstate download threads that are trying to write to some of the same trie keys during the commit phase.
However... I suspect the better option is going to be to ditch TransactionDB for forest and go back to OptimisticTransactionDB with transaction retries. IIRC, we reverted forest to TransactionDB to alleviate "Busy" exceptions. The current retry strategy in this PR might make TransactionDB unnecessary for forest, and we can avoid the concurrency/deadlock issue.

For the sync part the fix can be very simple, which is why I propose this: just catch the error during a persist and mark the task as failed; the pipeline will then process this task again later.

I think the lock timeout is the non-optimistic version of Busy.

@garyschulte garyschulte marked this pull request as draft October 6, 2023 18:51
@garyschulte
Contributor Author

garyschulte commented Oct 6, 2023

For the sync part the fix can be very simple, which is why I propose this: just catch the error during a persist and mark the task as failed; the pipeline will then process this task again later. Apart from that, I think the probability of hitting this error outside of sync should be limited, because we do not have many other cases where several threads persist to the worldstate at the same time.

Seems reasonable. It isn't obvious that allowing both threads to write is going to result in a correct trie either, so allowing one or both to fail and retry the task later is probably the better strategy.

Looking back over all of the "Busy" issues, it seems all of them have been related to worldstate sync. If we handle the sync failure gracefully in the pipeline, there really isn't any need for TransactionDB anymore afaict. We know that Optimistic is more performant, so if TransactionDB isn't 'fixing' anything, we should remove it as well.

Signed-off-by: garyschulte <garyschulte@gmail.com>
@garyschulte garyschulte force-pushed the feature/rocks-tx-retry branch from 9a7f850 to 5e7c92e on October 6, 2023 19:07
@garyschulte
Contributor Author

closing this in favor of #6004
