ledger: add callback to clear state between commitRound retries #6190
Merged
cce merged 10 commits into algorand:master on Dec 10, 2024
…e corruption in catchpointtracker
gmalouf
reviewed
Dec 5, 2024
Contributor
gmalouf
left a comment
Left a few comments, looks okay overall.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6190      +/-   ##
==========================================
- Coverage   51.88%   51.85%   -0.04%
==========================================
  Files         639      639
  Lines       85489    85508      +19
==========================================
- Hits        44359    44339      -20
- Misses      38320    38354      +34
- Partials     2810     2815       +5
jannotti
reviewed
Dec 5, 2024
0feac4b to
49dad4c
Compare
algorandskiy
approved these changes
Dec 6, 2024
jannotti
approved these changes
Dec 10, 2024
gmalouf
approved these changes
Dec 10, 2024
Summary
Some of our unit tests use an in-memory SQLite DB, rather than file-based SQLite, to make tests run faster. This requires enabling shared-cache mode so multiple goroutines can hold connections to the same in-memory DB. However, in shared-cache mode, even read operations require table-level locks.
To handle these lock errors, our dbutil.go wrapper around SQLite transactions (AtomicContext) implements retry logic, where a provided function is retried multiple times when the error is sqlite3.ErrLocked or sqlite3.ErrBusy. Our unit tests that use concurrent connections to the same in-memory SQLite DB often show many retries before successfully committing, due to contention. In regular on-disk operation, shared-cache mode is not enabled, and these errors do not occur.
The catchpointtracker's commitRound() function flushes a batch of round updates to a merkle trie. Unfortunately, commitRound() cannot be safely retried inside an AtomicContext, because it updates the trie's SQLite table as well as an in-memory cache. When a retry occurs, the DB transaction is rolled back (along with the other trackers' commits), but the in-memory data is not rolled back. This extra callback allows the catchpointtracker to clear state between retries of commitRound().
Related: #5568
Test Plan
This should only impact tests that use in-memory SQLite (used for faster test performance), and should make them more reliable. A new test, TestCatchpointTrackerFastRoundsDBRetry, was added that tries to corrupt the merkle trie; without this PR it is flaky (fails most of the time, depending on timing/luck).