core/state/snapshot: snapshot generation shutdown race condition #33540
Open
JonathanOppenheimer wants to merge 7 commits into ethereum:master
core/state/snapshot: snapshot generation shutdown race condition
Co-authored-by: Tsvetan Dimitrov <tsvetan.dimitrov@avalabs.org>
alarso16 reviewed on Jan 13, 2026
    abort = <-dl.genAbort
    abort <- stats
    dl.genStats = stats
Contributor
Where is this being read from? Kinda suspicious, looks like it might need to be in a lock
Author
No this doesn't need a lock -- it's only read after stopGeneration() has terminated -- snapshot.go:520-631
Overview
This PR fixes a race condition during blockchain shutdown where snapshot generation could continue accessing the trie database after it has been closed, leading to iterator errors. We noticed this in one of our nodes on https://github.com/ava-labs/avalanchego, which relies on an older version of geth with the same issue (so this behavior does happen!).
During node shutdown, the following sequence occurs:
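1. BlockChain.Stop() calls snaps.Release() to clean up snapshot resources.
2. Release() only resets the cache but doesn't stop the generator goroutine.
3. The trie database is then closed via triedb.Close().
4. The generator goroutine, still running, tries to iterate the closed database and fails ("Generator failed to iterate storage trie").

Problem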
There are three related bugs:
1. Release() doesn't stop generation: The diskLayer.Release() method only resets the cache without stopping ongoing snapshot generation, leaving the generator goroutine running after database closure.
2. stopGeneration() has an incorrect completion check: The stopGeneration() method checks genMarker != nil to determine if generation is running. However, genMarker is set to nil when generation completes successfully, even though the generator goroutine is still waiting for the abort signal at the end of generate() (see line 705 in generate.go: go-ethereum/core/state/snapshot/generate.go, lines 699 to 707 in eaaa5b7). In that case stopGeneration() returns early without sending the abort signal.
3. Nothing on the shutdown path calls stopGeneration() or sends the abort signal to the generator, causing the generator to access a closed database and error.

Fix
diskLayer.Release()to callstopGeneration()before releasing resourcesstopGeneration()to properly and safely stop snapshot generationTestReleaseStopsGenerationto verify the fix and prevent regression. The test fails without the fix and passes with it.Release()Note that this fix follows the same pattern used in
Tree.Disable()in #30040, which introducedstopGeneration()for use inDisable()andRebuild()but didn't address the shutdown path.
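
To make the second bug concrete, here is a minimal standalone sketch of the abort handshake. It is not the code from this PR: `layer`, `finished`, `exited`, `stopped`, `stopBuggy` and `stopFixed` are hypothetical names, an `int` stands in for the generator stats, and the "fixed" variant is only one possible way to stop relying on the marker. It models a generator that has finished its work, cleared its marker, and is parked waiting for an abort request.

```go
package main

import (
	"fmt"
	"sync"
)

// layer is a hypothetical stand-in for diskLayer; only the abort handshake is modelled.
type layer struct {
	lock     sync.RWMutex
	marker   []byte        // nil once generation has completed (plays the role of genMarker)
	abort    chan chan int // plays the role of genAbort; int stands in for the stats
	stopped  bool          // hypothetical guard so stopping twice is harmless
	finished chan struct{} // hypothetical: closed once the marker has been cleared
	exited   chan struct{} // hypothetical: closed when the generator goroutine returns
}

func newLayer() *layer {
	return &layer{
		marker:   []byte{0},
		abort:    make(chan chan int),
		finished: make(chan struct{}),
		exited:   make(chan struct{}),
	}
}

// generate completes immediately, clears the marker, then parks until
// someone sends an abort request, as described for the end of generate().
func (l *layer) generate() {
	l.lock.Lock()
	l.marker = nil
	l.lock.Unlock()
	close(l.finished)

	reply := <-l.abort // wait for the abort request
	reply <- 0         // hand back the (fake) stats
	close(l.exited)
}

// handshake sends an abort request and waits for the generator's reply.
func (l *layer) handshake() {
	reply := make(chan int)
	l.abort <- reply
	<-reply
}

// stopBuggy treats a nil marker as "not running" and returns early,
// leaving a completed-but-parked generator goroutine alive.
func (l *layer) stopBuggy() bool {
	l.lock.RLock()
	running := l.marker != nil
	l.lock.RUnlock()
	if !running {
		return false
	}
	l.handshake()
	return true
}

// stopFixed ignores the marker and performs the handshake exactly once.
func (l *layer) stopFixed() bool {
	l.lock.Lock()
	if l.stopped {
		l.lock.Unlock()
		return false
	}
	l.stopped = true
	l.lock.Unlock()
	l.handshake()
	return true
}

func main() {
	l := newLayer()
	go l.generate()
	<-l.finished // generation is complete; the goroutine is now parked

	fmt.Println("buggy stop sent abort:", l.stopBuggy())
	fmt.Println("fixed stop sent abort:", l.stopFixed())
	<-l.exited
	fmt.Println("generator exited")
}
```

In this model the marker-based check reports nothing to stop even though the goroutine is still alive, while the variant that tracks "already stopped" separately from "generation complete" always delivers the abort and can safely be called again.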