Fixed restoring state-db journals on startup #8494
The code change looks good to me.
I've considered this. The limit can't be used as an attack vector, because only valid blocks are imported into the DB. And with BABE/VRF it is astronomically unlikely that 32 valid blocks would be produced at the same height.
I was not thinking of BABE producing 32 blocks at once (astronomically unlikely indeed), but of BABE producing 2 blocks five times over a period without finalization (finalization stalling for quite a long time).
cheme
left a comment
LGTM. Maybe the limit of 32 forks could be mentioned in the top-level documentation (or made configurable).
//! Database engine uses a notion of canonicality, rather than finality. A canonical block may not be yet finalized
//! from the perspective of the consensus engine, but it still can't be reverted in the database. Most of the time
//! during normal operation last canonical block is the same as last finalized. However if finality stalls for a
//! long duration for some reason, there's only a certain number of blocks that can fit in the non-canonical overlay,
So we actually could not re-org on long non-finalized forks?
Yes, if the fork is longer than 4096 blocks. That's why we recommend running with archive mode on validators.
index += 1;
total += 1;
},
None => break,
Ahh, so this was the bug? There was some gap and we did not load the block(s) after the gap?
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
* Fixed restoring state-db journals on startup * Improved documentation a bit * Update client/state-db/src/lib.rs Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com> Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Currently, when a block N is canonicalized, all discarded branches are removed from the canonicalization overlay immediately (unless pinned). They are also removed from the DB. This may lead to a situation where, out of two journals for blocks N+1 and N+1', the former is discarded and the latter remains in the DB.
On startup the journals are loaded by trying DB keys for each block number, increasing the journal sub-index starting from 0. So if sub-index 0 is missing and sub-index 1 is there, it won't be loaded.
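The gap problem can be reproduced with a small sketch. The model here is an assumption made for illustration and is not the state-db code: the DB is a `HashMap` keyed by `(block_number, sub_index)`, and `JournalDb`, `load_until_gap` and `db_with_gap` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the journal table:
/// (block number, journal sub-index) -> journal payload.
type JournalDb = HashMap<(u64, u64), &'static str>;

/// Loads the journals of one block number, stopping at the first missing
/// sub-index. This mirrors the behaviour described above, not the real code.
fn load_until_gap(db: &JournalDb, block: u64) -> Vec<&'static str> {
    let mut loaded = Vec::new();
    let mut index = 0;
    while let Some(journal) = db.get(&(block, index)) {
        loaded.push(*journal);
        index += 1;
    }
    loaded
}

/// Builds a DB in which the journal at sub-index 0 was discarded
/// while the one at sub-index 1 survived.
fn db_with_gap() -> JournalDb {
    let mut db = JournalDb::new();
    db.insert((11, 1), "journal for N+1'");
    db
}
```

Calling `load_until_gap(&db_with_gap(), 11)` returns an empty vector in this model: the lookup for sub-index 0 fails, the loop ends, and the surviving journal at sub-index 1 is never reached.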
On Kusama/Polkadot this has a low chance of happening, because forks of more than 1 block in length are rare and the node needs to be stopped and restarted on such a fork for the issue to occur. On Rococo forks are more likely due to the increased finality window, and there are a lot of restarts because of stalling issues, which helped find this issue.
I've considered a fix that keeps track of journal counts for each block number, or involves organising journals in a linked list. Ultimately this requires updating an extra database key for each commit. So I've opted for a solution that tries a limited number of possible journals on startup. This very slightly increases startup time, but requires no extra writes when running.
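A bounded scan of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: the DB is modelled as a `HashMap` keyed by `(block_number, sub_index)`, and `JournalDb`, `load_bounded` and `MAX_JOURNALS_PER_BLOCK` are hypothetical names. The value 32 is taken from the limit discussed in the review.

```rust
use std::collections::HashMap;

/// Assumed upper bound on journal sub-indices probed per block number
/// (the review discussion mentions a limit of 32).
const MAX_JOURNALS_PER_BLOCK: u64 = 32;

/// Hypothetical stand-in for the journal table:
/// (block number, journal sub-index) -> journal payload.
type JournalDb = HashMap<(u64, u64), &'static str>;

/// Probes every sub-index below the bound and collects the journals found,
/// skipping over gaps instead of stopping at the first missing key.
fn load_bounded(db: &JournalDb, block: u64) -> Vec<(u64, &'static str)> {
    (0..MAX_JOURNALS_PER_BLOCK)
        .filter_map(|index| db.get(&(block, index)).map(|journal| (index, *journal)))
        .collect()
}
```

In this model the cost is at most 32 lookups per block number at startup, with nothing extra to write while the node is running, which is the trade-off the description argues for.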