Fixed restoring state-db journals on startup by arkpar · Pull Request #8494 · paritytech/substrate

arkpar · 2021-03-30T13:14:38Z

Currently when a block N is canonicalized, all discarded branches are removed from the canonicalization overlay immediately (unless pinned). They are also removed from the DB. This may lead to a situation where, out of two journals for blocks N+1 and N+1', the former is discarded and the latter remains in the DB.

On startup the journals are loaded by trying DB keys for each block number, increasing the journal sub-index starting from 0. So if sub-index 0 is missing and sub-index 1 is there, it won't be loaded.
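The failure mode described above can be illustrated with a minimal sketch. It assumes a `HashMap` keyed by (block number, sub-index) as a stand-in for the database; `JournalDb` and `load_until_first_gap` are hypothetical names for illustration, not the state-db API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the database:
/// (block number, journal sub-index) -> encoded journal.
type JournalDb = HashMap<(u64, u64), Vec<u8>>;

/// Loads the journals of one block number by probing sub-indices
/// 0, 1, 2, ... and stopping at the first missing key.
fn load_until_first_gap(db: &JournalDb, block: u64) -> Vec<(u64, Vec<u8>)> {
    let mut loaded = Vec::new();
    let mut index = 0u64;
    while let Some(journal) = db.get(&(block, index)) {
        loaded.push((index, journal.clone()));
        index += 1;
    }
    // Any journal stored after a gap (e.g. sub-index 1 when 0 was
    // discarded) is never reached by the loop above.
    loaded
}
```

With journals stored at sub-indices 0 and 1, both are loaded. Once the journal at sub-index 0 is discarded, the loop stops immediately and the surviving journal at sub-index 1 is skipped.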

On kusama/polkadot this has a low chance of happening, because forks of more than 1 block in length are rare and the node needs to be stopped/restarted on such a fork for the issue to happen. On rococo forks are more likely due to the increased finality window, and there are a lot of restarts because of stalling issues, which helped find this issue.

I've considered a fix that keeps track of journal counts for each block number, or involves organising journals in a linked list. Ultimately this requires updating an extra database key for each commit. So I've opted for a solution that tries a limited number of possible journals on startup. This very slightly increases startup time, but requires no extra writes when running.
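The chosen approach can be sketched as follows, under stated assumptions: a `HashMap` stands in for the database, the bound of 32 is the figure mentioned in the review discussion, and `JournalDb`, `MAX_JOURNALS_PER_BLOCK` and `load_with_bounded_probe` are hypothetical names rather than the identifiers used in the patch.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the database:
/// (block number, journal sub-index) -> encoded journal.
type JournalDb = HashMap<(u64, u64), Vec<u8>>;

/// Number of sub-indices probed per block number on startup.
/// The value 32 comes from the review discussion; the name is an assumption.
const MAX_JOURNALS_PER_BLOCK: u64 = 32;

/// Probes every sub-index below the bound instead of stopping at the
/// first missing key, so gaps left by discarded journals are tolerated.
fn load_with_bounded_probe(db: &JournalDb, block: u64) -> Vec<(u64, Vec<u8>)> {
    (0..MAX_JOURNALS_PER_BLOCK)
        .filter_map(|index| {
            db.get(&(block, index))
                .map(|journal| (index, journal.clone()))
        })
        .collect()
}
```

The trade-off matches the description above: startup performs a fixed number of extra reads per block number, while commits need no additional bookkeeping key. A journal stored at a sub-index at or beyond the bound would not be found by this sketch, which is why the bound also acts as a limit on blocks per height.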

cheme · 2021-03-31T07:10:11Z

The code change looks good to me.
The overhead on launch indeed doesn't seem like an issue.
I am wondering more about the new limit of 32 simultaneous forks at a given block height (is it 'D3-trivial'?).
Maybe not an issue with babe and stalled finalization (2^5), but it limits substrate's possible use cases (it may already be limited).

arkpar · 2021-03-31T08:10:30Z

I've considered this. The limit can't be used as an attack vector, because only valid blocks are imported into the DB. And with BABE/VRF it is astronomically unlikely that 32 valid blocks may be produced at the same height. state-db overlay is not really designed to push too many blocks into it, as it keeps everything in memory. So if an alternative consensus engine expects to have 32 forks for a block height, it will probably have to be redesigned anyway. Overall I don't think there will be an issue with any real-world chain.

cheme · 2021-03-31T08:29:39Z

I was not thinking of babe producing 32 blocks at once (astronomical indeed), but of babe producing 2 blocks five times over a period without finalization (finalization stalling for quite a long time).
Edit: babe probably does some fork choice, so 5 is indeed big, but I don't really know if those choices are enforced by consensus (I think not). Anyway, it seems very far-fetched indeed.

cheme

LGTM, maybe the limit of 32 forks could be written in the top-level documentation (or made configurable).

bkchr · 2021-04-03T19:58:13Z

client/state-db/src/lib.rs

+//! Database engine uses a notion of canonicality, rather then finality. A canonical block may not be yet finalized
+//! from the perspective of the consensus engine, but it still can't be reverted in the database. Most of the time
+//! during normal operation last canonical block is the same as lst finalized. However if finality stall for a
+//! long duration for some reason, there's only a certain number of blocks that can fit in the non-canonical overlay,


So we actually could not re-org on long non-finalized forks?

arkpar · 2021-04-03

Yes, if the fork is longer than 4096 blocks. That's why we recommend running with archive mode on validators.

bkchr · 2021-04-03T20:02:10Z

client/state-db/src/noncanonical.rs

-							index += 1;
-							total += 1;
-						},
-						None => break,


Ahh, so this was the bug? There was some gap and we did not load the block(s) after the gap?

* Fixed restoring state-db journals on startup
* Improved documentation a bit
* Update client/state-db/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

Fixed restoring state-db journals on startup

f59ebe1

arkpar force-pushed the a-fix-state-db branch from ebb9c47 to f59ebe1 on March 30, 2021 13:27

cheme approved these changes Mar 31, 2021

Improved documentation a bit

55b6a2f

bkchr approved these changes Apr 3, 2021

Update client/state-db/src/lib.rs

f8be608

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

arkpar merged commit 2ab715f into master Apr 3, 2021

arkpar deleted the a-fix-state-db branch April 3, 2021 20:49

davxy mentioned this pull request Sep 23, 2022

Limit number of blocks per level (2nd attempt) paritytech/cumulus#1559

Merged

This was referenced Jan 11, 2024

A0-3582: Deal with block limit aleph-zero-foundation/polkadot-sdk#1

Closed

Remove block limit per level paritytech/polkadot-sdk#2933

Closed

arkpar merged 3 commits into master from a-fix-state-db
