I have spent a few days digging into the low-level process of how we handle the "transactional" txhashset extension.
We now have a better understanding of exactly what the issue is when we see "corrupted" data on startup, and a proposed solution.
Quick overview of the steps involved (processing a new full block, assumes header already processed for simplicity) -
1. begin lmdb transaction (aka batch)
2. begin an lmdb "child" transaction
3. begin txhashset extension (in memory)
4. rewind and apply fork
    * no-op if block is the "next" block w.r.t. the current chain head
    * calculate "fork point" (header where the chain diverges on fork)
    * rewind mmrs to fork point (in memory)
    * apply all necessary blocks to put the forked chain in the correct state
5. apply new block
    * if the fork has less total work than the main chain -
        * discard mmr changes (discard in-memory changes)
        * discard (rollback) the child transaction
            * output_pos index is reverted etc.
        * commit the outer transaction
            * save block to db (needed if we subsequently process this fork again)
6. if the fork increases the total work (main chain extended or in reorg scenario) - sync updated mmr files to disk
    * sync output mmr
        * truncate hash file
        * append new bytes to hash file, flush to disk
        * truncate data file
        * append new bytes to data file, flush to disk
        * rewrite the "leaf_set" file, flush to disk
    * sync rangeproof mmr (as for output mmr above)
    * sync kernel mmr (as for output mmr above, excluding leaf_set)
7. commit the child transaction (changes rolled into the outer transaction)
    * output_pos index updated
    * various block specific data saved to disk
    * chain head updated
8. commit the outer transaction
    * save block to db
    * save all child transaction updates to db
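For illustration, the flow above reduces to something like the following sketch. `Batch`, `ChildBatch` and `Extension` here are hypothetical stand-ins, not the actual chain types; error handling is elided and step numbers refer to the list above:

```rust
struct Block;
struct Batch;      // outer lmdb transaction (1)
struct ChildBatch; // nested lmdb transaction (2)
struct Extension;  // in-memory txhashset extension (3)

impl Batch {
    fn child(&mut self) -> ChildBatch { ChildBatch }
    fn save_block(&mut self, _b: &Block) {}
    fn commit(self) {}
}

impl ChildBatch {
    fn commit(self) {}   // changes rolled into the outer batch (7)
    fn rollback(self) {} // output_pos index reverted etc.
}

impl Extension {
    fn rewind_and_apply_fork(&mut self, _b: &Block) {} // step (4)
    fn apply_block(&mut self, _b: &Block) {}           // step (5)
    fn has_more_work(&self) -> bool { true }
    fn sync_to_disk(&mut self) {} // step (6): truncate/append/flush mmr files
    fn discard(self) {}           // throw away in-memory changes
}

fn process_block(mut batch: Batch, block: Block) {
    let child = batch.child();
    let mut ext = Extension;
    ext.rewind_and_apply_fork(&block);
    ext.apply_block(&block);
    if ext.has_more_work() {
        // (6) the only step that touches files outside lmdb's control
        ext.sync_to_disk();
        // (7)
        child.commit();
    } else {
        ext.discard();
        child.rollback();
    }
    // (8) commit the outer transaction, saving the block itself
    batch.save_block(&block);
    batch.commit();
}

fn main() {
    process_block(Batch, Block);
}
```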
The key points from above are that we do the following -

* start a db transaction
* truncate, extend and flush various files to disk
* commit the db transaction
We attempt to be as "atomic" as possible, but the various file operations (truncate and flush) are not transactional.
Our issues arise when we experience an unclean shutdown during step (6) above. We attempt to minimize the chance of corruption in various ways, but once we start truncating files on disk and/or appending new bytes to them, we must complete the entire process successfully (up to and including committing the outer lmdb transaction) or we risk corrupting the data on disk.
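To make the failure window concrete, the sync step per mmr file boils down to roughly the following (illustrative only; the path and sizes are made up):

```rust
use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

/// Roughly what the sync step does per mmr file. A crash after set_len()
/// but before sync_all() completes (or before the outer lmdb transaction
/// commits) leaves the file in an intermediate state that no longer
/// matches the chain state lmdb believes we have.
fn sync_mmr_file(path: &str, rewind_to: u64, new_bytes: &[u8]) -> std::io::Result<()> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;

    // truncate away anything rewound past the fork point...
    file.set_len(rewind_to)?;

    // ...append data for the newly applied block(s)...
    file.seek(SeekFrom::End(0))?;
    file.write_all(new_bytes)?;

    // ...and flush to disk
    file.sync_all()
}

fn main() -> std::io::Result<()> {
    // stand-in file so the sketch actually runs
    std::fs::write("pmmr_hash.bin", vec![0u8; 320])?;
    sync_mmr_file("pmmr_hash.bin", 224, &[1u8; 64])
}
```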
[tbd - describe proposed "checkpoint" solution]
The (P)MMR data structures are append-only.
We can prune them (for the output and rangeproof MMRs), which removes contiguous chunks of bytes, but we never overwrite or update old data. It is immutable, with one exception: we support "rewind", which can rewrite recent data (and only recent data).
Note: During rewind we determine the "fork point", the point where the fork diverges from the current main chain. All data after the fork point is subject to change; it will be rewritten. All data before the fork point is immutable within the context of this single block being processed.
This gives us a known block (header) where earlier data is guaranteed to be immutable.
From this header we can determine byte locations (via mmr sizes in the header) for all hash and data files in the various MMRs. We can guarantee data cannot be corrupted prior to this point.
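As a sketch of that calculation (assuming 32-byte hashes and fixed-size data elements, and ignoring pruning, which shifts physical offsets within the pruned data file):

```rust
/// Number of leaves in an mmr of `mmr_size` nodes: repeatedly peel off
/// the largest perfect binary subtree (peak) that fits. An illustrative
/// version of standard mmr leaf counting.
fn n_leaves(mut mmr_size: u64) -> u64 {
    let mut leaves = 0u64;
    while mmr_size > 0 {
        // a perfect subtree of 2^pow - 1 nodes has 2^(pow - 1) leaves,
        // where pow = floor(log2(mmr_size + 1))
        let pow = (63 - (mmr_size + 1).leading_zeros()) as u64;
        leaves += 1u64 << (pow - 1);
        mmr_size -= (1u64 << pow) - 1;
    }
    leaves
}

/// Byte offsets of the immutable/mutable boundary, given the mmr_size
/// from the fork point header. Assumes 32-byte hashes; `elem_size` varies
/// per mmr (e.g. 33-byte commitments for the output mmr).
fn checkpoint_offsets(mmr_size: u64, elem_size: u64) -> (u64, u64) {
    let hash_file = mmr_size * 32;
    let data_file = n_leaves(mmr_size) * elem_size;
    (hash_file, data_file)
}

fn main() {
    assert_eq!(n_leaves(7), 4); // 7 nodes = one perfect tree, 4 leaves
    println!("{:?}", checkpoint_offsets(7, 33)); // (224, 132)
}
```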
In the event of needing to recover from corrupted data we can revert to this "checkpoint", effectively resetting the chain to the "fork point" and reprocessing blocks from that point forward.
In the common case the fork point matches the current chain head. Recovery in this situation is to discard any data partially written to the MMR files.
In the less common fork/reorg scenario we will potentially discard one or more full blocks of data that may have been partially rewritten, allowing us to reprocess full blocks safely from this reverted state.
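A minimal sketch of what per-file recovery could look like, assuming the checkpoint offsets have been recorded somewhere durable (paths and offsets below are made up):

```rust
use std::fs::OpenOptions;

/// On startup, cut each mmr file back to the byte offset recorded at the
/// checkpoint. Everything past that offset is potentially partial writes
/// from an interrupted sync; everything before it is guaranteed immutable.
/// Blocks are then reprocessed from the checkpoint forward.
fn revert_to_checkpoint(path: &str, checkpoint_offset: u64) -> std::io::Result<()> {
    let file = OpenOptions::new().write(true).open(path)?;
    if file.metadata()?.len() > checkpoint_offset {
        file.set_len(checkpoint_offset)?;
        file.sync_all()?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // stand-in file and offset so the sketch actually runs
    std::fs::write("pmmr_hash.bin", vec![0u8; 300])?;
    revert_to_checkpoint("pmmr_hash.bin", 224)?;
    assert_eq!(std::fs::metadata("pmmr_hash.bin")?.len(), 224);
    Ok(())
}
```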
[todo - describe why the leaf_set remains an ugly issue]
Draft PR for "checkpoint" here - #3266
Still very much WIP as there are many edge cases to think through.
The realization (obvious with hindsight) that we know the byte location in all the files where we can partition between immutable and mutable data (the fork point) does appear to make a robust recovery feasible.