Rewind head and header_head consistently. #2918

antiochp · 2019-06-25T14:32:38Z

This PR makes a cleaner separation between the "txhashset" and the standalone header extension.
We have 3 "heads" that track different chain tips.

head - the chain head
header_head - the header chain head
sync_head - the sync header chain head (used when syncing headers from a peer)

Current behavior is to update sync_head as we receive chunks of headers from a peer during header sync. We then update header_head if these increase our total known work on a validated chain of headers.
We use header_head to know which blocks to request - effectively the blocks we have not yet seen between head and header_head.
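As a rough illustration of the three heads and of how head and header_head bound the blocks still to be requested, here is a minimal sketch. The `Tip` and `Heads` types and the `block_heights_to_request` helper are hypothetical simplifications, not Grin's actual types, and the sketch assumes both heads sit on the same chain, so heights alone identify the missing blocks.

```rust
/// A chain tip: height plus cumulative work.
/// Hypothetical simplification, not Grin's actual `Tip` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tip {
    pub height: u64,
    pub total_difficulty: u64,
}

/// The three heads described above.
#[derive(Debug, Clone, Copy)]
pub struct Heads {
    /// Tip of the fully validated block chain.
    pub head: Tip,
    /// Tip of the validated header chain.
    pub header_head: Tip,
    /// Tip of the header chain being synced from a peer.
    pub sync_head: Tip,
}

impl Heads {
    /// Heights for which we have a validated header but no full block yet.
    /// The range is empty when header_head is not ahead of head.
    pub fn block_heights_to_request(&self) -> std::ops::RangeInclusive<u64> {
        (self.head.height + 1)..=self.header_head.height
    }
}
```

With head at height 10 and header_head at height 13, the helper yields heights 11 through 13; sync_head plays no part in choosing which blocks to request.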

During the "private floonet" hardfork testing we discovered a couple of issues, specifically around serialization/deserialization of headers in the local db (where version={1|2}).
Some investigation around that led to exploring using the header MMR to read header hashes (instead of needing to deserialize a full header from the db).

This identified some room for significant simplification and improvement with respect to how we maintain head and header_head alongside our header MMR and output|rangeproof|kernel MMRs.

Intuitively header_head should match the last entry in the header MMR.
But this is not always the case due to how the header MMR interacts with the output|rangeproof|kernel MMRs in a full txhashset.

We would extend the header MMR when syncing headers but then when processing a full block we would sync the header MMR to disk aligned with the full txhashset (i.e. at head and not the further along header_head).

Proposed Solution (This PR)

When processing headers during header sync -
- flush the sync header MMR to disk if total work is increased (relative to current sync_head)
- and update sync_head to reflect this
- flush the header MMR to disk if total work is increased (relative to current header_head)
- update header_head to reflect this
When processing "header first" broadcast -
- flush the header MMR to disk if total work is increased (relative to current header_head)
- update header_head to reflect this
When processing full blocks -
- txhashset extension writes to output|rangeproof|kernel MMRs
- txhashset extension uses the existing header MMR readonly
- flush the output|rangeproof|kernel MMRs to disk if total work is increased
- use the existing header MMR readonly
- update head to reflect latest block (if total work increased)
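The three processing paths above can be sketched as follows. This is only an illustration of the "flush and update if total work increased" rule; `Tip`, `Flushed`, `ChainState` and the method names are hypothetical, and the real code operates on MMR backends and db batches rather than returning a list of what would be flushed.

```rust
/// Simplified chain tip (hypothetical; not Grin's actual `Tip` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tip {
    pub height: u64,
    pub total_difficulty: u64,
}

/// Which on-disk structure a step would flush (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flushed {
    SyncHeaderMmr,
    HeaderMmr,
    TxHashSetMmrs,
}

#[derive(Debug)]
pub struct ChainState {
    pub head: Tip,
    pub header_head: Tip,
    pub sync_head: Tip,
}

impl ChainState {
    /// Header received during header sync: may advance sync_head and header_head.
    pub fn process_sync_header(&mut self, candidate: Tip) -> Vec<Flushed> {
        let mut flushed = Vec::new();
        if candidate.total_difficulty > self.sync_head.total_difficulty {
            flushed.push(Flushed::SyncHeaderMmr);
            self.sync_head = candidate;
        }
        if candidate.total_difficulty > self.header_head.total_difficulty {
            flushed.push(Flushed::HeaderMmr);
            self.header_head = candidate;
        }
        flushed
    }

    /// "Header first" broadcast: may advance header_head only.
    pub fn process_header_first(&mut self, candidate: Tip) -> Vec<Flushed> {
        let mut flushed = Vec::new();
        if candidate.total_difficulty > self.header_head.total_difficulty {
            flushed.push(Flushed::HeaderMmr);
            self.header_head = candidate;
        }
        flushed
    }

    /// Full block: the header MMR is only read here, never written.
    /// Only the output|rangeproof|kernel MMRs and head may change.
    pub fn process_block(&mut self, candidate: Tip) -> Vec<Flushed> {
        let mut flushed = Vec::new();
        if candidate.total_difficulty > self.head.total_difficulty {
            flushed.push(Flushed::TxHashSetMmrs);
            self.head = candidate;
        }
        flushed
    }
}
```

The point of the sketch is that each head moves only together with the structure it describes: processing a full block never touches header_head or the header MMR.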

These changes allow us to simplify a bunch of code in various places -

The set_txhashset_roots_forked is no longer required for testing fork behavior.
We can simplify our "rewind" logic and make it more consistent.

This makes our rewind behavior significantly more robust in the presence of forks and reorgs.
We can now maintain a consistent header_head and rely on it being aligned with our current header MMR, even when processing blocks on chain forks.
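The alignment between header_head and the header MMR can be pictured with a toy model. In the sketch below a plain `Vec` stands in for the header MMR and `u64` values stand in for header hashes; `HeaderChain` and its methods are hypothetical names for illustration, not the actual chain API. The only idea shown is that a rewind truncates the header structure and moves header_head in the same step, so the two cannot drift apart while a fork is applied.

```rust
/// Toy header chain. ASSUMPTION: a `Vec` stands in for the header MMR,
/// with index = height and value = a stand-in header hash.
#[derive(Debug)]
pub struct HeaderChain {
    header_mmr: Vec<u64>,
    /// (height, hash) of the current header chain tip.
    header_head: (u64, u64),
}

impl HeaderChain {
    pub fn new(genesis_hash: u64) -> Self {
        HeaderChain {
            header_mmr: vec![genesis_hash],
            header_head: (0, genesis_hash),
        }
    }

    /// Append a header and move header_head with it.
    pub fn apply(&mut self, hash: u64) {
        self.header_mmr.push(hash);
        self.header_head = ((self.header_mmr.len() - 1) as u64, hash);
    }

    /// Rewind to `height` (never past genesis), updating header_head in the same step.
    pub fn rewind(&mut self, height: u64) {
        let keep = (height as usize).saturating_add(1).min(self.header_mmr.len());
        self.header_mmr.truncate(keep);
        let last = self.header_mmr.len() - 1;
        self.header_head = (last as u64, self.header_mmr[last]);
    }

    /// Rewind to the fork point, then apply the headers on the fork.
    pub fn rewind_and_apply_fork(&mut self, fork_height: u64, fork_hashes: &[u64]) {
        self.rewind(fork_height);
        for &hash in fork_hashes {
            self.apply(hash);
        }
    }

    pub fn header_head(&self) -> (u64, u64) {
        self.header_head
    }

    /// True when header_head matches the last entry of the header structure.
    pub fn is_consistent(&self) -> bool {
        self.header_mmr.last().copied() == Some(self.header_head.1)
            && self.header_head.0 + 1 == self.header_mmr.len() as u64
    }
}
```

After any sequence of `apply`, `rewind` and `rewind_and_apply_fork` calls, `is_consistent` holds in this model, which is the property the PR aims for on the real structures.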

I think (but this is really hard to actually verify) this plugs a couple of gaps where we can corrupt the MMR structures if the node is shut down during fork processing.

antiochp · 2019-07-25T15:33:07Z

This has been tested pretty extensively against mainnet for a few weeks now.
Both fast_sync and regular broadcast over time.

Planning to merge this today as it frees up some flexibility for a subsequent in-progress PR.

quentinlesceller

👍 Though I haven't had the time to review this extensively. But overall, if you feel confident about it and it has been tested for a while, I'd say LGTM.

tromp · 2019-07-25T19:57:17Z

I hoped this patch might avoid the crash of the latest grin build on floonet that last synced before the HF.
But it still crashes with no stdout and the following log:

20190725 21:54:29.630 INFO grin_util::logger - log4rs is initialized, file level: Info, stdout level: Trace, min. level: Trace
20190725 21:54:29.630 INFO grin - Using configuration file at /Users/tromp/.grin/floo/grin-server.toml
20190725 21:54:29.630 INFO grin - This is Grin version 2.0.1-beta.1 (git v2.0.0-39-g45cf1d9), built for x86_64-apple-darwin by rustc 1.36.0 (a53f9df32 2019-07-03).
20190725 21:54:29.630 WARN grin::cmd::server - Starting GRIN in UI mode...
20190725 21:54:29.630 INFO grin_servers::grin::server - Starting server, genesis block: edc758c1370d
20190725 21:54:30.529 ERROR grin_util::logger - 
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Chain(Error { inner: SerErr("invalid block version")

Store Error: SerErr("invalid block version"), reason: Serialization Error })': src/libcore/result.rs:999stack backtrace:
   0: backtrace::backtrace::trace::h12367e71db9fd6de (0x10b8aecde)
   1: backtrace::capture::Backtrace::new::hedb30514c7de0a5b (0x10b8add08)
   2: grin_util::logger::send_panic_to_log::{{closure}}::h6053d53b40f9032a (0x10b835ce2)
   3: std::panicking::rust_panic_with_hook::hddd286a9c773fc67 (0x10b9656d1)
   4: std::panicking::continue_panic_fmt::ha1bbafe7a6df805a (0x10b96511d)
   5: rust_begin_unwind (0x10b965009)
   6: core::panicking::panic_fmt::h7c36dce80bd02f4c (0x10b97e3a2)
   7: core::result::unwrap_failed::h0c1045035718d579 (0x10b1a83c3)
   8: grin::cmd::server::start_server::h8e0684c5a7d3970d (0x10b19c781)
   9: grin::cmd::server::server_command::hbf4d12a75ccbff69 (0x10b1a10d0)
  10: grin::real_main::h0c025d27375cac9e (0x10b1bcf4c)
  11: grin::main::h2e207c1812dff14b (0x10b1bbd49)
  12: std::rt::lang_start::{{closure}}::hd0b0a2cbdb98b50d (0x10b196f56)
  13: std::panicking::try::do_call::h8037d9f03e27d896 (0x10b964f88)
  14: __rust_maybe_catch_panic (0x10b96a60f)
  15: std::rt::lang_start_internal::hc8e69e673740d4ae (0x10b9659ae)
  16: main (0x10b1bda59)

antiochp · 2019-07-26T07:37:49Z

invalid block version

@tromp Unfortunately this PR fixes some issues that we uncovered when investigating your one, but not that specific issue itself.

antiochp added this to the 2.x.x milestone Jun 25, 2019

antiochp added the enhancement label Jun 25, 2019

antiochp self-assigned this Jun 25, 2019

antiochp requested review from yeastplume, ignopeverell and quentinlesceller June 25, 2019 14:33

antiochp mentioned this pull request Jun 25, 2019

handle the HF invalid header scenario safely during chain init #2912

Closed

1 task

antiochp force-pushed the rewind_head branch from 45ace0f to 0c62f53 Compare July 2, 2019 09:13

antiochp changed the base branch from master to milestone/2.x.x July 2, 2019 09:13

antiochp force-pushed the rewind_head branch from 0c62f53 to 48389b1 Compare July 6, 2019 15:11

antiochp changed the title ~~Rewind head and header_head consistently.~~ [2.x.x] Rewind head and header_head consistently. Jul 6, 2019

antiochp force-pushed the rewind_head branch 2 times, most recently from 68df947 to c78ca14 Compare July 12, 2019 14:00

antiochp changed the base branch from milestone/2.x.x to master July 24, 2019 15:41

antiochp changed the title ~~[2.x.x] Rewind head and header_head consistently.~~ Rewind head and header_head consistently. Jul 24, 2019

antiochp added 8 commits July 25, 2019 10:21

maintain header_head as distinctly separate from head

34005d6

cleanup corrupted storage log msg

e4f97fd

simplify process_header and check_header_known

2cd5924

remember to commit the batch when successfully processing a header...

c09c810

rework sync_block_headers for consistency with process_block_header

631ed63

cleanup unrelated code

569f480

fix pool tests

46f03c8

cleanup chain tests

b86a58d

antiochp force-pushed the rewind_head branch from a0df18d to b86a58d Compare July 25, 2019 09:57

antiochp added 2 commits July 25, 2019 11:52

cleanup chain tests (reuse helpers more)

562ce85

cleanup - head not header on an extension

d0c9769

shortcircuit "rewind and apply fork" for headers if next header

quentinlesceller approved these changes Jul 25, 2019

antiochp merged commit 515fa54 into mimblewimble:master Jul 26, 2019

antiochp deleted the rewind_head branch July 26, 2019 07:36
