Rewind head and header_head consistently. #2918

Merged
merged 10 commits into from
Jul 26, 2019

@antiochp antiochp commented Jun 25, 2019

This PR makes a cleaner separation between the "txhashset" and the standalone header extension.
We have 3 "heads" that track different chain tips.

  • head - the chain head
  • header_head - the header chain head
  • sync_head - the sync header chain head (used when syncing headers from a peer)

Current behavior is to update sync_head as we receive chunks of headers from a peer during header sync. We then update header_head if these increase our total known work on a validated chain of headers.
We use header_head to know which blocks to request - effectively the blocks we have not yet seen between head and header_head.
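
As a rough illustration of how the three heads relate, here is a minimal sketch. The `Tip` and `Heads` types and the `block_heights_to_request` helper are hypothetical simplifications made up for this example, not Grin's actual types or API, and the sketch assumes head and header_head sit on the same chain.

```rust
/// Hypothetical, simplified chain tip (not Grin's real `Tip` type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Tip {
    pub height: u64,
    pub total_difficulty: u64,
}

/// The three tips described above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Heads {
    /// Tip of the fully validated block chain.
    pub head: Tip,
    /// Tip of the validated header chain.
    pub header_head: Tip,
    /// Tip of the headers received from a peer during header sync.
    pub sync_head: Tip,
}

/// Heights for which we know a header but have not yet seen the full block,
/// i.e. everything after `head` up to and including `header_head`.
/// Assumption: both tips are on the same chain, so heights identify blocks.
pub fn block_heights_to_request(heads: &Heads) -> Vec<u64> {
    // The range is empty when header_head is not ahead of head.
    ((heads.head.height + 1)..=heads.header_head.height).collect()
}
```

With head at height 10 and header_head at height 13, this sketch would yield heights 11, 12 and 13 as the blocks still to request.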

During the "private floonet" hardfork testing we discovered a couple of issues, specifically around serialization/deserialization of headers in the local db (where version={1|2}).
Some investigation around that led to exploring using the header MMR to read header hashes (instead of needing to deserialize a full header from the db).

This identified some room for significant simplification and improvement with respect to how we maintain head and header_head alongside our header MMR and output|rangeproof|kernel MMRs.

Intuitively header_head should match the last entry in the header MMR.
But this is not always the case due to how the header MMR interacts with the output|rangeproof|kernel MMRs in a full txhashset.

We would extend the header MMR when syncing headers, but then, when processing a full block, we would sync the header MMR to disk aligned with the full txhashset (i.e. at head and not at the further-along header_head).


Proposed Solution (This PR)

  • When processing headers during header sync -
    • flush the sync header MMR to disk if total work is increased (relative to current sync_head)
    • and update sync_head to reflect this
    • flush the header MMR to disk if total work is increased (relative to current header_head)
    • update header_head to reflect this
  • When processing "header first" broadcast -
    • flush the header MMR to disk if total work is increased (relative to current header_head)
    • update header_head to reflect this
  • When processing full blocks -
    • txhashset extension writes to output|rangeproof|kernel MMRs
    • txhashset extension uses the existing header MMR readonly
    • flush the output|rangeproof|kernel MMRs to disk if total work is increased
    • use the existing header MMR readonly
    • update head to reflect latest block (if total work increased)
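
The rules in the list above can be sketched as a small state update. This is only an illustration under simplifying assumptions: `Tip`, `Heads`, `Event`, `Flushed` and `process` are hypothetical names invented here, "flushing" is reduced to a boolean flag, and none of this is the actual Grin implementation.

```rust
/// Hypothetical, simplified chain tip (not Grin's real `Tip` type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Tip {
    pub height: u64,
    pub total_difficulty: u64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Heads {
    pub head: Tip,
    pub header_head: Tip,
    pub sync_head: Tip,
}

/// The three processing paths named in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    SyncHeader,
    HeaderFirst,
    FullBlock,
}

/// Which MMRs this sketch would flush to disk for a given event.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Flushed {
    pub sync_header_mmr: bool,
    pub header_mmr: bool,
    /// Stands for the output|rangeproof|kernel MMRs.
    pub txhashset_mmrs: bool,
}

/// Move `current` to `candidate` only if total work increases.
fn advance(current: &mut Tip, candidate: Tip) -> bool {
    if candidate.total_difficulty > current.total_difficulty {
        *current = candidate;
        true
    } else {
        false
    }
}

pub fn process(heads: &mut Heads, event: Event, candidate: Tip) -> Flushed {
    let mut flushed = Flushed::default();
    match event {
        // Header sync may advance sync_head and header_head independently.
        Event::SyncHeader => {
            flushed.sync_header_mmr = advance(&mut heads.sync_head, candidate);
            flushed.header_mmr = advance(&mut heads.header_head, candidate);
        }
        // A "header first" broadcast only touches header_head.
        Event::HeaderFirst => {
            flushed.header_mmr = advance(&mut heads.header_head, candidate);
        }
        // A full block only touches head; the header MMR is read-only here.
        Event::FullBlock => {
            flushed.txhashset_mmrs = advance(&mut heads.head, candidate);
        }
    }
    flushed
}
```

The point of the sketch is that each head is paired with exactly one set of MMRs, so a full block never flushes or moves the header MMR.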

These changes allow us to simplify a bunch of code in various places -

  • The set_txhashset_roots_forked is no longer required for testing fork behavior.
  • We can simplify our "rewind" logic and make it more consistent.

This makes our rewind behavior significantly more robust in the presence of forks and reorgs.
We can now maintain a consistent header_head and rely on it being aligned with our current header MMR, even when processing blocks on chain forks.
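
To make the "rewind and apply fork" idea concrete, here is a toy sketch. It models the header MMR as a plain vector of header hashes indexed by height, which is an assumption made purely for illustration; a real MMR is a different structure, total work is not modeled, and `Header`, `HeaderChain` and `rewind_and_apply_fork` are hypothetical names rather than Grin's API.

```rust
/// Hypothetical, minimal header: hashes are plain integers in this toy model.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Header {
    pub hash: u64,
    pub prev_hash: u64,
    pub height: u64,
}

/// Toy stand-in for the header MMR: hashes[i] is the header hash at height i.
#[derive(Debug, Default)]
pub struct HeaderChain {
    pub hashes: Vec<u64>,
}

impl HeaderChain {
    /// The last entry; in this model it is what header_head should point at.
    pub fn tip_hash(&self) -> Option<u64> {
        self.hashes.last().copied()
    }

    /// Rewind to the parent of the first fork header, then apply the fork
    /// (ordered oldest to newest). Returns false and leaves the chain
    /// untouched if the fork does not connect to the chain or to itself.
    pub fn rewind_and_apply_fork(&mut self, fork: &[Header]) -> bool {
        let first = match fork.first() {
            Some(h) => h,
            None => return true, // nothing to apply
        };
        if first.height == 0 {
            return false; // the genesis entry cannot be replaced in this sketch
        }
        let parent_height = (first.height - 1) as usize;
        if self.hashes.get(parent_height) != Some(&first.prev_hash) {
            return false; // fork point is not on our chain
        }
        let connected = fork
            .windows(2)
            .all(|w| w[1].prev_hash == w[0].hash && w[1].height == w[0].height + 1);
        if !connected {
            return false;
        }
        // Rewind: drop everything after the fork point. When the first header
        // builds directly on the current tip this truncate is a no-op.
        self.hashes.truncate(parent_height + 1);
        // Apply: append the fork headers in order.
        self.hashes.extend(fork.iter().map(|h| h.hash));
        true
    }
}
```

In this model the tip of the vector always matches the header most recently applied, which is the alignment between header_head and the header MMR that the description above relies on.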

I think (but this is really hard to actually verify) this plugs a couple of gaps where we can corrupt the MMR structures if the node is shut down during fork processing.

@antiochp antiochp added this to the 2.x.x milestone Jun 25, 2019
@antiochp antiochp self-assigned this Jun 25, 2019
@antiochp antiochp changed the base branch from master to milestone/2.x.x July 2, 2019 09:13
@antiochp antiochp changed the title Rewind head and header_head consistently. [2.x.x] Rewind head and header_head consistently. Jul 6, 2019
@antiochp antiochp force-pushed the rewind_head branch 2 times, most recently from 68df947 to c78ca14 Compare July 12, 2019 14:00
@antiochp antiochp changed the base branch from milestone/2.x.x to master July 24, 2019 15:41
@antiochp antiochp changed the title [2.x.x] Rewind head and header_head consistently. Rewind head and header_head consistently. Jul 24, 2019
antiochp added 2 commits July 25, 2019 11:52
shortcircuit "rewind and apply fork" for headers if next header
@antiochp

This has been tested pretty extensively against mainnet for a few weeks now, covering both fast_sync and regular broadcast over time.

Planning to merge this today as it frees up some flexibility for a subsequent in-progress PR.

@quentinlesceller quentinlesceller left a comment

👍 Though I haven't had the time to review this extensively. Overall, if you feel confident about it and it has been tested for a while, I'd say LGTM.

tromp commented Jul 25, 2019

I hoped this patch might avoid the crash of the latest grin build on floonet that last synced before HF.
But it still crashes with no stdout and the following log:

20190725 21:54:29.630 INFO grin_util::logger - log4rs is initialized, file level: Info, stdout level: Trace, min. level: Trace
20190725 21:54:29.630 INFO grin - Using configuration file at /Users/tromp/.grin/floo/grin-server.toml
20190725 21:54:29.630 INFO grin - This is Grin version 2.0.1-beta.1 (git v2.0.0-39-g45cf1d9), built for x86_64-apple-darwin by rustc 1.36.0 (a53f9df32 2019-07-03).
20190725 21:54:29.630 WARN grin::cmd::server - Starting GRIN in UI mode...
20190725 21:54:29.630 INFO grin_servers::grin::server - Starting server, genesis block: edc758c1370d
20190725 21:54:30.529 ERROR grin_util::logger - 
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Chain(Error { inner: SerErr("invalid block version")

Store Error: SerErr("invalid block version"), reason: Serialization Error })': src/libcore/result.rs:999stack backtrace:
   0: backtrace::backtrace::trace::h12367e71db9fd6de (0x10b8aecde)
   1: backtrace::capture::Backtrace::new::hedb30514c7de0a5b (0x10b8add08)
   2: grin_util::logger::send_panic_to_log::{{closure}}::h6053d53b40f9032a (0x10b835ce2)
   3: std::panicking::rust_panic_with_hook::hddd286a9c773fc67 (0x10b9656d1)
   4: std::panicking::continue_panic_fmt::ha1bbafe7a6df805a (0x10b96511d)
   5: rust_begin_unwind (0x10b965009)
   6: core::panicking::panic_fmt::h7c36dce80bd02f4c (0x10b97e3a2)
   7: core::result::unwrap_failed::h0c1045035718d579 (0x10b1a83c3)
   8: grin::cmd::server::start_server::h8e0684c5a7d3970d (0x10b19c781)
   9: grin::cmd::server::server_command::hbf4d12a75ccbff69 (0x10b1a10d0)
  10: grin::real_main::h0c025d27375cac9e (0x10b1bcf4c)
  11: grin::main::h2e207c1812dff14b (0x10b1bbd49)
  12: std::rt::lang_start::{{closure}}::hd0b0a2cbdb98b50d (0x10b196f56)
  13: std::panicking::try::do_call::h8037d9f03e27d896 (0x10b964f88)
  14: __rust_maybe_catch_panic (0x10b96a60f)
  15: std::rt::lang_start_internal::hc8e69e673740d4ae (0x10b9659ae)
  16: main (0x10b1bda59)

@antiochp antiochp merged commit 515fa54 into mimblewimble:master Jul 26, 2019
@antiochp antiochp deleted the rewind_head branch July 26, 2019 07:36
@antiochp

invalid block version

@tromp Unfortunately this PR fixes some issues that we uncovered while investigating yours, but not that specific issue itself.
