
[Merged by Bors] - Separate execution payloads in the DB #3157


Conversation

@michaelsproul (Member) commented Apr 12, 2022

Proposed Changes

Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.

⚠️ This is achieved in a backwards-incompatible way for networks that have already merged ⚠️. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.

The main changes are:

  • New column in the database called ExecPayload, keyed by beacon block root.
  • The BeaconBlock column now stores blinded blocks only.
  • Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
  • On finalization:
    • prune_abandoned_forks deletes non-canonical payloads whilst deleting non-canonical blocks.
    • migrate_db deletes finalized canonical payloads whilst deleting finalized states.
  • Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR [Merged by Bors] - Builder Specs v0.2.0 #3134.
  • The execution layer has a new get_payload_by_block_hash method that reconstructs a payload using the EE's eth_getBlockByHash call (see the sketch after this list).
  • Payload reconstruction is used in the HTTP API via BeaconChain::get_block, which is now async. Due to the async fn, the blocking_json wrapper has been removed.
  • Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the async adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the SendOnDrop by value so that they can drop it on task completion (after the fn returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).
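
For illustration, here is a minimal, self-contained Rust sketch of that reconstruction path. The type, trait, and function names are hypothetical stand-ins rather than Lighthouse's real API; it only shows the shape of the blinded-block + EE-payload re-join described above.

```rust
// Illustrative sketch only (hypothetical types and trait, not Lighthouse's
// real API): a blinded block keeps just the payload's block hash on disk, and
// the full payload is fetched from the execution engine (EE) via
// eth_getBlockByHash and spliced back in when a full block is requested.

#[derive(Clone, PartialEq)]
pub struct Hash256(pub [u8; 32]);

pub struct ExecutionPayload {
    pub block_hash: Hash256,
    pub transactions: Vec<Vec<u8>>,
}

pub struct BlindedBeaconBlock {
    pub slot: u64,
    // In the real schema the whole payload *header* is stored; the block hash
    // alone is enough for this sketch.
    pub payload_block_hash: Hash256,
}

pub struct FullBeaconBlock {
    pub slot: u64,
    pub payload: ExecutionPayload,
}

// Stand-in for the EE connection; the real call goes over JSON-RPC
// (eth_getBlockByHash) and converts the response into an execution payload.
pub trait ExecutionLayer {
    fn get_payload_by_block_hash(&self, block_hash: &Hash256) -> Option<ExecutionPayload>;
}

/// Re-join a blinded block loaded from the DB with the EE's copy of its payload.
pub fn reconstruct_full_block<E: ExecutionLayer>(
    el: &E,
    blinded: BlindedBeaconBlock,
) -> Option<FullBeaconBlock> {
    let payload = el.get_payload_by_block_hash(&blinded.payload_block_hash)?;
    // Sanity check: the payload returned by the EE must match the stored hash.
    if payload.block_hash != blinded.payload_block_hash {
        return None;
    }
    Some(FullBeaconBlock {
        slot: blinded.slot,
        payload,
    })
}
```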

Additional Info

  • For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a put_block API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller (a rough sketch follows this list).
  • We should measure the latency of blocks-by-root and blocks-by-range responses.
  • We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: Add transactions to EE integration tests #3159)
  • We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.
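
As a rough sketch of that put_block-by-value idea (hypothetical types and placeholder encoding, not the final API): the block is taken by value, split into its blinded half and its payload, each half is encoded and written, and the pieces are handed back so the caller never deep-clones the payload.

```rust
// Rough sketch of a clone-free `put_block` (hypothetical types, not the final
// API): split the block, write both halves, then re-assemble and return it.

pub struct Payload {
    pub transactions: Vec<Vec<u8>>,
}

pub struct BlindedBlock {
    pub slot: u64,
}

pub struct FullBlock {
    pub blinded: BlindedBlock,
    pub payload: Payload,
}

// Placeholder encoders standing in for the real on-disk serialization.
fn encode_blinded(b: &BlindedBlock) -> Vec<u8> {
    b.slot.to_le_bytes().to_vec()
}

fn encode_payload(p: &Payload) -> Vec<u8> {
    p.transactions.concat()
}

#[derive(Default)]
pub struct Store {
    blinded_blocks: Vec<Vec<u8>>,
    payloads: Vec<Vec<u8>>,
}

impl Store {
    /// Take the block by value, encode and write each half, then hand the
    /// pieces back so the caller keeps the full block without cloning it.
    pub fn put_block(&mut self, block: FullBlock) -> FullBlock {
        let FullBlock { blinded, payload } = block;
        self.blinded_blocks.push(encode_blinded(&blinded));
        self.payloads.push(encode_payload(&payload));
        FullBlock { blinded, payload }
    }
}

fn main() {
    let mut store = Store::default();
    let block = FullBlock {
        blinded: BlindedBlock { slot: 1 },
        payload: Payload { transactions: vec![vec![0xde, 0xad]] },
    };
    // The caller retains the full block after the write, with no deep clone.
    let block = store.put_block(block);
    assert_eq!(block.blinded.slot, 1);
}
```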

@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress backwards-incompat Backwards-incompatible API change labels Apr 12, 2022
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels May 9, 2022
@michaelsproul (Member, Author) commented:

Marking this as ready for review pending some metrics on blocks-by-root and blocks-by-range

@michaelsproul michaelsproul added the bellatrix Required to support the Bellatrix Upgrade label May 9, 2022
@michaelsproul (Member, Author) commented:

I've added a metric for the per-payload reconstruction time, and the timings seem to be within tolerable levels already: on Kiln the median reconstruction time is around 3ms and the 99th percentile is 25ms. This means we add 64 × 3ms = 192ms to our blocks-by-range responses on average, and in the worst case where all 64 requested blocks take 25ms we add only 64 × 25ms = 1.6s, which is well within the 10s timeout.

Graph with more percentiles:

[Graph: ee_reconstruct — per-payload reconstruction time percentiles, Lighthouse + Nethermind on Kiln]

I think it's important that we get more widespread testing of this PR ASAP, and tear off the band-aid for the database schema change. I think we should review and merge it soon, and then we can optimise our payload reconstruction times using caches and the new EE APIs for block reconstruction (e.g. ethereum/execution-apis#218).

@michaelsproul (Member, Author) commented:

I've just pushed a new commit (21298d0) that implements a schema upgrade for Kiln/shadow-fork users. The caveat is that it requires ~10GB of RAM 😁

In the process I also found a bug where we weren't loading the split point before running the migration, so looking up the genesis state would fail (because of bad split slot math) and the migration would be forced into the no-slot-clock no-op case with dubious results. It's fixed by loading the split slot first, which is safe because it has never been subject to a migration. We also fail if we can't get a slot clock.
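
Roughly, the ordering of the fix looks like this (hypothetical names; a sketch of the idea rather than the actual migration code):

```rust
// Sketch of the fixed ordering (hypothetical names): load the split point
// before anything that depends on split-slot arithmetic, and refuse to run
// without a slot clock instead of silently taking a no-op path.

pub struct Split {
    pub slot: u64,
}

pub struct Db {
    pub split: Option<Split>,
}

#[derive(Debug)]
pub enum MigrationError {
    MissingSlotClock,
}

pub fn migrate_schema(db: &mut Db, current_slot: Option<u64>) -> Result<(), MigrationError> {
    // 1. Load (or initialise) the split point first. This is safe because the
    //    split record's on-disk format has never been changed by a migration,
    //    and later steps (e.g. loading the genesis state) rely on it.
    let _split = db.split.get_or_insert(Split { slot: 0 });

    // 2. Fail loudly if no slot clock is available rather than degrading into
    //    a no-op migration with dubious results.
    let _slot = current_slot.ok_or(MigrationError::MissingSlotClock)?;

    // 3. ...apply the per-schema-version upgrade steps here...
    Ok(())
}
```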

I've tested the migration on a v2.2.1 Kiln node (v8 -> v9), where it worked with ~10GB peak RAM usage, and also on a v2.0.1 Prater node (v5 -> v9).

@paulhauner (Member) left a comment:

This looks really good! I'm impressed at how clean this change was. It's nice to be able to ignore the exec payload in places we don't need it (e.g., the parent block during block verification).

As discussed elsewhere, I've made a PR that reduces the memory footprint of the migration: michaelsproul#3

That PR is my only suggestion; I'm happy to merge once it's addressed ☺️

Diff context for the inline review comment below:

```rust
.or_else(|| self.early_attester_cache.get_block(*block_root));

Ok(block_opt)
if let Some(block) = self.early_attester_cache.get_block(*block_root) {
```
@paulhauner (Member) commented on this diff:
Nice optimization!
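
For context, the pattern being praised is roughly the following (hypothetical types, paraphrasing the diff above): consult the in-memory early attester cache first and only fall back to a database read on a miss (and after this PR a full-block DB read may involve payload reconstruction).

```rust
// Illustrative sketch of the cache-first lookup (hypothetical types,
// paraphrasing the diff above): check the early attester cache before
// touching the (slower) on-disk store.

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct Hash256(pub [u8; 32]);

#[derive(Clone)]
pub struct Block {
    pub slot: u64,
}

pub struct EarlyAttesterCache {
    pub blocks: HashMap<Hash256, Block>,
}

pub struct Store {
    pub blocks: HashMap<Hash256, Block>,
}

pub fn get_block(
    cache: &EarlyAttesterCache,
    store: &Store,
    block_root: &Hash256,
) -> Option<Block> {
    // Fast path: recently imported blocks live in the early attester cache.
    if let Some(block) = cache.blocks.get(block_root) {
        return Some(block.clone());
    }
    // Slow path: fall back to the store.
    store.blocks.get(block_root).cloned()
}
```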

@paulhauner paulhauner added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels May 11, 2022
@paulhauner (Member) commented:

Oh also, it might be useful to upgrade the description to account for the migration :)

Commit pushed: Perform v9 migration less-atomically
@michaelsproul (Member, Author) commented:

Just re-tested the new low-memory migration and it ran flawlessly, thanks Paul!

Let's merge!

bors r+

@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. labels May 12, 2022
bors bot pushed a commit that referenced this pull request May 12, 2022
## Proposed Changes

Reduce post-merge disk usage by not storing finalized execution payloads in Lighthouse's database.

:warning: **This is achieved in a backwards-incompatible way for networks that have already merged** :warning:. Kiln users and shadow fork enjoyers will be unable to downgrade after running the code from this PR. The upgrade migration may take several minutes to run, and can't be aborted after it begins.

The main changes are:

- New column in the database called `ExecPayload`, keyed by beacon block root.
- The `BeaconBlock` column now stores blinded blocks only.
- Lots of places that previously used full blocks now use blinded blocks, e.g. analytics APIs, block replay in the DB, etc.
- On finalization:
    - `prune_abandoned_forks` deletes non-canonical payloads whilst deleting non-canonical blocks.
    - `migrate_db` deletes finalized canonical payloads whilst deleting finalized states.
- Conversions between blinded and full blocks are implemented in a compositional way, duplicating some work from Sean's PR #3134.
- The execution layer has a new `get_payload_by_block_hash` method that reconstructs a payload using the EE's `eth_getBlockByHash` call.
   - I've tested manually that it works on Kiln, using Geth and Nethermind.
   - This isn't necessarily the most efficient method, and new engine APIs are being discussed to improve this: ethereum/execution-apis#146.
   - We're depending on the `ethers` master branch, due to lots of recent changes. We're also using a workaround for gakonst/ethers-rs#1134.
- Payload reconstruction is used in the HTTP API via `BeaconChain::get_block`, which is now `async`. Due to the `async` fn, the `blocking_json` wrapper has been removed.
- Payload reconstruction is used in network RPC to serve blocks-by-{root,range} responses. Here the `async` adjustment is messier, although I think I've managed to come up with a reasonable compromise: the handlers take the `SendOnDrop` by value so that they can drop it on _task completion_ (after the `fn` returns). Still, this is introducing disk reads onto core executor threads, which may have a negative performance impact (thoughts appreciated).

## Additional Info

- [x] For performance it would be great to remove the cloning of full blocks when converting them to blinded blocks to write to disk. I'm going to experiment with a `put_block` API that takes the block by value, breaks it into a blinded block and a payload, stores the blinded block, and then re-assembles the full block for the caller.
- [x] We should measure the latency of blocks-by-root and blocks-by-range responses.
- [x] We should add integration tests that stress the payload reconstruction (basic tests done, issue for more extensive tests: #3159)
- [x] We should (manually) test the schema v9 migration from several prior versions, particularly as blocks have changed on disk and some migrations rely on being able to load blocks.

Co-authored-by: Paul Hauner <[email protected]>
@bors bors bot changed the title Separate execution payloads in the DB [Merged by Bors] - Separate execution payloads in the DB May 12, 2022
@bors bors bot closed this May 12, 2022
@michaelsproul michaelsproul deleted the db-payload-separation branch May 12, 2022 02:35