op-node/specs: refactor batch queue, add parent_hash to batches, extend testing by protolambda · Pull Request #3221 · ethereum-optimism/optimism

protolambda · 2022-08-14T02:47:54Z

This adds parent_hash to the batch definition, to ensure we never skip a batch. Skipping a batch can cause transactions with invalid nonce values to be inserted into the execution engine, causing reorgs or complete halts.

The parent_hash matches the L2 block hash of the block the batch will be building on top of, i.e. the L2 safe head.
To enable this, we have to change the batch-queue processing to only output 1 batch at the time. Without this, the previous L2 block hash would not be available, and so we could not check the hash otherwise.

In a previous iteration of this I considered hashing just the batch inputs, and chaining those hashes. However, we can't include those hashes into the L2 chain elegantly, which is required to find a sync continuation point without loading all historical batches. I.e. we would need to add additional data to the L1 block info transaction, and it would almost be self-referencing, as it hashes together the other transactions, which are included later.
Using the native block hashes thus is preferred to not add additional data to the info deposit, and avoid the near cyclic dependency issue.

Not outputting multiple batches at a time really means that we are not filling multiple batches at a time when there is a gap to fill. The eager batch derivation already outputted a single batch otherwise.

Previously the validity conditions on batches were spread between the batch-filtering, batch-filling, extension-checking and batch queue main function. I changed this so we can output 1 batch at a time, check all validity conditions, and isolate the validity conditions in one function that now has 100% test coverage.

Part of this unwinding of the different checks also meant we can now remove the L1 RPC from the function: all necessary L1 data is there, and is easy to access (first L1 block always matches L2 safe head L1 origin, second L1 block is optional).

And validating the batches this way means there is no difference between eager and regular batch-derivation: we always consider the validity of the first batch, since we output one at a time.

The spec is updated to reflect this same methodology, so we don't need eager batch queue <> regular batch queue distinction anymore.

I hope the PR is not too large to review, this was not an easy change, at least it should reduce total lines of regular code, while adding more tests + ensuring 100% coverage of batch validation.

Fix ENG-2593

changeset-bot · 2022-08-14T02:47:57Z

⚠️ No Changeset found

Latest commit: c003072

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

mergify · 2022-08-14T02:48:21Z

This PR changes implementation code, but doesn't include a changeset. Did you forget to add one?

protolambda · 2022-08-16T15:59:47Z

Rebased onto develop for the devnet CI fix.

norswap · 2022-08-16T16:22:39Z

The parent_hash matches the L2 block hash of the block the batch will be building on top of, i.e. the L2 safe head.
To enable this, we have to change the batch-queue processing to only output 1 batch at the time. Without this, the previous L2 block hash would not be available, and so we could not check the hash otherwise.

This has a pretty big consequence which is that if the sequencer does something bad on the black transition of batch X, then all the subsequent batches are rendered useless and cannot be replayed anymore. That's a pretty change and I feel like it should warrant some discussions of pros & cons.

In a previous iteration of this I considered hashing just the batch inputs, and chaining those hashes. However, we can't include those hashes into the L2 chain elegantly, which is required to find a sync continuation point without loading all historical batches. I.e. we would need to add additional data to the L1 block info transaction, and it would almost be self-referencing, as it hashes together the other transactions, which are included later.

I'm not entirely sure why we can't include the hash of the previous batch in the L1 infos of the next block?

norswap

Didn't review the code really, mostly spec + general principles.

I think we need to talk about the implications of the block hash (see my comment separate from this review).

Specs-wise, I have some suggestions about how to better structure this bit of the spec. Maybe with impact to the code (new stage?)

Regarding the code, this is a lot of changes indeed.

I think in this specific case, it would have been good to implement the change entirely without a refactor, then consider refactors separately.

And we could have split refactors in multiple PRs where each PR moves only a single check for one place to another.

Imho right now our options are basically: hope our testing catches any issues, or do the above. Don't feel very comfortable reviewing the changes and seeing "yup, I'm confident that before/after is the same behaviour modulo the new bit we're adding".

specs/derivation.md

trianglesphere

I went through and reviewed everything. I didn't realize how much you had refactored in there. Splitting this up into a refactor + adding parent hash as two steps (and maybe even smaller refactor steps depending on how it is) would have made this easier to review.

There's a couple spots that need some more comments.

My main concern is the repeated linear scan through bq.batches + how that array is modified. It does reduce the chance of leaking compared to the old design, but now requires (potentially) much more work.

op-node/rollup/derive/batch_queue.go

op-node/rollup/derive/batches.go

op-node/rollup/derive/batches_test.go

op-node/rollup/derive/batch_queue.go

op-node/rollup/derive/batch_queue_test.go

op-node/rollup/derive/batches.go

mergify · 2022-08-23T00:34:39Z

Hey @protolambda! This PR has merge conflicts. Please fix them before continuing review.

…update specs

protolambda · 2022-08-24T00:22:17Z

Rebased onto develop
Implemented the grouping of batches by timestamp, to handle the performance/DoS case, otherwise still in inclusion order, clarified in specs that future batches can be deferred in processing since these are marked as future anyway.
Clarified a bunch of comments/code based on review, changed structure a tiny bit to make it more readable

Ready for final review/merge.

implemented requested changes

op-node/rollup/derive/attributes_queue.go

op-node/rollup/derive/batch_queue.go

op-node/rollup/derive/batches.go

specs/derivation.md

norswap

Approving to move things along, flagging two things:

We need to have a conversation about batch replayability (will create linear ticket).
Still have my concerns with the state-machine-ish spec structures, but this can be easily addressed when doing l2 chain derivation refactoring work, so perfectly fine for now.
- This will conflict with my last derivation spec PR, what I'll do there is keep both version side by side (old version just need an extra constraint for the parent hash), and we'll consolidate as part of refactors.

mergify · 2022-08-24T17:51:35Z

This PR has been added to the merge queue, and will be merged soon.

mergify · 2022-08-24T17:51:48Z

This PR is next in line to be merged, and will be merged as soon as checks pass.

…nd testing (#3221) * op-node,specs: batches track parent-hash, refactor+test batch queue, update specs * op-node,specs: implement review suggestions * op-node,specs: implement suggestions/fixes based on review from mark Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

…p-rs/kona#3206) This PR consolidates the `EngineActor` inbound channels into a single channel. See [this design doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-actor-simplification.md#3-and-4-actor-dependencies-are-unclear) for more info on motivation. This PR also: - Creates clients for all callers of `EngineActor` making interactions simple, more generic, safe (e.g. DerivationActor's engine client does not expose functions to perform block building), and mockable - Makes dependent actors generic over their engine client to decouple implementation and make mocking easy Closes #3220 Closes #3221

protolambda mentioned this pull request Aug 15, 2022

op-batcher: fix channel ID time choice [bedrock] #3222

Merged

protolambda force-pushed the refactor-batch-queue branch from 7dd4907 to c9ed6ef Compare August 16, 2022 15:59

norswap reviewed Aug 16, 2022

View reviewed changes

trianglesphere previously requested changes Aug 16, 2022

View reviewed changes

mergify bot added the conflict label Aug 23, 2022

protolambda added 2 commits August 24, 2022 01:29

op-node,specs: batches track parent-hash, refactor+test batch queue, …

1fa7d60

…update specs

op-node,specs: implement review suggestions

9d2a8c4

protolambda force-pushed the refactor-batch-queue branch from c9ed6ef to 9d2a8c4 Compare August 24, 2022 00:18

mergify bot removed the conflict label Aug 24, 2022