op-node/specs: refactor batch queue, add parent_hash to batches, extend testing#3221
op-node/specs: refactor batch queue, add parent_hash to batches, extend testing#3221mergify[bot] merged 4 commits intodevelopfrom
Conversation
|
|
This PR changes implementation code, but doesn't include a changeset. Did you forget to add one? |
7dd4907 to
c9ed6ef
Compare
|
Rebased onto |
This has a pretty big consequence which is that if the sequencer does something bad on the black transition of batch
I'm not entirely sure why we can't include the hash of the previous batch in the L1 infos of the next block? |
There was a problem hiding this comment.
Didn't review the code really, mostly spec + general principles.
I think we need to talk about the implications of the block hash (see my comment separate from this review).
Specs-wise, I have some suggestions about how to better structure this bit of the spec. Maybe with impact to the code (new stage?)
Regarding the code, this is a lot of changes indeed.
I think in this specific case, it would have been good to implement the change entirely without a refactor, then consider refactors separately.
And we could have split refactors in multiple PRs where each PR moves only a single check for one place to another.
Imho right now our options are basically: hope our testing catches any issues, or do the above. Don't feel very comfortable reviewing the changes and seeing "yup, I'm confident that before/after is the same behaviour modulo the new bit we're adding".
trianglesphere
left a comment
There was a problem hiding this comment.
I went through and reviewed everything. I didn't realize how much you had refactored in there. Splitting this up into a refactor + adding parent hash as two steps (and maybe even smaller refactor steps depending on how it is) would have made this easier to review.
There's a couple spots that need some more comments.
My main concern is the repeated linear scan through bq.batches + how that array is modified. It does reduce the chance of leaking compared to the old design, but now requires (potentially) much more work.
|
Hey @protolambda! This PR has merge conflicts. Please fix them before continuing review. |
c9ed6ef to
9d2a8c4
Compare
Ready for final review/merge. |
implemented requested changes
18b816a to
31a607b
Compare
norswap
left a comment
There was a problem hiding this comment.
Approving to move things along, flagging two things:
- We need to have a conversation about batch replayability (will create linear ticket).
- Still have my concerns with the state-machine-ish spec structures, but this can be easily addressed when doing l2 chain derivation refactoring work, so perfectly fine for now.
- This will conflict with my last derivation spec PR, what I'll do there is keep both version side by side (old version just need an extra constraint for the parent hash), and we'll consolidate as part of refactors.
|
This PR has been added to the merge queue, and will be merged soon. |
|
This PR is next in line to be merged, and will be merged as soon as checks pass. |
…nd testing (#3221) * op-node,specs: batches track parent-hash, refactor+test batch queue, update specs * op-node,specs: implement review suggestions * op-node,specs: implement suggestions/fixes based on review from mark Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…nd testing (#3221) * op-node,specs: batches track parent-hash, refactor+test batch queue, update specs * op-node,specs: implement review suggestions * op-node,specs: implement suggestions/fixes based on review from mark Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
…p-rs/kona#3206) This PR consolidates the `EngineActor` inbound channels into a single channel. See [this design doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-actor-simplification.md#3-and-4-actor-dependencies-are-unclear) for more info on motivation. This PR also: - Creates clients for all callers of `EngineActor` making interactions simple, more generic, safe (e.g. DerivationActor's engine client does not expose functions to perform block building), and mockable - Makes dependent actors generic over their engine client to decouple implementation and make mocking easy Closes #3220 Closes #3221
This adds
parent_hashto the batch definition, to ensure we never skip a batch. Skipping a batch can cause transactions with invalid nonce values to be inserted into the execution engine, causing reorgs or complete halts.The
parent_hashmatches the L2 block hash of the block the batch will be building on top of, i.e. the L2 safe head.To enable this, we have to change the batch-queue processing to only output 1 batch at the time. Without this, the previous L2 block hash would not be available, and so we could not check the hash otherwise.
In a previous iteration of this I considered hashing just the batch inputs, and chaining those hashes. However, we can't include those hashes into the L2 chain elegantly, which is required to find a sync continuation point without loading all historical batches. I.e. we would need to add additional data to the L1 block info transaction, and it would almost be self-referencing, as it hashes together the other transactions, which are included later.
Using the native block hashes thus is preferred to not add additional data to the info deposit, and avoid the near cyclic dependency issue.
Not outputting multiple batches at a time really means that we are not filling multiple batches at a time when there is a gap to fill. The eager batch derivation already outputted a single batch otherwise.
Previously the validity conditions on batches were spread between the batch-filtering, batch-filling, extension-checking and batch queue main function. I changed this so we can output 1 batch at a time, check all validity conditions, and isolate the validity conditions in one function that now has 100% test coverage.
Part of this unwinding of the different checks also meant we can now remove the L1 RPC from the function: all necessary L1 data is there, and is easy to access (first L1 block always matches L2 safe head L1 origin, second L1 block is optional).
And validating the batches this way means there is no difference between eager and regular batch-derivation: we always consider the validity of the first batch, since we output one at a time.
The spec is updated to reflect this same methodology, so we don't need eager batch queue <> regular batch queue distinction anymore.
I hope the PR is not too large to review, this was not an easy change, at least it should reduce total lines of regular code, while adding more tests + ensuring 100% coverage of batch validation.
Fix ENG-2593