Skip to content

op-node/specs: refactor batch queue, add parent_hash to batches, extend testing#3221

Merged
mergify[bot] merged 4 commits intodevelopfrom
refactor-batch-queue
Aug 24, 2022
Merged

op-node/specs: refactor batch queue, add parent_hash to batches, extend testing#3221
mergify[bot] merged 4 commits intodevelopfrom
refactor-batch-queue

Conversation

@protolambda
Copy link
Contributor

@protolambda protolambda commented Aug 14, 2022

This adds parent_hash to the batch definition, to ensure we never skip a batch. Skipping a batch can cause transactions with invalid nonce values to be inserted into the execution engine, causing reorgs or complete halts.

The parent_hash matches the L2 block hash of the block the batch will be building on top of, i.e. the L2 safe head.
To enable this, we have to change the batch-queue processing to only output 1 batch at the time. Without this, the previous L2 block hash would not be available, and so we could not check the hash otherwise.

In a previous iteration of this I considered hashing just the batch inputs, and chaining those hashes. However, we can't include those hashes into the L2 chain elegantly, which is required to find a sync continuation point without loading all historical batches. I.e. we would need to add additional data to the L1 block info transaction, and it would almost be self-referencing, as it hashes together the other transactions, which are included later.
Using the native block hashes thus is preferred to not add additional data to the info deposit, and avoid the near cyclic dependency issue.

Not outputting multiple batches at a time really means that we are not filling multiple batches at a time when there is a gap to fill. The eager batch derivation already outputted a single batch otherwise.

Previously the validity conditions on batches were spread between the batch-filtering, batch-filling, extension-checking and batch queue main function. I changed this so we can output 1 batch at a time, check all validity conditions, and isolate the validity conditions in one function that now has 100% test coverage.

Part of this unwinding of the different checks also meant we can now remove the L1 RPC from the function: all necessary L1 data is there, and is easy to access (first L1 block always matches L2 safe head L1 origin, second L1 block is optional).

And validating the batches this way means there is no difference between eager and regular batch-derivation: we always consider the validity of the first batch, since we output one at a time.

The spec is updated to reflect this same methodology, so we don't need eager batch queue <> regular batch queue distinction anymore.

I hope the PR is not too large to review, this was not an easy change, at least it should reduce total lines of regular code, while adding more tests + ensuring 100% coverage of batch validation.

Fix ENG-2593

@changeset-bot
Copy link

changeset-bot bot commented Aug 14, 2022

⚠️ No Changeset found

Latest commit: c003072

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@mergify
Copy link
Contributor

mergify bot commented Aug 14, 2022

This PR changes implementation code, but doesn't include a changeset. Did you forget to add one?

@protolambda
Copy link
Contributor Author

Rebased onto develop for the devnet CI fix.

@norswap
Copy link
Contributor

norswap commented Aug 16, 2022

The parent_hash matches the L2 block hash of the block the batch will be building on top of, i.e. the L2 safe head.
To enable this, we have to change the batch-queue processing to only output 1 batch at the time. Without this, the previous L2 block hash would not be available, and so we could not check the hash otherwise.

This has a pretty big consequence which is that if the sequencer does something bad on the black transition of batch X, then all the subsequent batches are rendered useless and cannot be replayed anymore. That's a pretty change and I feel like it should warrant some discussions of pros & cons.

In a previous iteration of this I considered hashing just the batch inputs, and chaining those hashes. However, we can't include those hashes into the L2 chain elegantly, which is required to find a sync continuation point without loading all historical batches. I.e. we would need to add additional data to the L1 block info transaction, and it would almost be self-referencing, as it hashes together the other transactions, which are included later.

I'm not entirely sure why we can't include the hash of the previous batch in the L1 infos of the next block?

Copy link
Contributor

@norswap norswap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't review the code really, mostly spec + general principles.

I think we need to talk about the implications of the block hash (see my comment separate from this review).

Specs-wise, I have some suggestions about how to better structure this bit of the spec. Maybe with impact to the code (new stage?)

Regarding the code, this is a lot of changes indeed.

I think in this specific case, it would have been good to implement the change entirely without a refactor, then consider refactors separately.

And we could have split refactors in multiple PRs where each PR moves only a single check for one place to another.

Imho right now our options are basically: hope our testing catches any issues, or do the above. Don't feel very comfortable reviewing the changes and seeing "yup, I'm confident that before/after is the same behaviour modulo the new bit we're adding".

Copy link
Contributor

@trianglesphere trianglesphere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through and reviewed everything. I didn't realize how much you had refactored in there. Splitting this up into a refactor + adding parent hash as two steps (and maybe even smaller refactor steps depending on how it is) would have made this easier to review.

There's a couple spots that need some more comments.

My main concern is the repeated linear scan through bq.batches + how that array is modified. It does reduce the chance of leaking compared to the old design, but now requires (potentially) much more work.

@mergify
Copy link
Contributor

mergify bot commented Aug 23, 2022

Hey @protolambda! This PR has merge conflicts. Please fix them before continuing review.

@mergify mergify bot added the conflict label Aug 23, 2022
@protolambda protolambda force-pushed the refactor-batch-queue branch from c9ed6ef to 9d2a8c4 Compare August 24, 2022 00:18
@mergify mergify bot removed the conflict label Aug 24, 2022
@protolambda
Copy link
Contributor Author

  • Rebased onto develop
  • Implemented the grouping of batches by timestamp, to handle the performance/DoS case, otherwise still in inclusion order, clarified in specs that future batches can be deferred in processing since these are marked as future anyway.
  • Clarified a bunch of comments/code based on review, changed structure a tiny bit to make it more readable

Ready for final review/merge.

@protolambda protolambda dismissed trianglesphere’s stale review August 24, 2022 00:22

implemented requested changes

@protolambda protolambda force-pushed the refactor-batch-queue branch from 18b816a to 31a607b Compare August 24, 2022 01:23
Copy link
Contributor

@norswap norswap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to move things along, flagging two things:

  • We need to have a conversation about batch replayability (will create linear ticket).
  • Still have my concerns with the state-machine-ish spec structures, but this can be easily addressed when doing l2 chain derivation refactoring work, so perfectly fine for now.
    • This will conflict with my last derivation spec PR, what I'll do there is keep both version side by side (old version just need an extra constraint for the parent hash), and we'll consolidate as part of refactors.

@mergify
Copy link
Contributor

mergify bot commented Aug 24, 2022

This PR has been added to the merge queue, and will be merged soon.

@mergify
Copy link
Contributor

mergify bot commented Aug 24, 2022

This PR is next in line to be merged, and will be merged as soon as checks pass.

@mergify mergify bot merged commit c3493a0 into develop Aug 24, 2022
@mergify mergify bot deleted the refactor-batch-queue branch August 24, 2022 18:04
@mergify mergify bot removed the on-merge-train label Aug 24, 2022
maurelian pushed a commit that referenced this pull request Sep 15, 2022
…nd testing (#3221)

* op-node,specs: batches track parent-hash, refactor+test batch queue, update specs

* op-node,specs: implement review suggestions

* op-node,specs: implement suggestions/fixes based on review from mark

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
sam-goldman pushed a commit that referenced this pull request Sep 15, 2022
…nd testing (#3221)

* op-node,specs: batches track parent-hash, refactor+test batch queue, update specs

* op-node,specs: implement review suggestions

* op-node,specs: implement suggestions/fixes based on review from mark

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
theochap pushed a commit that referenced this pull request Jan 15, 2026
…p-rs/kona#3206)

This PR consolidates the `EngineActor` inbound channels into a single
channel. See [this design
doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-actor-simplification.md#3-and-4-actor-dependencies-are-unclear)
for more info on motivation.

This PR also:
- Creates clients for all callers of `EngineActor` making interactions
simple, more generic, safe (e.g. DerivationActor's engine client does
not expose functions to perform block building), and mockable
- Makes dependent actors generic over their engine client to decouple
implementation and make mocking easy

Closes #3220
Closes #3221
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants