Bedrock: updated Batch Derivation specification#2807

Closed
protolambda wants to merge 4 commits into `develop` from `batch-deriv-v2-spec`
Conversation

@protolambda
Contributor

@protolambda protolambda commented Jun 17, 2022

This spec update describes the updated batch derivation process, as well as batch-submission (previously missing from the spec!), and approaches derivation more systematically (in stages).

This is largely based on two docs:

  • The latest design proposal in notion (stable for the last week at least)
  • The existing rollup_node.md contents: split off from the node spec. The rollup node is more than just derivation, and derivation is a topic that deserves its own document.

Update:

  • Added diagrams

There are some imperfections, and I need to update ToCs + linting. This is a draft.

See #2714 for ongoing implementation work.

Key bedrock improvements:

  • Data of L2 blocks can be spread across multiple L1 data transactions (fixes several sub-issues)
  • Stability/testing design improvements: isolated stages instead of monolithic driver state
  • Lower latency to L2 head via L1 syncing, with eager batch derivation
  • Improved reorg handling: the derivation pipeline can best determine when to redo derivation based on L1 reorgs. Previously the L1 head changes could force a sudden reorg, even if the L2 engine was not synced up to the reorged L1 blocks yet.
  • Fix error handling in the interactions with the execution-engine: an RPC error is not the same as an invalid transaction, payload attributes should never accidentally be dropped.
  • Improve robustness of derivation: smaller steps and buffering, any small RPC failure does not result in cascading effects.

Known bugs/issues in spec:

  • Eager batch derivation is not fully specced yet.
    • @trianglesphere is working on this in notion
    • I ported over the existing spec of full-sequencing window batch derivation, but split the batch-logic from the payload-attributes-derivation logic.
  • channels should have a timestamp (time constrained by inclusion of frame data) as part of their uniqueness: otherwise channels may revive, or sync may not be consistent when not starting at genesis, even after waiting for a full timeout period
    • discussed with Josh, should be a simple fix
  • The batch submission minSize API argument is not specced/implemented yet.
    • maybe best to ignore until we test more of the derivation updates, it's not critical
  • Meta:
    • Broken links need to be checked/fixed
    • Table of Contents re-generation
    • Markdown linting fixes (ported a lot over from notion, it's not all valid with the github linter)

@changeset-bot

changeset-bot bot commented Jun 17, 2022

⚠️ No Changeset found

Latest commit: b94c634

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@protolambda protolambda marked this pull request as ready for review June 22, 2022 12:04
Contributor

@norswap norswap left a comment


Huge review =)

First off, I've talked with Proto and given the amount of changes I'd like, I'll take over editing the spec to free up Proto on other things. He (and others) can then review the new spec PR.

I do have a few broad design & naming suggestions (these suggestions are also in review comments, but since there are so many of them, I'm hoisting them here for visibility; sorry for the duplication):

Naming: I don't love the current terminology of channels & batches.
- channel generally implies a communication conduit — here it's a big blob of data encoding multiple blocks
- batch normally implies multiplicity, but here a batch is a (simple) encoding of block inputs
- suggestion: rename "channel" to "batch" and "batch" to some paraphrase like "encoded block inputs" (it's not being used very often)

Architecture: it seems like a lot of the work could be done in the batcher instead of in the rollup node, and I think this mitigates some potential issues, see my comment at the end of batching.md for details.

I'm also curious (maybe dubious?) about the necessity of many concurrent channels. It seems like in most cases we're only using one at a time. I think moving things to the batcher also helps mitigate this particular architectural overhead (the great thing about the design is that the on-chain data doesn't need to change, so we can potentially table this change for later if it is something we would like to do).

# Batch submitting

The process that signs and publishes transactions is not part of the rollup node, but an external “batcher”.
This is effectively the inverse function of the derivation process, but specified here for posterity. This API design is not enforced by the protocol.

It's probably useful to mention that the batch format is enforced, and link to the page of definition.

The process that signs and publishes transactions is not part of the rollup node, but an external “batcher”.
This is effectively the inverse function of the derivation process, but specified here for posterity. This API design is not enforced by the protocol.

The batcher may use any data-availability platform that registers batch data within a chain of block-hashes and block-numbers.

Do you mean "a blockchain such that every block has a number, and every block includes the hash of its predecessor block"?

The batcher may use any data-availability platform that registers batch data within a chain of block-hashes and block-numbers.

The rollup node is mostly agnostic to the data platform: as a verifier the channel-bank has to traverse the chain and data has to be written to the bank.
Ethereum calldata is supported by default, other data-sources may be supported in future upgrades.

You mention the channel bank here before explaining what it is, which should generally be avoided.

Suggestion:

The rollup node is mostly agnostic to the data platform, as long as it can be treated as a traditional blockchain with numbers and block hashes, as explained earlier. Ethereum calldata is supported by default, other data-sources may be supported in future upgrades.

The part about the channel bank can be moved to the part where the channel bank is explained.


[Block derivation][g-derivation] is the process of building a chain of L2 [blocks][g-block] from a chain of L1 data.

The L1 data can be thought of as a stream, and L2 rollup nodes are the ***readers***, transforming their state with the inputs.

suggestion:

The L1 chain can be seen as a stream of L2 derivation inputs, which L2 rollup nodes read from, using the derivation inputs to derive the L2 chain and update their state.

The L1 data can be thought of as a stream, and L2 rollup nodes are the ***readers***, transforming their state with the inputs.

The L2 batcher nodes are the ***writers*** to the stream: they take L2 changes and submit the necessary data to L1.
Others can then reproduce the same L2 changes.

Since batchers do not form an interconnected network, I would not call them "nodes". Use "batcher" instead?

In general, this might lack precision: what are changes?

It might also be good to make explicit the distinction between verifiers and sequencers.

suggestion:

L2 batchers write to the stream (to L1): they take L2 blocks created by a sequencer and submit the necessary data to L1, such that verifiers are able to derive the same blocks from L1.

Decode `BatchData` entries, each entry is prefixed with a version byte.
Unknown versions are invalid just like malformatted `BatchData`, and the pipeline should move to the next frame.
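The version-gated decoding described here can be sketched as follows. This is a minimal, hypothetical illustration: `decode_batch_v1` stands in for the real RLP decoder, and the function names are not from the spec.

```python
def decode_batch_v1(payload: bytes) -> bytes:
    # Stand-in for the real decoder, which RLP-decodes
    # [epoch, timestamp, transaction_list] from the payload.
    if not payload:
        raise ValueError("empty payload")
    return payload

def decode_batch_data(data: bytes):
    """Return the decoded batch, or None if the entry is invalid
    (unknown version byte or malformed payload); on None the
    pipeline simply moves on to the next frame."""
    if not data:
        return None
    version, payload = data[0], data[1:]
    if version != 0:  # unknown versions are invalid, like malformed BatchData
        return None
    try:
        return decode_batch_v1(payload)
    except ValueError:
        return None
```

The key point is that an unknown version byte is treated the same as a decoding failure: the entry is skipped rather than halting the pipeline.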

#### Batch encoding

I see, so we do have a "batch", which is basically an encoded block. This invalidates some comments I made earlier.

Also not fond of the name, though I know we've used it like that in the past — batch implies a multiplicity of things.

and encoded as a byte-string (including version prefix byte) in the bundle RLP list.

`<batch_version>`: `<name> = <content>`
- `0`: `BatchV1 = RLP([epoch, timestamp, transaction_list])`
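A sketch of producing a version-0 batch, with a minimal pure-Python RLP encoder (sufficient for byte-strings and lists). The integer-to-bytes helper and function names are illustrative, not from the spec.

```python
def _len_prefix(length: int, offset: int) -> bytes:
    # RLP length prefix: short form below 56 bytes, long form otherwise.
    if length < 56:
        return bytes([offset + length])
    size = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(size)]) + size

def rlp_encode(item) -> bytes:
    # Minimal RLP encoder for bytes and (nested) lists of bytes.
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        return _len_prefix(len(item), 0x80) + item
    payload = b"".join(rlp_encode(x) for x in item)
    return _len_prefix(len(payload), 0xC0) + payload

def encode_batch_v1(epoch: int, timestamp: int, txs: list) -> bytes:
    def u(n: int) -> bytes:
        # Minimal big-endian integer encoding, as RLP expects.
        return n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b""
    # BatchV1 = RLP([epoch, timestamp, transaction_list]),
    # prefixed with batch_version byte 0.
    return b"\x00" + rlp_encode([u(epoch), u(timestamp), txs])
```

The version byte sits outside the RLP payload, so a reader can dispatch on it before attempting any decoding.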

Was briefly confused about the name being part of the encoding. Making this a table with the first line as headers would make it clearer it's not the case.

Also do we really need names? (i.e. version 0 or "BatchV1" carries same info) Can add a comment column to the table later if need be.

- timestamp
- basefee
- *random* (the output of the [`RANDOM` opcode][random])
- L1 log entries emitted for [user deposits][g-deposits], augmented with a [sourceHash](./deposits.md#).

I guess confirmation depth stuff to be added later?

- `highest_valid_batch_timestamp = max(batch.timestamp for batch in filtered_batches)`,
or `0` if there are no `filtered_batches`.
`batch.timestamp` refers to the L2 block timestamp encoded in the batch.
- `next_l1_timestamp` is the timestamp of the next L1 block.
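The `highest_valid_batch_timestamp` rule above can be expressed directly (the dict-based batch representation here is an assumption for illustration):

```python
def highest_valid_batch_timestamp(filtered_batches) -> int:
    # batch["timestamp"] is the L2 block timestamp encoded in the batch;
    # the result defaults to 0 when no batches passed filtering.
    return max((batch["timestamp"] for batch in filtered_batches), default=0)
```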

I know this is a copy/paste, but would be really nice to have some rationale for highest_valid_batch_timestamp being used (just a note to self to add this).


### Payload Attributes Deriver

Payload Attributes are derived by combining transactions derived from the referenced L1 origin, with transactions from the batch:

Probably clarify L1 origin as the L1 block whose number matches the batch.

Contributor

@trianglesphere trianglesphere left a comment


two design comments @protolambda

Comment on lines +77 to +78
frame_number = uvarint # identifies the index of frame_data in the channel
frame_data_length = uvarint # frame_data_length == len(frame_data)
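For reference, the `uvarint` encoding used by these frame fields is unsigned LEB128: 7 payload bits per byte, with the high bit set on all but the last byte. A small sketch:

```python
def uvarint_encode(n: int) -> bytes:
    # Unsigned LEB128: emit 7 bits at a time, least-significant first;
    # the high bit marks "more bytes follow".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def uvarint_decode(data: bytes):
    # Returns (value, bytes_consumed); raises on a truncated input.
    value = shift = 0
    for i, b in enumerate(data):
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated uvarint")
```

Note the reviewer's point below: fixed-width integers (uint16/uint32) would avoid the variable-length parsing entirely.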

I think we should change these to uint32s (could have frame # be uint16, need uint32 for data_length).


```text
CHANNEL_TIMEOUT = 600 # time in seconds until a channel is pruned from the channel bank
MAX_CHANNEL_BANK_SIZE = 100_000_000 # max size of the combined channels in the channel bank
```

I think a per channel max size + cap on the number of open channels is more resilient.

1. Read the version number of the L1 data; if it is not 0, the data is invalid.
2. Determine the `total_size` of the bank by summing the `size` of all channels in the bank.
For as long as `total_size > MAX_CHANNEL_BANK_SIZE`, prune the first-seen channel (i.e. lowest `time`) and reduce `total_size` by that channel's `size`.
The first-seen channel is pruned first, since this is the channel that failed to be read before the bank grew this large.
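The pruning loop in step 2 can be sketched as follows. The dict-of-dicts representation of the bank is an illustrative assumption, not the real data structure:

```python
MAX_CHANNEL_BANK_SIZE = 100_000_000  # from the constants above

def prune_channel_bank(channels: dict) -> dict:
    """channels maps a channel id to {"time": ..., "size": ...}.
    Prunes first-seen channels (lowest time) until the bank fits."""
    total_size = sum(ch["size"] for ch in channels.values())
    while total_size > MAX_CHANNEL_BANK_SIZE:
        first_seen = min(channels, key=lambda cid: channels[cid]["time"])
        total_size -= channels[first_seen]["size"]
        del channels[first_seen]
    return channels
```

Note the reviewer's objection below: pruning last-seen channels instead would let the node keep processing the currently open channel.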

I think we should prune the last seen to enable the node to keep processing the open channel. There's also an argument that we should reopen future channels after we've cleared the current channel.

2. Reset the Batch Queue, starting at the common L1 origin found during the Engine Queue reset.
3. Reset the stages down until the Channel Bank, wiping the contents and copying the safe origin of the Batch Queue.
4. Reset the Channel Bank
5. Reset the L1 source by wiping any buffered data, and copying the safe origin of the Channel Bank.
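A heavily hedged sketch of the staged reset walk described above: each stage wipes its buffered contents and adopts the safe origin found during the Engine Queue reset. The `Stage` class and `reset_pipeline` function are illustrative only, not the real pipeline API.

```python
class Stage:
    def __init__(self, name: str):
        self.name = name
        self.buffer = []   # buffered intermediate data
        self.origin = None  # L1 origin this stage reads from

    def reset(self, safe_origin):
        self.buffer.clear()        # wipe buffered contents
        self.origin = safe_origin  # adopt the safe origin to restart from

def reset_pipeline(stages, common_l1_origin):
    # stages ordered from the L1 source (index 0) to the Engine Queue (last);
    # walk back from the L2 side toward L1, resetting each stage.
    for stage in reversed(stages):
        stage.reset(common_l1_origin)
    return stages
```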

What are those "safe origins"? Are they different from one another and different from origin of safe L2 block?


After the resets are performed (possibly in smaller iterative chain traversal steps),
the regular pipeline derivation can dry-run to heal the buffer contents
(i.e. drop any outputs of a stage that lags behind the next stage closer to L2).

What is meant by the "regular pipeline"? Is there an "irregular pipeline" 😅?

And what does it mean to dry-run it? I have a feeling this is related to "reconciliation" of existing blocks in the execution engine with (re-)derived execution payload — is that it?

i.e. drop any outputs of a stage that lags behind the next stage closer to L2

I don't get this one.

The `unsafe` block starts as the last known tip of the L2 chain,
but is reset back to equal the `safe` block if at any point the `unsafe` block height is known but non-canonical.
An `unsafe` head with an origin beyond the `safe` origin is considered "plausible":
retaining the existing data is preferable, but it may be reorganized at a later point if consolidation with the `safe` chain fails.

reset back to equal the safe block if at any point the unsafe block height is known but non-canonical

what about:

reset back to equal the safe block if at any point the unsafe block becomes non-canonical.


Side note: we probably need to define the concept of "origin" in the glossary.

@mergify
Contributor

mergify bot commented Jun 29, 2022

Hey @protolambda! This PR has merge conflicts. Please fix them before continuing review.

@mergify mergify bot added the conflict label Jun 29, 2022
@protolambda protolambda force-pushed the batch-deriv-v2-spec branch from d6ab53f to b8ef653 on July 15, 2022 12:00
@mergify mergify bot removed the conflict label Jul 15, 2022
@protolambda protolambda force-pushed the batch-deriv-v2-spec branch from b8ef653 to b94c634 on July 15, 2022 12:32
theochap added a commit that referenced this pull request Dec 10, 2025
## Description

Fixes CI by:
- updating all dependencies (optimism-package, kurtosis, monorepo)
- fixing cargo hack failures caused by #2762 
- fixing udeps failure caused by #2762 
- fixing sysgo job failure upstream caused by #2762