Bedrock: updated Batch Derivation specification #2807
protolambda wants to merge 4 commits into develop from
Conversation
norswap left a comment
Huge review =)
First off, I've talked with Proto and given the amount of changes I'd like, I'll take over editing the spec to free up Proto on other things. He (and others) can then review the new spec PR.
I do have a few broad design & naming suggestions (these suggestions are also in review comments, but since there's so many of them, I'm hoisting here for visibility, sorry for duplication):
Naming: I don't love the current terminology of channels & batches.
- channel generally implies a communication conduit — here it's a big blob of data encoding multiple blocks
- batch normally implies multiplicity, but here a batch is a (simple) encoding of block inputs
- suggestion: rename "channel" to "batch" and "batch" to some paraphrase like "encoded block inputs" (it's not being used very often)
Architecture: it seems like a lot of the work could be done in the batcher instead of in the rollup node, and I think this mitigates some potential issues; see my comment at the end of batching.md for details.
I'm also curious (maybe dubious?) about the necessity of many concurrent channels. It seems like in most cases we're only using one at a time? I think moving things to the batcher also helps mitigate that particular architectural overhead (the great thing about the design is that the on-chain data doesn't need to change, so we can potentially table this change for later if it's something we'd like to do).
specs/batching.md
Outdated
| # Batch submitting
|
| The process that signs and publishes transactions is not part of the rollup node, but an external “batcher”.
| This is effectively the inverse function of the derivation process, but specified here for posterity. This API design is not enforced by the protocol.
It's probably useful to mention that the batch format is enforced, and link to the page where it's defined.
specs/batching.md
Outdated
| The process that signs and publishes transactions is not part of the rollup node, but an external “batcher”.
| This is effectively the inverse function of the derivation process, but specified here for posterity. This API design is not enforced by the protocol.
|
| The batcher may use any data-availability platform that registers batch data within a chain of block-hashes and block-numbers.
Do you mean "a blockchain such that every block has a number, and every block includes the hash of its predecessor block"?
specs/batching.md
Outdated
| The batcher may use any data-availability platform that registers batch data within a chain of block-hashes and block-numbers.
|
| The rollup node is mostly agnostic to the data platform: as a verifier the channel-bank has to traverse the chain and data has to be written to the bank.
| Ethereum calldata is supported by default, other data-sources may be supported in future upgrades.
You mention the channel bank here before explaining what it is, which should generally be avoided.
Suggestion:
The rollup node is mostly agnostic to the data platform, as long as it can be treated as a traditional blockchain with numbers and block hashes, as explained earlier. Ethereum calldata is supported by default, other data-sources may be supported in future upgrades.
The part about the channel bank can be moved to the part where the channel bank is explained.
| [Block derivation][g-derivation] is the process of building a chain of L2 [blocks][g-block] from a chain of L1 data.
|
| The L1 data can be thought of as a stream, and L2 rollup nodes are the ***readers***, transforming their state with the inputs.
suggestion:
The L1 chain can be seen as a stream of L2 derivation inputs, which L2 rollup nodes read from, using the derivation inputs to derive the L2 chain and update their state.
| The L1 data can be thought of as a stream, and L2 rollup nodes are the ***readers***, transforming their state with the inputs.
|
| The L2 batcher nodes are the ***writers*** to the stream: they take L2 changes and submit the necessary data to L1.
| Others can then reproduce the same L2 changes.
Since batchers do not form an interconnected network, I would not call them "nodes". Use "batcher" instead?
In general, this might lack precision: what are changes?
It might also be good to make explicit the distinction between verifiers and sequencers.
suggestion:
L2 batchers write to the stream (to L1): they take L2 blocks created by a sequencer and submit the necessary data to L1, such that verifiers are able to derive the same blocks from L1.
| Decode `BatchData` entries, each entry is prefixed with a version byte.
| Unknown versions are invalid just like malformatted `BatchData`, and the pipeline should move to the next frame.
|
| #### Batch encoding
I see: so we do have a "batch", which is basically an encoded block. This invalidates some comments I made earlier.
Also not fond of the name, though I know we've used it like that in the past — batch implies multiples of things.
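To make the rule concrete, here is a small decoder sketch of the version-byte behavior described in the quoted spec text. The names (`BATCH_V1`, `decode_batch`) are illustrative, not from the spec: an unknown version byte is treated exactly like malformatted `BatchData`, and the pipeline moves on to the next frame.

```python
BATCH_V1 = 0  # the only batch version defined so far

def decode_batch(data: bytes):
    """Return the versioned payload, or None if the entry is invalid."""
    if not data:
        return None  # empty entry: malformatted
    version, payload = data[0], data[1:]
    if version != BATCH_V1:
        return None  # unknown version byte: invalid, skip this entry
    # A real decoder would RLP-decode [epoch, timestamp, transaction_list].
    return payload
```

The caller simply drops `None` results and continues with the next frame, so adding a future batch version is backward-compatible: old nodes skip what they do not understand.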
| and encoded as a byte-string (including version prefix byte) in the bundle RLP list.
|
| `<batch_version>`: `<name> = <content>`
| - `0`: `BatchV1 = RLP([epoch, timestamp, transaction_list])`
Was briefly confused about the name being part of the encoding. Making this a table with the first line as headers would make it clearer it's not the case.
Also do we really need names? (i.e. version 0 or "BatchV1" carries same info) Can add a comment column to the table later if need be.
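For illustration, a toy encoder for the `BatchV1` layout (version byte `0` followed by an RLP list). The `rlp_encode_bytes` helper here is a deliberately minimal stand-in, handling only short byte-strings, not a full RLP implementation; it exists just to show the version-prefix structure.

```python
def rlp_encode_bytes(b: bytes) -> bytes:
    # toy RLP: a single byte below 0x80 encodes as itself,
    # otherwise a short string (< 56 bytes) gets a 0x80+len prefix
    return b if len(b) == 1 and b[0] < 0x80 else bytes([0x80 + len(b)]) + b

def encode_batch_v1(epoch: bytes, timestamp: bytes, tx_list: bytes) -> bytes:
    items = b"".join(rlp_encode_bytes(x) for x in (epoch, timestamp, tx_list))
    assert len(items) < 56  # toy encoder: short-list form only
    # version byte 0, then the RLP short-list prefix (0xC0 + payload length)
    return bytes([0]) + bytes([0xC0 + len(items)]) + items
```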
| - timestamp
| - basefee
| - *random* (the output of the [`RANDOM` opcode][random])
| - L1 log entries emitted for [user deposits][g-deposits], augmented with a [sourceHash](./deposits.md#).
I guess confirmation depth stuff to be added later?
| - `highest_valid_batch_timestamp = max(batch.timestamp for batch in filtered_batches)`,
|   or `0` if there are no `filtered_batches`.
|   `batch.timestamp` refers to the L2 block timestamp encoded in the batch.
| - `next_l1_timestamp` is the timestamp of the next L1 block.
I know this is a copy/paste, but would be really nice to have some rationale for highest_valid_batch_timestamp being used (just a note to self to add this).
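The quoted rule translates almost directly into code. In this sketch, `Batch` is an illustrative stand-in carrying only the L2 block timestamp from the batch encoding:

```python
from collections import namedtuple

# illustrative stand-in for the spec's batch structure
Batch = namedtuple("Batch", "timestamp")

def highest_valid_batch_timestamp(filtered_batches) -> int:
    # highest L2 timestamp among the filtered batches, or 0 if none remain
    return max((b.timestamp for b in filtered_batches), default=0)
```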
| ### Payload Attributes Deriver
|
| Payload Attributes are derived by combining transactions derived from the referenced L1 origin, with transactions from the batch:
Probably clarify L1 origin as the L1 block whose number matches the batch.
trianglesphere left a comment
two design comments @protolambda
| frame_number = uvarint  # identifies the index of frame_data in the channel
| frame_data_length = uvarint  # frame_data_length == len(frame_data)
I think we should change these to uint32s (could have frame # be uint16, need uint32 for data_length).
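A sketch of the fixed-width alternative proposed here: a uint16 frame number and a uint32 data length instead of uvarints. The two-field layout is illustrative only; the real frame format carries additional fields (channel ID, an is-last flag, etc.).

```python
import struct

def encode_frame(frame_number: int, frame_data: bytes) -> bytes:
    # big-endian uint16 index, uint32 length prefix, then the payload
    return struct.pack(">HI", frame_number, len(frame_data)) + frame_data

def decode_frame(buf: bytes):
    frame_number, length = struct.unpack_from(">HI", buf)
    payload = buf[6:6 + length]  # 6-byte fixed header
    if len(payload) != length:
        raise ValueError("truncated frame")
    return frame_number, payload
```

Fixed-width fields make the header size constant (6 bytes here), so a decoder can reject truncated frames without first parsing variable-length integers.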
| ```text
| CHANNEL_TIMEOUT = 600  # time in seconds until a channel is pruned from the channel bank
| MAX_CHANNEL_BANK_SIZE = 100_000_000  # max size of the combined channels in the channel bank
I think a per channel max size + cap on the number of open channels is more resilient.
| 1. Read the version number of the L1 data, check if it is 0, the data is invalid if not.
| 2. Determine the `total_size` of the bank by summing the `size` of all channels in the bank.
|    For as long as `total_size > MAX_CHANNEL_BANK_SIZE`, prune the first-seen channel (i.e. lowest `time`) and reduce `total_size` by that channel's `size`.
|    The first-seen channel is pruned first, since this is the channel that failed to be read before the bank became this large.
I think we should prune the last seen to enable the node to keep processing the open channel. There's also an argument that we should reopen future channels after we've cleared the current channel.
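For reference, the pruning rule as written in the quoted spec text (first-seen channel pruned first); the `(time, size)` tuples are an illustrative stand-in for the bank's channel records. The comment above argues for pruning the last-seen channel instead, which would amount to popping from the other end of the list.

```python
MAX_CHANNEL_BANK_SIZE = 100_000_000

def prune(channels):
    """channels: list of (time, size) tuples, ordered by first-seen time."""
    total_size = sum(size for _, size in channels)
    while total_size > MAX_CHANNEL_BANK_SIZE:
        _, size = channels.pop(0)  # drop the channel with the lowest time
        total_size -= size
    return channels
```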
| 2. Reset the Batch Queue, starting at the common L1 origin found during the Engine Queue reset.
| 3. Reset the stages down until the Channel Bank, wiping the contents and copying the safe origin of the Batch Queue.
| 4. Reset the Channel Bank.
| 5. Reset the L1 source by wiping any buffered data, and copying the safe origin of the Channel Bank.
What are those "safe origins"? Are they different from one another and different from origin of safe L2 block?
| After the resets are performed (possibly in smaller iterative chain traversal steps),
| the regular pipeline derivation can dry-run to heal the buffer contents
| (i.e. drop any outputs of a stage that lags behind the next stage closer to L2).
What is meant by the "regular pipeline"? Is there an "irregular pipeline" 😅?
And what does it mean to dry-run it? I have a feeling this is related to "reconciliation" of existing blocks in the execution engine with (re-)derived execution payload — is that it?
i.e. drop any outputs of a stage that lags behind the next stage closer to L2
I don't get this one.
| The `unsafe` block starts as the last known tip of the L2 chain,
| but is reset back to equal the `safe` block if at any point the `unsafe` block height is known but non-canonical.
| An `unsafe` head with an origin beyond the `safe` origin is considered "plausible":
| retaining the existing data is preferable, but it may be reorganized at a later point if consolidation with the `safe` chain fails.
"reset back to equal the `safe` block if at any point the `unsafe` block height is known but non-canonical"
what about:
"reset back to equal the `safe` block if at any point the `unsafe` block becomes non-canonical."
Side note: we probably need to define the concept of "origin" in the glossary.
Hey @protolambda! This PR has merge conflicts. Please fix them before continuing review.
Force-pushed d6ab53f to b8ef653
Force-pushed b8ef653 to b94c634
This spec update describes the updated batch derivation process, as well as batch-submission (previously missing from the spec!), and approaches derivation more systematically (in stages).
This is largely based on two docs:
- `rollup_node.md` contents: split off from the node spec. The rollup node is more than just derivation, and derivation is a topic that deserves its own document.

Update:
There are some imperfections, and I need to update ToCs + linting. This is a draft.
See #2714 for ongoing implementation work.
Key bedrock improvements:
Known bugs/issues in spec:
`minSize` API argument is not specced/implemented yet.