spec: Block Sync (cosmos#1204)

## Overview
[Rendered
file](https://github.com/rollkit/rollkit/blob/manav/p2p_blocksync_spec/block/block-manager.md)

Adds block sync details to block manager spec.


## Checklist


- [x] New and updated code has appropriate documentation
- [ ] New and updated code has new and/or updated testing
- [ ] Required CI checks are passing
- [ ] Visual proof for any user facing features like CLI or
documentation updates
- [ ] Linked issues closed with keywords

---------

Co-authored-by: Matthew Sevey <[email protected]>
Co-authored-by: nashqueue <[email protected]>
3 people authored Oct 16, 2023
1 parent 0a8d655 commit c3b6ec0
Showing 3 changed files with 103 additions and 33 deletions.
104 changes: 87 additions & 17 deletions block/block-manager.md
@@ -1,14 +1,46 @@
# Block
# Block Manager

## Abstract

The block manager is a key component of full nodes and is responsible for block production or block syncing depending on the node type. Block syncing in this context includes retrieving the published blocks from the network (p2p network or DA network), validating them to raise fraud proofs upon validation failure, updating the state, and storing the validated blocks. A full node invokes multiple block manager functionalities in parallel, such as:
The block manager is a key component of full nodes and is responsible for block production or block syncing depending on the node type: sequencer or non-sequencer. Block syncing in this context includes retrieving the published blocks from the network (P2P network or DA network), validating them to raise fraud proofs upon validation failure, updating the state, and storing the validated blocks. A full node invokes multiple block manager functionalities in parallel, such as:

* block production (only for sequencer full nodes)
* block publication to DA network
* block retrieval from DA network
* block retrieval from Blockstore (which retrieves blocks from the p2p network)
* block syncing
* Block Production (only for sequencer full nodes)
* Block Publication to DA network
* Block Retrieval from DA network
* Block Sync Service
* Block Publication to P2P network
* Block Retrieval from P2P network
* State Update after Block Retrieval

```mermaid
sequenceDiagram
title Overview of Block Manager
participant User
participant Sequencer
participant Full Node 1
participant Full Node 2
participant DA Layer
User->>Sequencer: Send Tx
Sequencer->>Sequencer: Generate Block
Sequencer->>DA Layer: Publish Block
Sequencer->>Full Node 1: Gossip Block
Sequencer->>Full Node 2: Gossip Block
Full Node 1->>Full Node 1: Verify Block
Full Node 1->>Full Node 2: Gossip Block
Full Node 1->>Full Node 1: Mark Block Soft-Confirmed
Full Node 2->>Full Node 2: Verify Block
Full Node 2->>Full Node 2: Mark Block Soft-Confirmed
DA Layer->>Full Node 1: Retrieve Block
Full Node 1->>Full Node 1: Mark Block Hard-Confirmed
DA Layer->>Full Node 2: Retrieve Block
Full Node 2->>Full Node 2: Mark Block Hard-Confirmed
```

## Protocol/Component Description

@@ -22,7 +54,7 @@ genesis|*cmtypes.GenesisDoc|initialize the block manager with genesis state (gen
store|store.Store|local datastore for storing rollup blocks and states (default local store path is `$db_dir/rollkit` and `db_dir` specified in the `config.toml` file under the app directory)
mempool, proxyapp, eventbus|mempool.Mempool, proxy.AppConnConsensus, *cmtypes.EventBus|for initializing the executor (state transition function). mempool is also used in the manager to check for availability of transactions for lazy block production
dalc|da.DataAvailabilityLayerClient|the data availability light client used to submit and retrieve blocks to DA network
blockstore|*goheaderstore.Store[*types.Block]|to retrieve blocks gossiped over the p2p network
blockstore|*goheaderstore.Store[*types.Block]|to retrieve blocks gossiped over the P2P network

Block manager configuration options:

@@ -54,23 +86,40 @@ The block manager of the sequencer nodes performs the following steps to produce

### Block Publication to DA Network

The block manager of the sequencer full nodes regularly publishes the produced blocks (that are pending in the `pendingBlocks` queue) to the DA network using the `DABlockTime` configuration parameter defined in the block manager config. In the event of failure to publish the block to the DA network, the manager will perform [`maxSubmitAttempts`][maxSubmitAttempts] attempts and an exponential backoff interval between the attempts. The exponential backoff interval starts off at [`initialBackoff`][initialBackoff] and it doubles in the next attempt and capped at `DABlockTime`. A successful publish event leads to the emptying of `pendingBlocks` queue and a failure event leads to proper error reporting and without emptying of `pendingBlocks` queue.
The block manager of the sequencer full nodes regularly publishes the produced blocks (that are pending in the `pendingBlocks` queue) to the DA network using the `DABlockTime` configuration parameter defined in the block manager config. If publishing a block to the DA network fails, the manager makes up to [`maxSubmitAttempts`][maxSubmitAttempts] attempts with an exponential backoff interval between them. The backoff interval starts at [`initialBackoff`][initialBackoff], doubles on each subsequent attempt, and is capped at `DABlockTime`. A successful publish event empties the `pendingBlocks` queue, while a failure is reported as an error without emptying the `pendingBlocks` queue.

### Block Retrieval from DA Network

The block manager of the full nodes regularly pulls blocks from the DA network at `DABlockTime` intervals, starting from a DA height read from the last state stored in the local store or the `DAStartHeight` configuration parameter, whichever is the latest. The block manager also actively maintains and increments the `daHeight` counter after every DA pull. The pull happens by making the `RetrieveBlocks(daHeight)` request using the Data Availability Light Client (DALC) retriever, which can return `Success`, `NotFound`, or `Error`. In the event of an error, retry logic kicks in with a delay of 100 milliseconds between retries; after 10 retries, an error is logged and the `daHeight` counter is not incremented, which intentionally stalls the block retrieval logic. In the `NotFound` scenario, there is no error, as it is acceptable to have no rollup block at a given DA height; the retrieval still increments the `daHeight` counter. Finally, in the `Success` scenario, the retrieved blocks are marked hard confirmed and sent to be applied (state update). A successful state update triggers fresh DA and block store pulls without waiting for the `DABlockTime` and `BlockTime` intervals.
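One iteration of the retrieval loop described above can be sketched as follows; the status type, function names, and constants are illustrative stand-ins for the actual DALC API:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// FetchStatus models the possible results of RetrieveBlocks, as
// described above.
type FetchStatus int

const (
	StatusSuccess FetchStatus = iota
	StatusNotFound
	StatusError
)

const (
	daFetchRetries = 10
	daFetchDelay   = 100 * time.Millisecond
)

// retrieveAtHeight models one iteration of the DA retrieval loop:
// retry on error up to daFetchRetries times, and report whether the
// daHeight counter should be incremented.
func retrieveAtHeight(fetch func() FetchStatus) (advance bool, err error) {
	for i := 0; i < daFetchRetries; i++ {
		switch fetch() {
		case StatusSuccess, StatusNotFound:
			// NotFound is not an error: a DA height may contain no rollup block.
			return true, nil
		case StatusError:
			time.Sleep(daFetchDelay)
		}
	}
	// Retries exhausted: log and stall by not incrementing daHeight.
	return false, errors.New("DA retrieval failed; daHeight not incremented")
}

func main() {
	advance, _ := retrieveAtHeight(func() FetchStatus { return StatusNotFound })
	fmt.Println(advance) // true
}
```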

#### About Soft/Hard Confirmations
### Block Sync Service

The block sync service is created during full node initialization. Afterwards, during the block manager's initialization, a pointer to the block store inside the block sync service is passed to the manager. Blocks created in the block manager are passed to the `BlockCh` channel and then sent to the [go-header] service to be gossiped over the P2P network.

The block manager retrieves blocks from both the p2p network and the underlying DA network because the blocks are available in the p2p network faster and DA retrieval is slower (e.g., 1 second vs 15 seconds). The blocks retrieved from the p2p network are only marked as soft confirmed until the DA retrieval succeeds on those blocks and they are marked hard confirmed. The hard confirmations can be considered to have a higher level of finality.
### Block Publication to P2P network

### Block Retrieval from BlockStore (P2P BlockSync)
Blocks created by the sequencer that are ready to be published to the P2P network are sent to the `BlockCh` channel in the block manager inside `publishLoop`.
The `blockPublishLoop` in the full node continuously listens for new blocks on the `BlockCh` channel; when a new block is received, it is written to the block store and broadcast to the network using the block sync service.

The block manager of the full nodes regularly pulls blocks from the block store (which in turn uses the p2p network for syncing the blocks) at `BlockTime` intervals and starts off with a block store height of zero. Every time the block store height is higher than the last seen height, the newest blocks are pulled from the block store and sent to be applied (or state update), along with updating the last seen block store height.
For non-sequencer full nodes, all block gossiping is handled by the block sync service; they do not need to publish blocks to the P2P network using any block manager component.

### Block Syncing
### Block Retrieval from P2P network

The block manager stores and applies the block every time a new block is retrieved either via the blockstore or DA network. Block syncing involves:
For non-sequencer full nodes, blocks gossiped through the P2P network are retrieved from the `Block Store` in the `BlockStoreRetrieveLoop` in the block manager.
Starting with a block store height of zero, for every `blockTime` unit of time, a signal is sent to the `blockStoreCh` channel in the block manager; when this signal is received, the `BlockStoreRetrieveLoop` retrieves blocks from the block store.
It keeps track of the last retrieved block's height, and whenever the current block store height is greater than the last retrieved height, it retrieves all blocks between these two heights from the block store.
For each retrieved block, it sends a new block event to the `blockInCh` channel, which is the same channel used for blocks retrieved from the DA layer.
Such a block is marked soft-confirmed by the validating full node until the same block is seen on the DA layer, at which point it is marked hard-confirmed.

Although a sequencer does not need to retrieve blocks from the P2P network, it still runs the `BlockStoreRetrieveLoop`.
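One tick of the retrieve loop described above can be sketched as follows; the store interface and helper names are illustrative, not the actual go-header store API:

```go
package main

import "fmt"

type Block struct{ Height uint64 }

// BlockStore is the subset of the block store used by the retrieve loop.
type BlockStore interface {
	Height() uint64
	GetBlocks(from, to uint64) []*Block // inclusive range, assumed helper
}

// retrieveNewBlocks models one tick of BlockStoreRetrieveLoop: when the
// store height exceeds the last seen height, fetch the blocks in between
// and forward them to blockInCh (the same channel used for DA-retrieved
// blocks), then advance the last seen height.
func retrieveNewBlocks(bs BlockStore, lastHeight uint64, blockInCh chan<- *Block) uint64 {
	h := bs.Height()
	if h <= lastHeight {
		return lastHeight // nothing new in the block store
	}
	for _, b := range bs.GetBlocks(lastHeight+1, h) {
		blockInCh <- b // soft-confirmed until also seen on the DA layer
	}
	return h
}

// memStore is a toy in-memory store for the example.
type memStore struct{ blocks []*Block }

func (m *memStore) Height() uint64 { return uint64(len(m.blocks)) }
func (m *memStore) GetBlocks(from, to uint64) []*Block {
	return m.blocks[from-1 : to]
}

func main() {
	bs := &memStore{blocks: []*Block{{1}, {2}, {3}}}
	ch := make(chan *Block, 3)
	last := retrieveNewBlocks(bs, 0, ch)
	fmt.Println(last, len(ch)) // 3 3
}
```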

#### About Soft/Hard Confirmations

The block manager retrieves blocks from both the P2P network and the underlying DA network because blocks become available on the P2P network faster than via DA retrieval (e.g., 1 second vs. 15 seconds). Blocks retrieved from the P2P network are only marked soft confirmed until DA retrieval succeeds for those blocks, at which point they are marked hard confirmed. Hard confirmations can be considered to have a higher level of finality.
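The two confirmation levels can be sketched as a simple state upgrade per block height; the type and method names are illustrative, not the actual Rollkit API:

```go
package main

import "fmt"

// Confirmation is the confirmation level of a synced block: blocks seen
// over P2P are soft confirmed; once also retrieved from the DA layer
// they become hard confirmed (higher finality).
type Confirmation int

const (
	Unconfirmed   Confirmation = iota
	SoftConfirmed              // retrieved via P2P gossip
	HardConfirmed              // also retrieved from the DA layer
)

type syncState struct{ status map[uint64]Confirmation }

// seenOnP2P marks a block soft confirmed, never downgrading it.
func (s *syncState) seenOnP2P(h uint64) {
	if s.status[h] < SoftConfirmed {
		s.status[h] = SoftConfirmed
	}
}

// seenOnDA upgrades a block to hard confirmed. Per the assumptions
// below, the block is not re-applied; only its confirmation changes.
func (s *syncState) seenOnDA(h uint64) {
	s.status[h] = HardConfirmed
}

func main() {
	s := &syncState{status: map[uint64]Confirmation{}}
	s.seenOnP2P(5)
	fmt.Println(s.status[5] == SoftConfirmed) // true
	s.seenOnDA(5)
	fmt.Println(s.status[5] == HardConfirmed) // true
}
```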

### State Update after Block Retrieval

The block manager stores and applies the block to update its state every time a new block is retrieved either via the P2P or DA network. State update involves:

* `ApplyBlock` using executor: validates the block, executes the block (applies the transactions), captures the validator updates, and creates an updated state.
* `Commit` using executor: commit the execution and changes, update mempool, and publish events
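The two-step state update above (apply first, then commit) can be sketched as follows; the `Executor` interface and signatures here are illustrative stand-ins, not the real executor API:

```go
package main

import "fmt"

// Executor is a minimal stand-in for the state transition function used
// by the block manager; names and signatures are illustrative.
type Executor interface {
	ApplyBlock(state string, block uint64) (newState string, err error)
	Commit(newState string, block uint64) error
}

// applyAndCommit performs the two-step state update described above:
// validate/execute the block first, then commit the results.
func applyAndCommit(e Executor, state string, block uint64) (string, error) {
	newState, err := e.ApplyBlock(state, block)
	if err != nil {
		return state, err // validation/execution failed; state unchanged
	}
	if err := e.Commit(newState, block); err != nil {
		return state, err
	}
	return newState, nil
}

// mockExec is a toy executor for the example.
type mockExec struct{}

func (mockExec) ApplyBlock(state string, block uint64) (string, error) {
	return fmt.Sprintf("%s+%d", state, block), nil
}
func (mockExec) Commit(string, uint64) error { return nil }

func main() {
	s, _ := applyAndCommit(mockExec{}, "genesis", 1)
	fmt.Println(s) // genesis+1
}
```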
@@ -96,15 +145,36 @@ The communication between the full node and block manager:
* The default mode for sequencer nodes is normal (not lazy).
* The sequencer can produce empty blocks.
* The block manager uses persistent storage (disk) when the `root_dir` and `db_path` configuration parameters are specified in `config.toml` file under the app directory. If these configuration parameters are not specified, the in-memory storage is used, which will not be persistent if the node stops.
* The block manager does not re-apply the block again (in other words, create a new updated state and persist it) when a block was initially applied using p2p block sync, but later was hard confirmed by DA retrieval. The block is only set hard confirmed in this case.
* The block manager does not re-apply the block again (in other words, create a new updated state and persist it) when a block was initially applied using P2P block sync, but later was hard confirmed by DA retrieval. The block is only set hard confirmed in this case.
* The block sync store is created by prefixing `blockSync` on the main data store.
* The genesis `ChainID` is used to create the `PubSubTopicID` in go-header, with the string `-block` appended to it. This suffix is needed because the full node also runs a P2P header sync service on a separate pubsub topic. Refer to the go-header specs for more details.
* Block sync over the P2P network works only when a full node is connected to the P2P network by specifying the initial seeds to connect to via `P2PConfig.Seeds` configuration parameter when starting the full node.
* The node's context is passed down to all the components of the P2P block sync to control shutting down the service either abruptly (in case of failure) or gracefully (during successful scenarios).
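Under the assumption above about the pubsub topic, the derivation is a simple string suffix; the function name is illustrative and the exact go-header topic format may differ:

```go
package main

import "fmt"

// pubSubTopicID sketches how the block sync topic is derived from the
// genesis ChainID: the "-block" suffix keeps it distinct from the
// header sync topic running on the same node (exact go-header format
// may differ).
func pubSubTopicID(chainID string) string {
	return chainID + "-block"
}

func main() {
	fmt.Println(pubSubTopicID("gm")) // gm-block
}
```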

## Implementation

See [block/manager.go](https://github.com/rollkit/rollkit/blob/main/block/manager.go)
See [block-manager]

See [tutorial] for running a multi-node network with both sequencer and non-sequencer full nodes.

## References

[1] [Go Header][go-header]

[2] [Block Sync][block-sync]

[3] [Full Node][full-node]

[4] [Block Manager][block-manager]

[5] [Tutorial][tutorial]

[maxSubmitAttempts]: https://github.com/rollkit/rollkit/blob/main/block/manager.go#L39
[defaultBlockTime]: https://github.com/rollkit/rollkit/blob/main/block/manager.go#L35
[defaultDABlockTime]: https://github.com/rollkit/rollkit/blob/main/block/manager.go#L32
[initialBackoff]: https://github.com/rollkit/rollkit/blob/main/block/manager.go#L48
[go-header]: https://github.com/celestiaorg/go-header
[block-sync]: https://github.com/rollkit/rollkit/blob/main/node/block_sync.go
[full-node]: https://github.com/rollkit/rollkit/blob/main/node/full.go
[block-manager]: https://github.com/rollkit/rollkit/blob/main/block/manager.go
[tutorial]: https://rollkit.dev/tutorials/full-and-sequencer-node#getting-started
22 changes: 11 additions & 11 deletions node/full_node.md
@@ -94,16 +94,16 @@ See [full node]

[13] [Block Sync Service][Block Sync Service]

[full node]: ../node/full.go
[full node]: https://github.com/rollkit/rollkit/blob/main/node/full.go
[ABCI app connections]: https://github.com/cometbft/cometbft/blob/main/spec/abci/abci%2B%2B_basic_concepts.md
[genesis]: https://github.com/cometbft/cometbft/blob/main/spec/core/genesis.md
[node configuration]: ../config/config.go
[peer-to-peer client]: ../p2p/client.go
[Mempool]: ../mempool/mempool.go
[Store]: ../store/store.go
[store interface]: ../store/types.go
[Block Manager]: ../block/manager.go
[dalc]: ../da/da.go
[DA registry]: ../da/registry/registry.go
[Header Sync Service]: ../block/header_sync.go
[Block Sync Service]: ../block/block_sync.go
[node configuration]: https://github.com/rollkit/rollkit/blob/main/config/config.go
[peer-to-peer client]: https://github.com/rollkit/rollkit/blob/main/p2p/client.go
[Mempool]: https://github.com/rollkit/rollkit/blob/main/mempool/mempool.go
[Store]: https://github.com/rollkit/rollkit/blob/main/store/store.go
[store interface]: https://github.com/rollkit/rollkit/blob/main/store/types.go
[Block Manager]: https://github.com/rollkit/rollkit/blob/main/block/manager.go
[dalc]: https://github.com/rollkit/rollkit/blob/main/da/da.go
[DA registry]: https://github.com/rollkit/rollkit/blob/main/da/registry/registry.go
[Header Sync Service]: https://github.com/rollkit/rollkit/blob/main/block/header_sync.go
[Block Sync Service]: https://github.com/rollkit/rollkit/blob/main/block/block_sync.go
10 changes: 5 additions & 5 deletions specs/src/specs/header-sync.md
@@ -28,27 +28,27 @@ The sequencer node, upon successfully creating the block, publishes the signed b

## Assumptions

* The header sync store is created by prefixing `headerEx` the main datastore.
* The header sync store is created by prefixing `headerSync` on the main datastore.
* The genesis `ChainID` is used to create the `PubsubTopicID` in [go-header][go-header]. For example, for ChainID `gm`, the pubsub topic id is `/gm/header-sub/v0.0.1`. Refer to go-header specs for further details.
* The header store must be initialized with genesis header before starting the syncer service. The genesis header can be loaded by passing the genesis header hash via `NodeConfig.TrustedHash` configuration parameter or by querying the P2P network. This imposes a time constraint that full/light nodes have to wait for the sequencer to publish the genesis header to the P2P network before starting the P2P header sync service.
* The header store must be initialized with the genesis header before starting the syncer service. The genesis header can be loaded by passing its hash via the `NodeConfig.TrustedHash` configuration parameter or by querying the P2P network. This imposes a time constraint: full/light nodes have to wait for the sequencer to publish the genesis header to the P2P network before starting the header sync service.
* The Header Sync works only when the node is connected to the P2P network by specifying the initial seeds to connect to via the `P2PConfig.Seeds` configuration parameter.
* The node's context is passed down to all the components of the P2P header sync to control shutting down the service either abruptly (in case of failure) or gracefully (during successful scenarios).

## Implementation

The header sync implementation can be found in [node/header_exchange.go][header exchange]. The full and light nodes create and start the header sync service under [full][fullnode] and [light][lightnode].
The header sync implementation can be found in [node/header_sync.go][header sync]. The full and light nodes create and start the header sync service under [full][fullnode] and [light][lightnode].

## References

[1] [Header Exchange][header exchange]
[1] [Header Sync][header sync]

[2] [Full Node][fullnode]

[3] [Light Node][lightnode]

[4] [go-header][go-header]

[header exchange]: https://github.com/rollkit/rollkit/blob/main/block/header_exchange.go
[header sync]: https://github.com/rollkit/rollkit/blob/main/block/header_sync.go
[fullnode]: https://github.com/rollkit/rollkit/blob/main/node/full.go
[lightnode]: https://github.com/rollkit/rollkit/blob/main/node/light.go
[go-header]: https://github.com/celestiaorg/go-header
