feat(node): Kona Light CL - First iteration#3199
Conversation
Wiz Scan Summary
To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension. |
99c207f to
6ed2a49
Compare
befcf52 to
c626ee7
Compare
op-will
left a comment
There was a problem hiding this comment.
Looks good!
I left a few comments that lead to the question "are we doing something entirely new, or do we now have multiple ways to achieve the same thing?"
If the latter, and there's enough in common, we may opt to unify some interfaces. That said, I have not fully considered how this evolves in the future, so I may be missing something obvious that indicates that they should be kept separate.
| /// L2 follow client arguments. | ||
| #[derive(Clone, Debug, Default, clap::Args)] | ||
| pub struct FollowClientArgs { | ||
| /// URL of the L2 follow source RPC API. | ||
| /// The source must be the L2 CL RPC. | ||
| #[arg(long, visible_alias = "l2.follow.source", env = "KONA_NODE_L2_FOLLOW_SOURCE")] | ||
| pub l2_follow_source: Option<Url>, | ||
| } |
There was a problem hiding this comment.
| /// L2 follow client arguments. | |
| #[derive(Clone, Debug, Default, clap::Args)] | |
| pub struct FollowClientArgs { | |
| /// URL of the L2 follow source RPC API. | |
| /// The source must be the L2 CL RPC. | |
| #[arg(long, visible_alias = "l2.follow.source", env = "KONA_NODE_L2_FOLLOW_SOURCE")] | |
| pub l2_follow_source: Option<Url>, | |
| } | |
| /// L2 derivation delegate connection arguments. | |
| #[derive(Clone, Debug, Default, clap::Args)] | |
| pub struct DerivationDelegateArgs { | |
| /// URL of the L2 derivation delegate RPC API. | |
| /// The url must be the L2 CL RPC. | |
| #[arg(long, visible_alias = "l2.follow.url", env = "KONA_NODE_L2_DERIVATION_DELEGATE_URL")] | |
| pub l2_derivation_delegate_url: Option<Url>, | |
| } |
| /// This struct contains a subset of fields from [`kona_protocol::SyncStatus`], | ||
| /// focusing only on the fields needed for following another rollup node. | ||
| #[derive(Clone, Debug, PartialEq, Eq)] | ||
| pub struct FollowStatus { |
There was a problem hiding this comment.
| pub struct FollowStatus { | |
| pub struct DerivationStatus { |
| /// Follow client trait for querying sync status from an L2 consensus layer RPC. | ||
| /// | ||
| /// This trait defines the interface for communicating with another rollup node's | ||
| /// consensus layer to fetch its synchronization status and querying L1 blocks. | ||
| /// The main reason this trait exists is for mocking and unit testing. | ||
| #[async_trait] | ||
| pub trait FollowClient: Send + Sync { | ||
| /// Gets the synchronization status from the follow source. | ||
| /// | ||
| /// Calls the `optimism_syncStatus` RPC method on the remote rollup node, | ||
| /// extracts the essential fields, and returns a simplified status. | ||
| /// | ||
| /// # Returns | ||
| /// | ||
| /// Returns the [`FollowStatus`] containing only the current L1, safe L2, | ||
| /// and finalized L2 blocks from the remote rollup node, or an error if | ||
| /// the RPC call fails. | ||
| async fn get_follow_status(&self) -> Result<FollowStatus, FollowClientError>; | ||
|
|
||
| /// Fetches the L1 [`BlockInfo`] by block number. | ||
| /// | ||
| /// Queries the L1 execution layer for the block at the given number and | ||
| /// returns the block information. | ||
| /// | ||
| /// # Arguments | ||
| /// | ||
| /// * `number` - The L1 block number to fetch | ||
| /// | ||
| /// # Returns | ||
| /// | ||
| /// Returns the [`BlockInfo`] for the requested L1 block, or an error if | ||
| /// the block cannot be fetched or does not exist. | ||
| async fn l1_block_info_by_number(&self, number: u64) -> Result<BlockInfo, FollowClientError>; | ||
| } |
There was a problem hiding this comment.
We're not using this client for sync status (which is ambiguous), but L2 derivation status, right? Would DerivationDelegate + DerivationStatus be more accurate names? If a developer reads FollowClient, they might not have an understanding of what it does and why, whereas DerivationDelegate says exactly what we're doing (delegating derivation to a 3rd party), and it is clear that we should use this trait to get info on the status of L2 derivation.
Just commenting here, but all of the other Follow* cases would need to be updated as well.
| /// Follow client trait for querying sync status from an L2 consensus layer RPC. | |
| /// | |
| /// This trait defines the interface for communicating with another rollup node's | |
| /// consensus layer to fetch its synchronization status and querying L1 blocks. | |
| /// The main reason this trait exists is for mocking and unit testing. | |
| #[async_trait] | |
| pub trait FollowClient: Send + Sync { | |
| /// Gets the synchronization status from the follow source. | |
| /// | |
| /// Calls the `optimism_syncStatus` RPC method on the remote rollup node, | |
| /// extracts the essential fields, and returns a simplified status. | |
| /// | |
| /// # Returns | |
| /// | |
| /// Returns the [`FollowStatus`] containing only the current L1, safe L2, | |
| /// and finalized L2 blocks from the remote rollup node, or an error if | |
| /// the RPC call fails. | |
| async fn get_follow_status(&self) -> Result<FollowStatus, FollowClientError>; | |
| /// Fetches the L1 [`BlockInfo`] by block number. | |
| /// | |
| /// Queries the L1 execution layer for the block at the given number and | |
| /// returns the block information. | |
| /// | |
| /// # Arguments | |
| /// | |
| /// * `number` - The L1 block number to fetch | |
| /// | |
| /// # Returns | |
| /// | |
| /// Returns the [`BlockInfo`] for the requested L1 block, or an error if | |
| /// the block cannot be fetched or does not exist. | |
| async fn l1_block_info_by_number(&self, number: u64) -> Result<BlockInfo, FollowClientError>; | |
| } | |
| /// Derivation delegate trait for querying derivation status from an L2 consensus layer RPC. | |
| /// | |
| /// This trait defines the interface for communicating with another rollup node's | |
| /// consensus layer to fetch its derivation status and querying L1 blocks. | |
| /// The main reason this trait exists is for mocking and unit testing. | |
| #[async_trait] | |
| pub trait DerivationDelegate: Send + Sync { | |
| /// Gets the derivation status from the configured delegate. | |
| /// | |
| /// Calls the `optimism_syncStatus` RPC method on the remote rollup node, | |
| /// extracts the essential fields, and returns a simplified status. | |
| /// | |
| /// # Returns | |
| /// | |
| /// Returns the [`DerivationStatus`] containing only the current L1, safe L2, | |
| /// and finalized L2 blocks from the remote rollup node, or an error if | |
| /// the RPC call fails. | |
| async fn get_derivation_status(&self) -> Result<DerivationStatus, DerivationDelegateError>; | |
| /// Fetches the L1 [`BlockInfo`] by block number. | |
| /// | |
| /// Queries the L1 execution layer for the block at the given number and | |
| /// returns the block information. | |
| /// | |
| /// # Arguments | |
| /// | |
| /// * `number` - The L1 block number to fetch | |
| /// | |
| /// # Returns | |
| /// | |
| /// Returns the [`BlockInfo`] for the requested L1 block, or an error if | |
| /// the block cannot be fetched or does not exist. | |
| async fn l1_block_info_by_number(&self, number: u64) -> Result<BlockInfo, DerivationDelegateError>; | |
| } |
| pub l2_url: Url, | ||
| /// The L1 execution layer RPC URL. | ||
| pub l1_url: Url, | ||
| /// The timeout duration for requests. |
There was a problem hiding this comment.
Nit: The name can indicate all info from the comment above it.
| pub l2_url: Url, | |
| /// The L1 execution layer RPC URL. | |
| pub l1_url: Url, | |
| /// The timeout duration for requests. | |
| pub l2_cl_rpc_url: Url, | |
| /// The L1 execution layer RPC URL. | |
| pub l1_el_rpc_url: Url, | |
| /// The timeout duration for requests. |
| /// Fetches the L1 [`BlockInfo`] by block number. | ||
| /// | ||
| /// Queries the L1 execution layer for the block at the given number and | ||
| /// returns the block information. | ||
| /// | ||
| /// # Arguments | ||
| /// | ||
| /// * `number` - The L1 block number to fetch | ||
| /// | ||
| /// # Returns | ||
| /// | ||
| /// Returns the [`BlockInfo`] for the requested L1 block, or an error if | ||
| /// the block cannot be fetched or does not exist. | ||
| async fn l1_block_info_by_number(&self, number: u64) -> Result<BlockInfo, FollowClientError>; |
There was a problem hiding this comment.
Is there a reason to bundle this into FollowClient?
It makes it seem like we're fetching the L1 block from the 3rd party that we trust for derivation, when we never should be. It would probably be clearer to make FollowActor generic over Provider and inject a RootProvider into it. Otherwise, it looks like we're asking the follow_client to validate data that it gave us, rather than obviously validating it using our own sources:
kona/crates/node/service/src/actors/follow/actor.rs
Lines 170 to 185 in c626ee7
| state.engine.enqueue(task); | ||
| } | ||
| attributes = self.attributes_rx.recv() => { | ||
| Some(attributes) = OptionFuture::from(self.attributes_rx.as_mut().map(|rx| rx.recv())), if self.attributes_rx.is_some() => { |
There was a problem hiding this comment.
I'm wondering if it makes sense to try to unify this with the follow logic since conceptually they're both advancing the unsafe head?
See this interface question and this task question. If so, we can leave finalization as is (see this)
|
Closing due to the re-design. Following ethereum-optimism/design-docs#359 |
~Base branch: #3242 ~May conflict with #3229, #3253 Implement the kona light CL([Design doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md)): - [DerivationActor - Target Determination](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md#derivationactor---target-determination) - [EngineActor - Fork Choice Update](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md#engineactor---fork-choice-update) ```mermaid flowchart TB subgraph A ["Normal Mode (Derivation)"] direction TB subgraph A0 ["Rollup Node Service"] direction TB A_Derivation["DerivationActor<br/>(L1->L2 derivation)"] A_Engine["EngineActor"] A_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"] end A_L1[(L1 RPC)] A_EL[(Execution Layer)] A_L1 -->|L1 info| A_Derivation A_UnsafeSrc -->|unsafe| A_Engine A_Derivation -->|"safe(attr)/finalized"| A_Engine A_Engine -->|engine API| A_EL end subgraph B ["Light CL Mode"] direction TB subgraph B0 ["Rollup Node Service"] direction TB B_DerivationX[["DerivationActor<br/>(NEW: Poll external syncStatus)"]] B_Engine["EngineActor"] B_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"] end B_L1[(L1 RPC)] B_Ext[(External CL RPC<br/>optimism_syncStatus)] B_EL[(Execution Layer)] %% Connections B_Ext -->|safe/finalized/currentL1| B_DerivationX B_L1 -->|canonical L1 check| B_DerivationX B_DerivationX -->|"safe(blockInfo)/finalized (validated)"| B_Engine B_UnsafeSrc -->|unsafe| B_Engine %% Visual indicator for disabled actor B_Engine -->|engine API| B_EL end ``` ### Testing #### Acceptance Tests Running guidelines detailed at #3199: - [x] `TestFollowL2_Safe_Finalized_CurrentL1` - [x] `TestFollowL2_WithoutCLP2P` - [ ] `TestFollowL2_ReorgRecovery` (blocked by [kona: Check L2 reorg due to L1 reorg](ethereum-optimism/optimism#18676)) Injecting CurrentL1 is blocked by [kona: Revise SyncStatus CurrentL1 Selection](ethereum-optimism/optimism#18673) #### Local Sync Tests Validated with syncing op-sepolia between kona-node light CL <> sync tester, successfully finishing the initial EL sync and progress every safety levels reaching each tip. #### Devnet Tests Commit 0b36fdd is baked to `us-docker.pkg.dev/oplabs-tools-artifacts/dev-images/kona-node:0b36fdd-light-cl` and deployed at `changwan-0` devnet: - As a verifier: `changwan-0-kona-geth-f-rpc-3` [[grafana]](https://optimistic.grafana.net/d/nUSlc3d4k/bedrock-networks?orgId=1&refresh=30s&from=now-1h&to=now&timezone=browser&var-network=changwan-0&var-node=$__all&var-layer=$__all&var-safety=l2_finalized&var-cluster=$__all&var-konaNodes=changwan-0-kona-geth-f-rpc-3) - As a sequencer: `changwan-0-kona-geth-f-sequencer-3` [[grafana]](https://optimistic.grafana.net/d/nUSlc3d4k/bedrock-networks?orgId=1&refresh=30s&from=now-1h&to=now&timezone=browser&var-network=changwan-0&var-node=$__all&var-layer=$__all&var-safety=l2_finalized&var-cluster=$__all&var-konaNodes=changwan-0-kona-geth-f-sequencer-3) - As a standby | leader Noticed all {unsafe, safe, finalized} head progression as a kona node light CL.
~Base branch: op-rs/kona#3242 ~May conflict with op-rs/kona#3229, op-rs/kona#3253 Implement the kona light CL([Design doc](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md)): - [DerivationActor - Target Determination](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md#derivationactor---target-determination) - [EngineActor - Fork Choice Update](https://github.com/ethereum-optimism/design-docs/blob/main/protocol/kona-node-light-cl.md#engineactor---fork-choice-update) ```mermaid flowchart TB subgraph A ["Normal Mode (Derivation)"] direction TB subgraph A0 ["Rollup Node Service"] direction TB A_Derivation["DerivationActor<br/>(L1->L2 derivation)"] A_Engine["EngineActor"] A_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"] end A_L1[(L1 RPC)] A_EL[(Execution Layer)] A_L1 -->|L1 info| A_Derivation A_UnsafeSrc -->|unsafe| A_Engine A_Derivation -->|"safe(attr)/finalized"| A_Engine A_Engine -->|engine API| A_EL end subgraph B ["Light CL Mode"] direction TB subgraph B0 ["Rollup Node Service"] direction TB B_DerivationX[["DerivationActor<br/>(NEW: Poll external syncStatus)"]] B_Engine["EngineActor"] B_UnsafeSrc["Unsafe Source<br/>(P2P gossip / Sequencer)"] end B_L1[(L1 RPC)] B_Ext[(External CL RPC<br/>optimism_syncStatus)] B_EL[(Execution Layer)] %% Connections B_Ext -->|safe/finalized/currentL1| B_DerivationX B_L1 -->|canonical L1 check| B_DerivationX B_DerivationX -->|"safe(blockInfo)/finalized (validated)"| B_Engine B_UnsafeSrc -->|unsafe| B_Engine %% Visual indicator for disabled actor B_Engine -->|engine API| B_EL end ``` ### Testing #### Acceptance Tests Running guidelines detailed at op-rs/kona#3199: - [x] `TestFollowL2_Safe_Finalized_CurrentL1` - [x] `TestFollowL2_WithoutCLP2P` - [ ] `TestFollowL2_ReorgRecovery` (blocked by [kona: Check L2 reorg due to L1 reorg](#18676)) Injecting CurrentL1 is blocked by [kona: Revise SyncStatus CurrentL1 Selection](#18673) #### Local Sync Tests Validated with syncing op-sepolia between kona-node light CL <> sync tester, successfully finishing the initial EL sync and progress every safety levels reaching each tip. #### Devnet Tests Commit op-rs/kona@0b36fdd is baked to `us-docker.pkg.dev/oplabs-tools-artifacts/dev-images/kona-node:0b36fdd-light-cl` and deployed at `changwan-0` devnet: - As a verifier: `changwan-0-kona-geth-f-rpc-3` [[grafana]](https://optimistic.grafana.net/d/nUSlc3d4k/bedrock-networks?orgId=1&refresh=30s&from=now-1h&to=now&timezone=browser&var-network=changwan-0&var-node=$__all&var-layer=$__all&var-safety=l2_finalized&var-cluster=$__all&var-konaNodes=changwan-0-kona-geth-f-rpc-3) - As a sequencer: `changwan-0-kona-geth-f-sequencer-3` [[grafana]](https://optimistic.grafana.net/d/nUSlc3d4k/bedrock-networks?orgId=1&refresh=30s&from=now-1h&to=now&timezone=browser&var-network=changwan-0&var-node=$__all&var-layer=$__all&var-safety=l2_finalized&var-cluster=$__all&var-konaNodes=changwan-0-kona-geth-f-sequencer-3) - As a standby | leader Noticed all {unsafe, safe, finalized} head progression as a kona node light CL.
Light CL
Public Design doc at ethereum-optimism/design-docs#359. The PR description may overlap, but more implementation oriented.
Summary
This PR implements "follow mode" for the Kona rollup node, enabling it to sync safe/finalized L2 heads + CurrentL1 view from an external L2 Consensus Layer (CL) node instead of deriving them from L1 data. This provides a "light CL" mode that trusts an external source while maintaining local EL execution.
CLI Flag:
--l2.follow.source <URL>or Env:KONA_NODE_L2_FOLLOW_SOURCE=<URL>Architecture Changes
Component Flow in Follow Mode
When follow mode is enabled (
--l2.follow.sourceprovided):optimism_syncStatusRPC (every 2 seconds)FollowStatusto Engine ActorKey Architectural Points
New Follow Actor: Added as a new actor in the Rollup Node Service
Bidirectional Communication with Engine Actor:
el_sync_completesignal)FollowStatuswith external safe/finalized headsCompatible with Sequencer Mode: A sequencer can use follow mode for safe/finalized heads while still producing unsafe blocks
Implementation Details
Core Components
1. FollowActor (
crates/node/service/src/actors/follow/actor.rs)optimism_syncStatusRPCFollowStatusto Engine Actor2. FollowTask (
crates/node/engine/src/task_queue/tasks/follow/task.rs)SynchronizeTaskto apply safe/finalized head updates3. 5-Case Follow Source Algorithm (
crates/node/service/src/actors/engine/actor.rs:790-880)4. Channel Wiring (
crates/node/service/src/service/node.rs:154-174)el_sync_complete_rxto FollowActorComponents in Follow Mode
L2Finalizer Behavior
The L2Finalizer component becomes inactive in follow mode but remains safely enabled:
enqueue_for_finalization()never called)Testing
Commit a50256c is baked to dev image
us-docker.pkg.dev/oplabs-tools-artifacts/dev-images/kona-node:a50256c-light-cland running as aNote: rebased so relevant commit is not included at this PR's commit trail.
Acceptance Tests:
TestFollowL2_Safe_Finalized_CurrentL1TestFollowL2_WithoutCLP2PTestFollowL2_ReorgRecoveryFor running the existing monorepo acceptance tests, build kona-node at place it at
/tmp/kona-node. Then clone monorepo(Patch https://github.com/ethereum-optimism/optimism/tree/pcw109550/temp-kona-light-cl-acceptance-test-monkey-patch) andcd op-acceptance-tests/tests/sync/follow_l2. RunDEVSTACK_L2CL_KIND=kona RUST_BINARY_PATH_KONA_NODE=/tmp/kona-node go test ./... -run ^[testname]$ -count=1 -vTestFollowL2_ReorgRecoveryfails:For op-node:
For kona-node(which makes
TestFollowL2_ReorgRecoverynot pass)ResetError::ReorgDetected.We may explicitly detect L1 reorg at follow actor and trigger a explicit reset resulting in reorg. Need to find out why there is this difference at kona node, and implement a fix at follow up PR.
All the happy case acceptance tests are passing.
Known Limitations
current_l1not mirrored: Externalcurrent_l1not propagated to local statecurrent_l1tracking (documented incurrent_l1.md)L2 Reorg not working: Light CL kona sequencer does not trigger L2 reorg when L1 reorg. Causing
TestFollowL2_ReorgRecoveryacceptance test to fail.