
feat: Kona Rollup Node Light CL#359

Merged
pcw109550 merged 6 commits into main from pcw109550/kona-node-light-cl on Jan 11, 2026

Conversation

@pcw109550
Member

@pcw109550 pcw109550 commented Jan 6, 2026

Summary

This PR adds a public design document describing Light CL as implemented in kona-node.

Light CL is an OP Stack consensus client mode that disables local derivation and instead follows externally derived consensus data (safe, finalized, currentL1) via the optimism_syncStatus RPC, while continuing to execute the local EL and maintain unsafe chain progression.

A reviewed internal Light CL design doc already exists and was used to implement the feature in op-node (Go), which has since been merged (PR1, PR2). After that work, I independently implemented Light CL (PR) for kona-node (Rust), resulting in a non-trivial architectural change (~1000 LOC).

This PR documents that kona-node implementation.

Motivation

The kona-node Light CL implementation introduces a new operational mode with clear architectural consequences:

  • Derivation is fully disabled
  • Authority for safe / finalized / currentL1 moves to an external CL
  • Unsafe progression and engine synchronization semantics must be preserved
  • New actor and engine task boundaries are introduced

While the code is functional (acceptance tests pass and it is deployed to devnet), the behavior may be difficult to reason about without a written design, especially for maintainers who did not author the change. Relying on code alone would leave invariants, trust assumptions, and failure modes implicit and easy to misunderstand.

This design doc exists to:

  • Make Light CL semantics and invariants explicit
  • Explain how follow mode fits into kona-node’s actor-based architecture
  • Clarify how external consensus data is validated and applied
  • Serve as a durable reference for future maintenance and review

Scope

  • The document is descriptive of the kona-node implementation
  • It is normative with respect to Light CL behavior and invariants
  • It does not propose protocol changes or re-specify derivation

The goal is to record and explain an architectural mode that has already shipped in op-node, in a way that is reviewable, maintainable, and consistent with the rest of the kona-node design.


@einar-oplabs einar-oplabs left a comment

Good and thorough description of your design 👍

Contributor

@geoknee geoknee left a comment

👏

@theochap
Member

theochap commented Jan 7, 2026

Thanks for taking care of the design doc @pcw109550!
A few general comments:

  • I generally agree with @op-will that the standard derivation actor is quite similar to the new Follow actor and that we should think of a way to merge those two actors into a single entity, or to have two separate structs implement the same interface (as suggested in #359 (comment)).
  • Taking a step back, it seems that the light CL essentially tries to move the derivation actor logic to a separate process (and binary) and handle communications with it through RPC instead of in-memory channels.
    • In that case, we could even take the ideas proposed above even further and just implement EngineDerivationClient and DerivationEngineClient as an RPC client instead of using in-memory channels.
    • This would provide for free an implementation of the lightCL and the derivation binary. The change should be minimal and easier to maintain.
  • It seems this document doesn't specify the implementation of the derivation component that drives the safe chain (whatever is at the other end of --l2.follow.source=[L2_CL_RPC]). It seems that specifying this component is at least as important (if not more) as specifying the light CL. Specifically, it seems we are handwaving some non-trivial derivation details:
    • Like the finalization logic (see #359 (comment)) that will have to be handled by the derivation process by interacting with the engine (#359 (comment)). IMO the external derivation actor should handle the finalization logic and the lightCL should not care about it (which I believe is what this document is suggesting); if that's the case, we should modify the current implementation of the derivation/engine actors to ensure finalization happens outside of the engine.
    • Also, the relationship between the external derivation actor and the lightCL is bidirectional: the lightCL will send updates to the external derivation actor at least indirectly (by sending unsafe head FCU to the EL). Have we considered what interactions we need to reproduce through RPC and if those may cause race conditions with the current way the derivation actor is wired in?

@pcw109550
Member Author

Also, the relationship between the external derivation actor and the lightCL is bidirectional: the lightCL will send updates to the external derivation actor at least indirectly (by sending unsafe head FCU to the EL). Have we considered what interactions we need to reproduce through RPC and if those may cause race conditions with the current way the derivation actor is wired in?

Thanks for raising this. I agree that at the network level there is a form of bidirectional interaction in sequencer deployments, especially when a Light CL sequencer follows a normal (non-light) verifier.

At a high level, the interaction looks like this: the Light CL sequencer produces and gossips unsafe blocks, which the verifier may consume as candidates, while the verifier publishes safe / finalized / currentL1 that the Light CL sequencer follows. This creates a network-level feedback relationship between the two nodes.

flowchart LR
    subgraph SEQ ["Sequencer (Light CL)"]
        direction TB
        SeqUnsafe["Unsafe blocks"]
        SeqSafe["Safe / Finalized view"]
    end

    subgraph VER ["Verifier (Normal CL)"]
        direction TB
        VerUnsafe["Unsafe blocks"]
        VerSafe["Safe / Finalized view<br/>(L1-anchored)"]
    end

    %% Proposal path
    SeqUnsafe -->|gossip proposal| VerUnsafe

    %% Confirmation path
    VerSafe -->|optimism_syncStatus| SeqSafe

However, despite this bidirectional network reality, the authority hierarchy remains strictly one-directional, and this is the key reason it does not introduce race conditions or circular control dependencies.

We can reason about this more precisely as follows:

  1. Each CL maintains two logical chains: a safe chain and an unsafe chain.
  2. The unsafe chain is a candidate extension of the safe chain. Unsafe blocks may extend ahead of safe or temporarily diverge, but unsafe state is always reconciled back to safe when safe advances.
  3. The safe chain is authoritative over the unsafe chain. Safe updates may consolidate or reorg unsafe state; unsafe state never drives safe advancement.
  4. Light CL introduces a dependency only between safe chains. Specifically, the Light CL sequencer's safe chain depends on the verifier's safe chain; unsafe chains remain local proposal spaces.
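Points 2 and 3 can be sketched as a single decision the follower makes whenever a new safe head arrives. This is a minimal illustration, not kona-node code: `BlockRef`, `Reconcile`, and `reconcile_unsafe` are hypothetical names, and blocks are reduced to a number and a hash.

```rust
/// Hypothetical, simplified block reference (number + hash only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockRef {
    pub number: u64,
    pub hash: [u8; 32],
}

/// Outcome of applying a new safe head to the local unsafe chain.
#[derive(Debug, PartialEq, Eq)]
pub enum Reconcile {
    /// The unsafe chain already contains the safe block: only advance safe.
    Consolidate,
    /// The unsafe chain diverges or lags behind: reset unsafe to the safe head.
    Reorg { new_unsafe: BlockRef },
}

/// `unsafe_chain` holds the local unsafe blocks. Safe is authoritative,
/// so a mismatch always resolves in favor of `new_safe`, never the reverse.
pub fn reconcile_unsafe(unsafe_chain: &[BlockRef], new_safe: BlockRef) -> Reconcile {
    match unsafe_chain.iter().find(|b| b.number == new_safe.number) {
        Some(local) if local.hash == new_safe.hash => Reconcile::Consolidate,
        _ => Reconcile::Reorg { new_unsafe: new_safe },
    }
}
```

Note that there is no branch where the unsafe block changes the safe head; that asymmetry is exactly point 3.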

Concretely, using explicit dependencies (A -> B means B depends on A):

  • L1 -> Ver-Safe (derivation)
  • Seq-Unsafe -> Ver-Unsafe (gossip only)
  • Ver-Safe -> Seq-Safe (follow: light CL)
  • L1 -> Seq-Unsafe (L1 origin)
  • Ver-Safe -> Ver-Unsafe (consolidation)
  • Seq-Safe -> Seq-Unsafe (consolidation)
flowchart LR
 %% =========================================================
 %% Panel 1: Normal Sequencer <> Normal Verifier
 %% =========================================================
 subgraph P1 ["Normal Sequencer"]
   direction LR


   N_L1["L1 (Canonical)"]


   subgraph N_SEQ["Sequencer (Normal CL)"]
     N_SeqSafe["Seq-Safe"] -->|consolidate| N_SeqUnsafe["Seq-Unsafe"]
   end


   subgraph N_VER["Verifier (Normal CL)"]
     N_VerSafe["Ver-Safe"] -->|consolidate| N_VerUnsafe["Ver-Unsafe"]
   end


   %% Normal CL anchors
   N_L1 -->|derive| N_VerSafe
   N_L1 -->|derive| N_SeqSafe
  
   %% Requested link: Seq-Unsafe still needs L1 origin
   N_L1 -->|origin| N_SeqUnsafe


   %% Network flow
   N_SeqUnsafe -->|gossip| N_VerUnsafe
 end


 %% =========================================================
 %% Panel 2: Light CL Sequencer <> Normal Verifier
 %% =========================================================
 subgraph P2 ["Light CL Sequencer"]
   direction LR


   L_L1["L1 (Canonical)"]


   subgraph L_SEQ["Sequencer (Light CL)"]
     L_SeqSafe["Seq-Safe"] -->|consolidate| L_SeqUnsafe["Seq-Unsafe"]
   end


   subgraph L_VER["Verifier (Normal CL)"]
     L_VerSafe["Ver-Safe"] -->|consolidate| L_VerUnsafe["Ver-Unsafe"]
   end


   %% Verifier anchors
   L_L1 -->|derive| L_VerSafe


   %% Light CL differences
   L_VerSafe -.->|**follow**| L_SeqSafe
   L_L1 -->|origin| L_SeqUnsafe


   %% Network flow
   L_SeqUnsafe -->|gossip| L_VerUnsafe
 end


 %% Styling
 style P1 fill:#f9f9f9,stroke:#333,stroke-dasharray: 5 5
 style P2 fill:#f0f7ff,stroke:#005fb8,stroke-width:2px

The important point is that the only cross-node authoritative dependency is Ver-Safe -> Seq-Safe. The reverse direction (Seq-Unsafe -> Ver-Unsafe) is strictly non-authoritative and exists purely as a proposal mechanism. The verifier's safe chain remains derived from and anchored to canonical L1, not dependent on the sequencer's safe view.

As a result, although there is a bidirectional network interaction, there is no circular authority and no race condition in the correctness sense. Timing skew between locally advancing unsafe state and externally advancing safe state is expected in a distributed setting, and the follow algorithm is explicitly designed to reconcile these updates safely (consolidation, reorg, block-not-found cases, etc.).

In short, this is a proposal / confirmation loop, not a circular control dependency, and authority continues to flow in a single direction: L1 -> Ver-Safe -> Seq-Safe.
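One way to check the "no circular authority" claim mechanically is to run the six dependencies listed above through a topological sort (Kahn's algorithm): if an order exists, the graph is acyclic. The sketch below is illustrative only; `topo_order` is a hypothetical helper and the node labels are the ones used in the list.

```rust
use std::collections::{HashMap, VecDeque};

/// Returns a topological order of `edges` (A -> B means B depends on A),
/// or `None` if the graph contains a cycle.
pub fn topo_order<'a>(edges: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    let mut indegree: HashMap<&str, usize> = HashMap::new();
    let mut out: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        indegree.entry(from).or_insert(0);
        *indegree.entry(to).or_insert(0) += 1;
        out.entry(from).or_default().push(to);
    }
    // Start from nodes with no dependencies, sorted for deterministic output.
    let mut roots: Vec<&str> = indegree
        .iter()
        .filter(|&(_, &d)| d == 0)
        .map(|(&n, _)| n)
        .collect();
    roots.sort();
    let mut queue: VecDeque<&str> = roots.into();
    let mut order = Vec::new();
    while let Some(n) = queue.pop_front() {
        order.push(n);
        if let Some(next) = out.get(n) {
            for &m in next {
                if let Some(d) = indegree.get_mut(m) {
                    *d -= 1;
                    if *d == 0 {
                        queue.push_back(m);
                    }
                }
            }
        }
    }
    // Any node that never reached indegree 0 sits on a cycle.
    if order.len() == indegree.len() { Some(order) } else { None }
}
```

For the edges above this yields L1, Ver-Safe, Seq-Safe, Seq-Unsafe, Ver-Unsafe. Adding a hypothetical Seq-Safe -> Ver-Safe edge (the verifier following the sequencer's safe view) makes it return `None`.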

@pcw109550
Member Author

It seems this document doesn't specify the implementation of the derivation component that drives the safe chain (whatever is at the other end of --l2.follow.source=[L2_CL_RPC]). It seems that specifying this component is at least as important (if not more) as specifying the light CL.

Good point. The doc currently focuses on the follower behavior, but it should state the contract/assumptions for --l2.follow.source. I will clarify that this is expected to be an OP Stack rollup node CL RPC implementing optimism_syncStatus and providing coherent {safe, finalized, currentL1} outputs (trusted derivation outcome, with L1 canonicality checks as a safety gate). I will also add a note that while following another Light CL is technically possible, operators must avoid cycles in the safe-follow graph to prevent circular authority. Circular dependencies in the safe-chain follow relationship (i.e., cycles in {safe, finalized, currentL1} authority) can lead to undefined behavior.
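A rough sketch of what "coherent outputs with an L1 canonicality gate" could mean for a follower receiving data from `--l2.follow.source`. The struct is a hypothetical, heavily reduced view of an `optimism_syncStatus` response, and `check_status` plus its closure parameter (standing in for a lookup against the node's own L1 RPC) are illustrative names, not kona-node APIs.

```rust
/// Hypothetical subset of a followed sync status: numbers, plus the
/// current L1 hash needed for the canonicality check.
#[derive(Clone, Copy, Debug)]
pub struct FollowedStatus {
    pub current_l1_number: u64,
    pub current_l1_hash: [u8; 32],
    pub safe_l2_number: u64,
    pub finalized_l2_number: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RejectReason {
    /// finalized must never be ahead of safe.
    FinalizedAheadOfSafe,
    /// The source's currentL1 is not on our canonical L1 chain.
    NonCanonicalL1,
}

/// `local_l1_hash_at(n)` returns our own canonical L1 hash at height `n`.
/// In this sketch, an L1 block unknown locally is treated as non-canonical.
pub fn check_status<F>(s: &FollowedStatus, local_l1_hash_at: F) -> Result<(), RejectReason>
where
    F: Fn(u64) -> Option<[u8; 32]>,
{
    if s.finalized_l2_number > s.safe_l2_number {
        return Err(RejectReason::FinalizedAheadOfSafe);
    }
    if local_l1_hash_at(s.current_l1_number) != Some(s.current_l1_hash) {
        return Err(RejectReason::NonCanonicalL1);
    }
    Ok(())
}
```

The gate only rejects obviously inconsistent or forked data; the derivation outcome itself is still trusted, which is the assumption stated above.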

@pcw109550
Member Author

My brain dump about the similarity between the FollowActor and the DerivationActor.

  1. DerivationActor signals {safe, derivedL1} info to the EngineActor, and the EngineActor takes care of finalization using the locally observed L1 finalization info. In other words, the finalization logic is embedded in the EngineActor.
  2. FollowActor signals {safe, finalized} info to the EngineActor, which injects both as-is, without computing finalization locally.

So the key difference is who owns finalization:

  • Normal mode: "finalization is local (engine-driven)"
  • Follow mode: "finalization is delegated (source-driven)"
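The difference in ownership can be illustrated with a simplified model. In the local case, the finalized L2 block is the highest safe block whose derived-from L1 block is already finalized; in the delegated case, the value is taken from the source. `DerivedBlock`, `Finality`, and `finalized_l2` are hypothetical names, and the real engine logic is more involved.

```rust
/// Hypothetical safe L2 block tagged with the L1 block it was derived from.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DerivedBlock {
    pub l2_number: u64,
    pub derived_from_l1: u64,
}

pub enum Finality {
    /// Normal mode: compute finalized L2 from locally observed L1 finality.
    Local { l1_finalized: u64 },
    /// Follow mode: the source reports finalized L2 directly.
    Delegated { finalized_l2: u64 },
}

/// `safe_history` must be ordered by ascending `l2_number`.
pub fn finalized_l2(safe_history: &[DerivedBlock], finality: Finality) -> Option<u64> {
    match finality {
        // Needs the L1 -> L2 mapping, which only exists when deriving locally.
        Finality::Local { l1_finalized } => safe_history
            .iter()
            .rev()
            .find(|b| b.derived_from_l1 <= l1_finalized)
            .map(|b| b.l2_number),
        // Needs no mapping at all: the source already did the work.
        Finality::Delegated { finalized_l2 } => Some(finalized_l2),
    }
}
```

This also shows why the delegated path is the natural fit when derivation is disabled: the local branch cannot be evaluated without the derived-from L1 data.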

This matches the review point that the external derivation actor should own finalization and Light CL should not care about it:

Like the finalization logic (see #359 (comment)) that will have to be handled by the derivation process by interacting with the engine #359 (comment). IMO the external derivation actor should handle the finalization logic and the lightCL should not care about it (which I believe is what this document is suggesting)

So if we decide to view the FollowActor as a DerivationActor, as suggested in #359 (comment), then a clean first step would be to move finalization out of the EngineActor entirely. That is, if the light CL's upstream is responsible for derivation and finalization, then normal mode should arguably align by ensuring finalization happens outside the engine as well.

if that's the case, we should modify the current implementation of the derivation/engine actors to ensure finalization happens outside of the engine.

Let's hypothetically introduce a FinalizationActor and "plug out" finalization from EngineActor.

Normal CL flow:

  1. DerivationActor implements trait EngineDerivationClient.
  2. FinalizationActor will receive L1->L2 block mappings (OpAttributesWithParent) from the DerivationActor.
  3. DerivationActor signals safe attributes to the EngineActor.
  4. FinalizationActor signals finalized blocks to the EngineActor.

Light CL flow:

  1. DelegatedDerivationActor implements trait EngineDerivationClient.
  2. FinalizationActor will receive safe L2 blocks (L2BlockInfo) from the DelegatedDerivationActor.
  3. DelegatedDerivationActor signals safe blocks to the EngineActor.
    • ConsolidateTask must be patched to bypass safe attributes check when the update is only L2BlockInfo.
  4. FinalizationActor signals finalized blocks to the EngineActor.
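The "two structs behind one interface" shape from both flows can be sketched as follows. `SafeHeadProvider` is a hypothetical stand-in for the role `EngineDerivationClient` plays here; it does not reproduce that trait's actual signature. Safe heads are reduced to block numbers, and the delegated variant drops repeated poll results so the engine only sees changes.

```rust
use std::collections::VecDeque;

/// Hypothetical interface the engine side consumes, regardless of mode.
pub trait SafeHeadProvider {
    /// Next safe L2 block number to hand to the engine, if any.
    fn next_safe(&mut self) -> Option<u64>;
}

/// Normal mode: every locally derived block is a new safe head.
pub struct LocalDerivation {
    pub derived: VecDeque<u64>,
}

impl SafeHeadProvider for LocalDerivation {
    fn next_safe(&mut self) -> Option<u64> {
        self.derived.pop_front()
    }
}

/// Light CL mode: safe heads come from polling an external CL.
pub struct DelegatedDerivation {
    pub polls: VecDeque<u64>,
    pub last_seen: Option<u64>,
}

impl SafeHeadProvider for DelegatedDerivation {
    fn next_safe(&mut self) -> Option<u64> {
        // Polling may return the same head repeatedly; forward any change,
        // including a lower (reorged) head.
        while let Some(head) = self.polls.pop_front() {
            if self.last_seen != Some(head) {
                self.last_seen = Some(head);
                return Some(head);
            }
        }
        None
    }
}

/// The engine-side consumer is generic over the provider.
pub fn drain<P: SafeHeadProvider>(p: &mut P) -> Vec<u64> {
    let mut out = Vec::new();
    while let Some(h) = p.next_safe() {
        out.push(h);
    }
    out
}
```

With this shape the engine does not need to know which mode produced a safe head; only the provider differs.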

So, I think the "FollowActor ~= DerivationActor" framing is plausible, but it only becomes clean if we also standardize where finalization lives. Otherwise we end up with two different finalization ownership models depending on mode.

I am open to adapting the trait pattern / interface shape here, especially if there is a minimal way to provide the necessary L1 -> L2 linkage for followed safe blocks without reintroducing derivation coupling.

@pcw109550
Member Author

Taking a step back, it seems that the light CL essentially tries to move the derivation actor logic to a separate process (and binary) and handle communications with it through RPC instead of in-memory channels.
In that case, we could even take the ideas proposed above even further and just implement EngineDerivationClient and DerivationEngineClient as an RPC client instead of using in-memory channels.
This would provide for free an implementation of the lightCL and the derivation binary. The change should be minimal and easier to maintain.

I agree this line of thinking is consistent with the direction we’re already discussing (moving finalization out of EngineActor). Conceptually, treating derivation/finalization as an external service and formalizing the boundary via RPC is very clean.

The main distinction is that switching the existing in-process EngineDerivationClient wiring to RPC affects both normal CL and Light CL and would be a broader refactor than what this doc is trying to settle. For now, the focus is on clarifying what Light CL delegates and who owns finalization, without committing to how that boundary is implemented.

I do think this is a natural follow-up direction once the ownership model is clear, and I am open to moving toward an RPC-based boundary in a subsequent step.

@pcw109550
Member Author

pcw109550 commented Jan 8, 2026

@op-will, @theochap and I had a synchronous design review and reached consensus:

[Short term for landing the light CL]

  • Remove the FollowActor / FollowTask
  • Use the externally provided finalized info, since no L1 -> L2 mapping is available when derivation is disabled
  • Implement polling the external CL inside the DerivationActor when light CL is enabled
  • Send L2BlockInfo to the EngineActor instead of safe attributes when light CL is enabled
    • The DerivationActor -> EngineActor interface will support either L2BlockInfo or OpAttributesWithParent, but we'll use the same call for both full and light clients.
  • Move the finalization logic from the EngineActor to the DerivationActor.
  • Re-use ConsolidateTask / FinalizeTask. ConsolidateTask must be tweaked because it will receive L2BlockInfo instead of safe attributes. Reorgs will be handled by applying the new hash directly via FCU.
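The combined payload and the tweaked consolidation decision might look roughly like this. `SafeUpdate`, `BlockInfo`, `EngineAction`, and `plan` are hypothetical simplifications (hashes are `u64` for brevity) and are not the real `L2BlockInfo` / `OpAttributesWithParent` types. The attributes comparison is assumed to happen elsewhere and is passed in as a flag.

```rust
/// Hypothetical, simplified stand-in for L2BlockInfo.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockInfo {
    pub number: u64,
    pub hash: u64,
}

/// One call, two payload kinds (full vs light client).
#[derive(Debug)]
pub enum SafeUpdate {
    /// Full CL: derived attributes, already compared against the local block.
    Attributes { number: u64, matches_local: bool },
    /// Light CL: only the externally reported safe block.
    BlockInfo(BlockInfo),
}

#[derive(Debug, PartialEq, Eq)]
pub enum EngineAction {
    /// Mark the existing local block as safe.
    Consolidate { number: u64 },
    /// Attributes differ: build a replacement block from them.
    RebuildFromAttributes { number: u64 },
    /// Light CL mismatch: point forkchoice directly at the new safe hash.
    ForkchoiceReorg { new_safe: BlockInfo },
}

/// `local_hash_at(n)` returns the local block hash at height `n`, if known.
pub fn plan(update: &SafeUpdate, local_hash_at: impl Fn(u64) -> Option<u64>) -> EngineAction {
    match *update {
        SafeUpdate::Attributes { number, matches_local: true } => EngineAction::Consolidate { number },
        SafeUpdate::Attributes { number, matches_local: false } => {
            EngineAction::RebuildFromAttributes { number }
        }
        // The attributes check is bypassed: only the hash can be compared.
        SafeUpdate::BlockInfo(info) => {
            if local_hash_at(info.number) == Some(info.hash) {
                EngineAction::Consolidate { number: info.number }
            } else {
                // Covers both a different local block and a missing one.
                EngineAction::ForkchoiceReorg { new_safe: info }
            }
        }
    }
}
```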

[Long term goal]

  • All the consolidation / finalization logic may be moved out from the EngineActor to the DerivationActor
  • Modularize the actors so they can be called via RPC instead of in-memory channels.

Will update the design doc according to the bullet points above.

@pcw109550 pcw109550 requested review from op-will and theochap January 9, 2026 14:29

- Tracking system configuration updates
- Tracking L1 origin for sequencer operation
While it is technically possible for a Light CL to follow another Light CL, operators must avoid cycles in the light CL dependency graph.
Member

Why would a light CL follow another one? Aren't we going to run one lightCL per sequencer?
It would make more sense to me that two separate light CLs follow one normal validator node (like inside a conductor set)

Member

Regarding that, we could simply remove this undefined behavior by adding an RPC endpoint to kona nodes that specifies whether a node is a "normal" or a "light" CL.

Contributor

I think this is already specified as an assumption and doesn't need to be clarified.

A critical assumption of a Light CL is that it delegates derivation to a node that has accurate and up-to-date derivation information. This is just another way of stating that assumption and can be removed.

Member Author

Making a Light CL follow another Light CL is defined behavior that enables a "Fan-out" scaling pattern.

For example, we could expose a light CL endpoint publicly, so that members of the general public can run their own light CL against it.

The design is agnostic: if a node provides a valid DerivationState, it is a valid source. Prohibiting this would limit this kind of scalability and deployment flexibility.

Member

@theochap theochap left a comment

Looks good to me. Thanks for taking the time to update the document

Contributor

@op-will op-will left a comment

Looks good!

I left a number of comments around verbiage and clarity that you may or may not choose to adopt, but the design looks great!

Comment on lines 88 to 96
#### Cycles and Authority Graph Constraints

While it is technically possible for a Light CL to follow another Light CL, operators must avoid cycles in the light CL dependency graph.

Circular dependencies in `{safeL2, finalizedL2, currentL1}` authority can lead to undefined behavior and circular trust assumptions.

Operational guidance:
- The light CL dependency graph must be acyclic
- At least one node in the graph must terminate in a deriving rollup node that directly consumes L1 data
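Both rules can be checked together by walking each node's follow chain until it reaches a deriving node (acyclic and terminating) or revisits a node (cycle). The sketch assumes each Light CL follows exactly one source; `terminal_authority` and the map layout (`Some(source)` for a Light CL, `None` for a deriving node) are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the deriving node that `start` ultimately depends on, or an error
/// if the follow chain loops or points at an unknown node.
pub fn terminal_authority<'a>(
    follows: &HashMap<&'a str, Option<&'a str>>,
    start: &'a str,
) -> Result<&'a str, String> {
    let mut seen = HashSet::new();
    let mut current = start;
    loop {
        if !seen.insert(current) {
            return Err(format!("cycle detected at {current}"));
        }
        match follows.get(current) {
            // Deriving node: consumes L1 directly, so the chain terminates.
            Some(&None) => return Ok(current),
            // Light CL: keep walking toward its source.
            Some(&Some(next)) => current = next,
            None => return Err(format!("unknown source {current}")),
        }
    }
}
```

A fan-out chain such as public light CL -> sequencer light CL -> verifier resolves to the verifier, while two Light CLs following each other are rejected.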
Contributor

Does this section need to exist?

If we change our semantics from "follow" to "delegate derivation to", this concern becomes obvious: a CL should not delegate derivation to another CL that is also delegating derivation, since it would be delegating a responsibility that is potentially unfulfilled.

Even if an operator chooses to do this, we already state the assumption that the delegate serves accurate derivation state.

Member Author

I believe this section should remain because it defines the Terminal Authority required for safe chain progression.

While "delegation" implies trust, it doesn't explicitly guarantee termination.

@pcw109550 pcw109550 merged commit 8f49778 into main Jan 11, 2026
5 checks passed
@pcw109550 pcw109550 deleted the pcw109550/kona-node-light-cl branch January 11, 2026 10:35
6 participants

Comments