op-node: Light CL: Follow Source#18365

Closed
pcw109550 wants to merge 30 commits into develop from pcw109550/light-cl-follow-source
@pcw109550 pcw109550 commented Nov 24, 2025

Description

This PR introduces a new upstream sync loop (Driver → Engine Controller) that (1) detects L1 reorgs in unsafe-only mode and resets the node appropriately, and (2) optionally mirrors external L2 safe/finalized state when --l2.follow.source is enabled. It replaces the derivation-pipeline-based reorg logic for unsafe-only nodes and ensures L2 always tracks L1 and external safe/finalized sources correctly.

This PR builds on #18290 and implements two features:

  1. Task A – Follow an external L2 source for safe and finalized blocks
    Enabled with --l2.follow.source (requires --l2.unsafe-only).

  2. Task B – Trigger an L2 reset when an L1 reorg occurs
    Enabled with --l2.unsafe-only.

Note: --l2.follow.source can only be used when --l2.unsafe-only is enabled.
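The flag dependency above can be sketched as a small validation step. This is a minimal illustration only: `LightCLConfig`, its fields, `Check`, and the error value are hypothetical names, and both flags are modeled as plain booleans regardless of how op-node actually parses them.

```go
package main

import (
	"errors"
	"fmt"
)

// LightCLConfig is a hypothetical holder for the two flags discussed above.
type LightCLConfig struct {
	UnsafeOnly   bool // models --l2.unsafe-only
	FollowSource bool // models --l2.follow.source being enabled
}

// ErrFollowSourceNeedsUnsafeOnly is returned for the one invalid combination.
var ErrFollowSourceNeedsUnsafeOnly = errors.New("--l2.follow.source requires --l2.unsafe-only")

// Check rejects follow source without unsafe-only; every other combination passes.
func (c LightCLConfig) Check() error {
	if c.FollowSource && !c.UnsafeOnly {
		return ErrFollowSourceNeedsUnsafeOnly
	}
	return nil
}

func main() {
	cfgs := []LightCLConfig{
		{UnsafeOnly: false, FollowSource: false},
		{UnsafeOnly: true, FollowSource: false},
		{UnsafeOnly: true, FollowSource: true},
		{UnsafeOnly: false, FollowSource: true},
	}
	for _, c := range cfgs {
		fmt.Printf("%+v -> %v\n", c, c.Check())
	}
}
```

Only the last combination is rejected, which matches the note above.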

A new control-flow path is added:

Driver (op-node/rollup/driver/driver.go) → Engine Controller (op-node/rollup/engine/engine_controller.go)

Driver changes

We need a periodic ticker that is responsible for performing the two tasks above. Similar to the altSyncTicker, which periodically tries to close the unsafe gap, I added the upstreamSyncTicker to perform these tasks in op-node/rollup/driver/driver.go. The upstreamSyncTicker is only enabled when --l2.unsafe-only is enabled.
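A conditionally enabled ticker can be sketched as follows. This is not the driver's event loop: `runLoop` and `countTicks` are hypothetical helpers, and using a nil channel to disable the tick branch is a common Go idiom that the actual change may or may not use.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runLoop is a hypothetical, heavily reduced event loop. It calls follow on
// every tick, but only when unsafeOnly is set. Otherwise tickC stays nil,
// and receiving from a nil channel never becomes ready in a select.
func runLoop(ctx context.Context, unsafeOnly bool, interval time.Duration, follow func()) {
	var tickC <-chan time.Time
	if unsafeOnly {
		t := time.NewTicker(interval)
		defer t.Stop()
		tickC = t.C
	}
	for {
		select {
		case <-tickC:
			follow()
		case <-ctx.Done():
			return
		}
	}
}

// countTicks runs the loop for runMs milliseconds and reports how many
// times follow was invoked.
func countTicks(unsafeOnly bool, runMs, intervalMs int) int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(runMs)*time.Millisecond)
	defer cancel()
	n := 0
	runLoop(ctx, unsafeOnly, time.Duration(intervalMs)*time.Millisecond, func() { n++ })
	return n
}

func main() {
	fmt.Println("ticked with unsafe-only:", countTicks(true, 200, 5) > 0)
	fmt.Println("ticks without unsafe-only:", countTicks(false, 50, 5))
}
```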

The ticker periodically calls the followUpstream() method. The method implements Task B and calls the Engine Controller to do Task A, in the following steps:

  1. We do not interfere with the initial EL sync. Let the CL and EL finish the initial EL sync.
  2. We first check for an L1 reorg by inspecting whether the unsafe head's L1Origin still exists and has a valid hash. If not, trigger a reset to fetch valid heads.
  3. (Optional) After the L1 sanity check, if --l2.follow.source is enabled, followUpstream() fetches the external L2 info (eSafe, eFinalized) and tries to apply it to the Engine Controller by calling s.SyncDeriver.Engine.FollowSource(eSafe, eFinalized).
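The three steps above can be sketched as a pure decision function. Everything here is a simplified assumption for illustration: `BlockID`, `L2Head`, `Action`, and this `followUpstream` signature are hypothetical, hashes are plain strings, and the L1 source is reduced to a map from block number to canonical hash.

```go
package main

import "fmt"

// BlockID is a simplified stand-in for a block number plus hash.
type BlockID struct {
	Number uint64
	Hash   string
}

// L2Head is a hypothetical view of the local unsafe head and its L1 origin.
type L2Head struct {
	Block    BlockID
	L1Origin BlockID
}

// Action is what a single followUpstream tick decides to do.
type Action string

const (
	ActionWait   Action = "wait"   // initial EL sync still running, do nothing
	ActionReset  Action = "reset"  // L1 origin missing or hash mismatch
	ActionFollow Action = "follow" // L1 check passed, mirror external safe/finalized
	ActionNone   Action = "none"   // L1 check passed, follow source disabled
)

// followUpstream mirrors the three steps above. canonicalL1 maps an L1 block
// number to the hash the L1 source currently reports for it.
func followUpstream(elSyncing bool, unsafe L2Head, canonicalL1 map[uint64]string, followSource bool) Action {
	// Step 1: never interfere with the initial EL sync.
	if elSyncing {
		return ActionWait
	}
	// Step 2: the unsafe head's L1 origin must exist with the same hash.
	hash, ok := canonicalL1[unsafe.L1Origin.Number]
	if !ok || hash != unsafe.L1Origin.Hash {
		return ActionReset
	}
	// Step 3 (optional): hand over to the follow-source logic.
	if followSource {
		return ActionFollow
	}
	return ActionNone
}

func main() {
	head := L2Head{
		Block:    BlockID{Number: 100, Hash: "0xl2"},
		L1Origin: BlockID{Number: 50, Hash: "0xaaa"},
	}
	l1 := map[uint64]string{50: "0xaaa"}
	reorgedL1 := map[uint64]string{50: "0xbbb"}

	fmt.Println(followUpstream(true, head, l1, true))
	fmt.Println(followUpstream(false, head, reorgedL1, true))
	fmt.Println(followUpstream(false, head, l1, true))
	fmt.Println(followUpstream(false, head, l1, false))
}
```

Returning an action instead of performing it keeps the ordering of the checks easy to test in isolation.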

Engine Controller Changes

FollowSource(eSafe, eFinalized) is implemented in the Engine Controller and implements Task A. The Engine Controller is responsible for updating its internal state based on the injected external state. It performs the mirroring in the following steps:

  1. First, we check whether external safe > local unsafe. If so, we update the local unsafe and then FCU it to the EL. This may advance the safe and unsafe heads together. The underlying EL may still have been syncing to the previous local unsafe, in which case we bump the EL sync target to the external safe. This covers the situation where CLP2P is down and op-node advances its unsafe head together with its safe head. This is harmless, since we must prioritize the safe head.
  2. Otherwise, external safe <= local unsafe. In this case, we query the local EL by the external safe's block number. If the EL sync is complete, this query must succeed, because external safe <= local unsafe. In other words, once the EL sync is complete, op-node's unsafe label and the EL's actual unsafe head must match, and we are querying a block number at or before the unsafe head, so the block must be found.
    • If the block at the external safe's number is not found, the EL is still EL syncing to the unsafe head. We do not interrupt it, so the op-node's unsafe head is NOT bumped. In most cases this EL sync completes shortly, because we only start following the source after the initial EL sync (the long EL sync that must not be interrupted) is complete.
    • If the block is found, the EL has finished EL syncing, and the CL and EL are in sync.
  3. We have now fetched the block from the local EL by the external safe's block number, so we can compare the external and the local block refs. If they match, consolidate. If not, trigger a reorg and set local safe and unsafe to the external safe.
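The branching above can be sketched as a decision function over the external safe block. This is a sketch under stated assumptions, not the Engine Controller code: `BlockRef`, `Outcome`, and `decide` are hypothetical, the local EL is modeled as a map from block number to canonical hash, and handling of the external finalized block is left out.

```go
package main

import "fmt"

// BlockRef is a simplified block reference (number and hash only).
type BlockRef struct {
	Number uint64
	Hash   string
}

// Outcome names the branch taken for a given external safe block.
type Outcome string

const (
	AdvanceUnsafe Outcome = "advance-unsafe" // step 1: eSafe is ahead of local unsafe
	SkipELSyncing Outcome = "skip-el-sync"   // step 2: EL has no block at eSafe's number yet
	Consolidate   Outcome = "consolidate"    // step 3: local block matches eSafe
	Reorg         Outcome = "reorg"          // step 3: local block differs from eSafe
)

// decide follows the steps above. localEL maps a block number to the hash of
// the block the local EL currently holds at that height.
func decide(eSafe, localUnsafe BlockRef, localEL map[uint64]string) Outcome {
	if eSafe.Number > localUnsafe.Number {
		return AdvanceUnsafe
	}
	localHash, found := localEL[eSafe.Number]
	if !found {
		// The EL is still syncing towards the unsafe head; do not interrupt it.
		return SkipELSyncing
	}
	if localHash == eSafe.Hash {
		return Consolidate
	}
	return Reorg
}

func main() {
	localUnsafe := BlockRef{Number: 120, Hash: "0xu"}
	el := map[uint64]string{100: "0xsame", 110: "0xlocal"}

	fmt.Println(decide(BlockRef{Number: 130, Hash: "0xnew"}, localUnsafe, el))
	fmt.Println(decide(BlockRef{Number: 105, Hash: "0xany"}, localUnsafe, el))
	fmt.Println(decide(BlockRef{Number: 100, Hash: "0xsame"}, localUnsafe, el))
	fmt.Println(decide(BlockRef{Number: 110, Hash: "0xext"}, localUnsafe, el))
}
```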

Tests

Four tests are added for --l2.follow.source:

  • TestFollowL2_ReorgRecovery: checks that the follow source seq / ver reorg when L1 reorgs
  • TestFollowL2_SafeAndFinalized: happy case, external safe and finalized are mirrored when the local unsafe head is ahead of external safe and finalized
  • TestFollowL2_WithoutCLP2P: CLP2P is down, so the only data source for safe / finalized is the follow source. Unsafe and safe will advance together.
  • TestSyncTesterFollowL2ReachTips: uses the sync tester to test that the op-node can mirror the safe / finalized heads of op-sepolia.

One test is added for --l2.unsafe-only:

  • TestUnsafeOnly_ReorgRecovery: checks that the unsafe-only seq / ver reorg when L1 reorgs

Additional context

Before this PR, when --l2.unsafe-only was enabled, the node did not reorg L2 when L1 reorged. We now always track L1 reorgs via the newly added upstreamSyncTicker and perform a proper L2 reorg via reset.

Metadata

@pcw109550 pcw109550 force-pushed the pcw109550/light-cl-unsafe-only branch from 10f056f to 59214ce Compare November 25, 2025 13:19
@pcw109550 pcw109550 force-pushed the pcw109550/light-cl-follow-source branch from 54fbe8d to 12835bb Compare November 25, 2025 13:24
@pcw109550 pcw109550 marked this pull request as ready for review November 30, 2025 14:44
@pcw109550 pcw109550 requested review from a team as code owners November 30, 2025 14:44
@pcw109550 pcw109550 requested review from karlfloersch and sebastianst and removed request for a team November 30, 2025 14:44
Base automatically changed from pcw109550/light-cl-unsafe-only to develop December 1, 2025 18:27
Contributor

@karlfloersch karlfloersch left a comment

Didn't have too much but some thoughts

  • I didn't think about the fact that we can test kona-node light CL with the same sync tester tests!
  • Tests seem good, testing reorgs, p2p enabled and disabled. Tried to but couldn't think of extra cases
  • Great that we can reuse the sync tester ext configs
  • The reorg detection logic is MUCH cleaner now. Do you feel good about it?

Generally looks pretty dang good! I'll throw an approval even tho I think a second pair of eyes is probably good. But the logic looked pretty solid to me

@pcw109550 pcw109550 removed the request for review from sebastianst December 2, 2025 11:15
@pcw109550 pcw109550 force-pushed the pcw109550/light-cl-follow-source branch from fe7a184 to e91771e Compare December 2, 2025 11:18
Contributor

@axelKingsley axelKingsley left a comment

Testing is very comprehensive and for the most part I like the way the solution looks!

The main thing to solve is our ongoing reliance on an L1 source. Removing the L1 source from the node altogether is a huge payoff for a lite-node, and even though they're not deriving with it, they are still fully reliant on a highly available L1 source.

Rather, they should be able to utilize their L2 source to supply all derivation aspects of the Safe chain, which would fully eliminate the L1 connection.

How easy do you think it would be to eliminate the L1 connection through this feature?

EDIT: I see that this PR introduces a CL based approach which eliminates the L1: #18500

But @pcw109550 what is the reason for starting with EL based following only? Seems like a feature we would not want to use since it still requires L1.

Actually -- is there a reason we don't just say that sync_status is the required API to do safe following? This would enable safe following on basically all CLs by default.

presets.WithReqRespSyncDisabled(),
presets.WithNoDiscovery(),
presets.WithCompatibleTypes(compat.SysGo),
presets.WithUnsafeOnly(),
Contributor

Does WithUnsafeOnly only modify one of the two Verifiers, then?

Member Author

So WithUnsafeOnly modifies every op-node CL to use unsafe only.

func WithUnsafeOnly() stack.CommonOption {
	return stack.MakeCommon(
		sysgo.WithGlobalL2CLOption(sysgo.L2CLOptionFn(
			func(_ devtest.P, id stack.L2CLNodeID, cfg *sysgo.L2CLConfig) {
				cfg.SequencerUnsafeOnly = true
				cfg.VerifierUnsafeOnly = true
			})))
}

This is a global CL option and applies to every CL. However, if no CL does derivation, there is no safe source. Therefore we need at least one CL that does actual derivation.

To build this preset, I implemented DefaultSingleChainTwoVerifiersFollowL2System:

func DefaultSingleChainTwoVerifiersFollowL2System(dest *DefaultSingleChainTwoVerifiersSystemIDs) stack.Option[*Orchestrator] {

with the first verifier configured as

// Specific options are applied after global options
// this means unsafeOnly is always disabled for the first verifier
opt.Add(WithL2CLNode(ids.L2CLB, ids.L1CL, ids.L1EL, ids.L2ELB, L2CLVerifierDisableUnsafeOnly()))

which disables unsafe-only so that this verifier performs derivation. This works because the global option (WithUnsafeOnly) is applied first and the node-specific option (L2CLVerifierDisableUnsafeOnly()) is applied afterwards.
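The ordering argument can be illustrated with a small, self-contained sketch. None of these names are the sysgo API: `nodeConfig`, `option`, `buildConfig`, `withUnsafeOnly`, and `disableUnsafeOnly` are hypothetical stand-ins that only show why a node-specific option applied last overrides a global one.

```go
package main

import "fmt"

// nodeConfig is a hypothetical, reduced per-node CL config.
type nodeConfig struct {
	VerifierUnsafeOnly bool
}

// option mutates the config of the node with the given id.
type option func(id string, cfg *nodeConfig)

// buildConfig applies global options first and node-specific options after,
// so a node-specific option always has the last word.
func buildConfig(id string, global, specific []option) nodeConfig {
	var cfg nodeConfig
	for _, apply := range global {
		apply(id, &cfg)
	}
	for _, apply := range specific {
		apply(id, &cfg)
	}
	return cfg
}

// withUnsafeOnly plays the role of the global preset option.
func withUnsafeOnly() option {
	return func(_ string, cfg *nodeConfig) { cfg.VerifierUnsafeOnly = true }
}

// disableUnsafeOnly plays the role of the node-specific override.
func disableUnsafeOnly() option {
	return func(_ string, cfg *nodeConfig) { cfg.VerifierUnsafeOnly = false }
}

func main() {
	global := []option{withUnsafeOnly()}

	deriving := buildConfig("verifier-b", global, []option{disableUnsafeOnly()})
	following := buildConfig("verifier-c", global, nil)

	fmt.Println("verifier-b unsafe-only:", deriving.VerifierUnsafeOnly)
	fmt.Println("verifier-c unsafe-only:", following.VerifierUnsafeOnly)
}
```

With this ordering, one verifier keeps deriving and can act as the safe source while the other stays in unsafe-only mode.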

Comment on lines +65 to +75
// Make sure L1 reorged
sys.L1EL.WaitForBlockNumber(l1BlockBeforeReorg.Number)
l1BlockAfterReorg := sys.L1EL.BlockRefByNumber(l1BlockBeforeReorg.Number)
logger.Info("Triggered L1 reorg", "l1", l1BlockAfterReorg)
require.NotEqual(l1BlockAfterReorg.Hash, l1BlockBeforeReorg.Hash)

// Need to poll until the L2CL detects L1 Reorg and trigger L2 Reorg
// What happens:
// L2CL detects L1 Reorg and reset the pipeline. op-node example logs: "reset: detected L1 reorg"
// L2ELB detects L2 Reorg and reorgs. op-geth example logs: "Chain reorg detected"
sys.L2ELB.ReorgTriggered(l2BlockBeforeReorg, 30)
Contributor

Can we verify that L2ELB does not process the reorg on its own somehow?

Member Author

The L2ELB never progresses or reorgs independently; it is always driven by the L2CLB via the Engine API. So the L2ELB cannot reorg on its own.

ReorgTriggered only checks that the canonical block at the divergence height has changed (same parent, different hash). This can only happen if the CL has processed the L1 reorg and sent a forkchoice update with the new head to the EL.
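The check described above (same parent, different hash at the divergence height) can be sketched as below. This is not the devstack helper: `Block`, `diverged`, and `waitForReorg` are hypothetical, the attempts parameter is only assumed to correspond to the second argument of ReorgTriggered, and a real poller would wait between attempts.

```go
package main

import "fmt"

// Block is a simplified block reference.
type Block struct {
	Number     uint64
	Hash       string
	ParentHash string
}

// diverged reports whether now replaced before at the same height:
// same number, same parent, different hash.
func diverged(before, now Block) bool {
	return now.Number == before.Number &&
		now.ParentHash == before.ParentHash &&
		now.Hash != before.Hash
}

// waitForReorg polls the canonical block at before.Number up to attempts
// times and returns true as soon as it has diverged from before.
func waitForReorg(before Block, attempts int, canonical func(number uint64) (Block, bool)) bool {
	for i := 0; i < attempts; i++ {
		if now, ok := canonical(before.Number); ok && diverged(before, now) {
			return true
		}
	}
	return false
}

func main() {
	before := Block{Number: 10, Hash: "0xaa", ParentHash: "0x09"}
	calls := 0
	// The canonical block only changes on the third lookup, standing in for
	// the CL sending a forkchoice update with the new head.
	lookup := func(n uint64) (Block, bool) {
		calls++
		if calls < 3 {
			return before, true
		}
		return Block{Number: n, Hash: "0xbb", ParentHash: "0x09"}, true
	}
	fmt.Println("reorg observed:", waitForReorg(before, 30, lookup))
}
```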

Comment on lines +298 to +299
// In this mode, the node does not derive from L1; instead, it uses L1 as a mandatory
// upstream anchor for its unsafe head, and may optionally import safe/finalized state
Contributor

There is a lot of infrastructure value to not having these lite nodes use L1 connections at all. I need to read more closely to understand, but this comment leads me to think the L1 connection remains required?

I suggest we don't need a mandatory anchor from the L1, and can instead use the Safe Source for the L1 as well.

Member Author

Responded at #18365 (comment)

Contributor

Same comment as earlier - we should be able to follow the L1 source through the L2 Follow Source.

Member Author

Responded at #18365 (comment)

Comment on lines 25 to 27
Contributor

nit: Consider that the naming flipped between the individual and composite interface. I suggest UpstreamFollowSource for the composite.

Member Author

Fixed at eab29d5

@pcw109550
Member Author

pcw109550 commented Dec 5, 2025

Let me share my thought process on the relation between the L1 source and the light CL feature.

There are two features: --l2.unsafe-only and --l2.follow.source. Both must detect reorgs and follow the eventual canonical head.

As I mentioned in the PR description, L1 reorg detection is embedded in the derivation pipeline:

func (l1t *L1Traversal) AdvanceL1Block(ctx context.Context) error {
	origin := l1t.block
	nextL1Origin, err := l1t.l1Blocks.L1BlockRefByNumber(ctx, origin.Number+1)
	if errors.Is(err, ethereum.NotFound) {
		l1t.log.Debug("can't find next L1 block info (yet)", "number", origin.Number+1, "origin", origin)
		return io.EOF
	} else if err != nil {
		return NewTemporaryError(fmt.Errorf("failed to find L1 block info by number, at origin %s next %d: %w", origin, origin.Number+1, err))
	}
	if l1t.block.Hash != nextL1Origin.ParentHash {
		return NewResetError(fmt.Errorf("detected L1 reorg from %s to %s with conflicting parent %s", l1t.block, nextL1Origin, nextL1Origin.ParentID()))
	}

So this means that if we turn off derivation (--l2.unsafe-only), we must manually trigger a reset so that L2 reorgs when L1 reorgs. This applies to both the verifier and the sequencer.

Case 1: Derivation disabled && follow source disabled.

  • When an L1 reorg happens but there is no L1 source, neither the sequencer nor the verifier can detect it (there is no follow source either). This is why we need the L1 source for correctness.

> There is a lot of infrastructure value to not having these lite nodes use L1 connections at all. I need to read more closely to understand, but this comment leads me to think the L1 connection remains required?

So my take is that L1 connection remains required for the light CL for Case 1.

Case 2: Derivation disabled && follow source enabled.

  • Because Case 1 already requires the L1 source (independent of any follow source), we use the same reorg detection here, relying on the L1 source rather than the follow source.
  • Technically, we can rely solely on the follow source only if the source is the CL endpoint (syncStatus). This is because syncStatus contains HeadL1, so we can detect the reorg using the algorithm described at https://github.com/ethereum-optimism/optimism/tree/develop/op-node#l1-reorg.
  • I chose to rely on the L1 source because
    • It always works, whether the follow source is disabled, is an EL, or is a CL
    • We can reuse the reorg detection introduced for Case 1.

Food for thought: If we only allow the combination: [derivation disabled && follow source enabled && follow source is the CL endpoint: syncStatus], I agree that we can simplify.

@pcw109550
Member Author

Because the follow-up PR #18571 merges the unsafe-only flag and the follow source flag, it makes sense to consolidate these two PRs. Closing this PR in favor of #18571, and retargeting the originally stacked PR to develop.

@pcw109550 pcw109550 closed this Dec 12, 2025