op-node/recover-mode: handle l1 origin close to tip gracefully by geoknee · Pull Request #18556 · ethereum-optimism/optimism

geoknee · 2025-12-09T14:44:30Z

Adds an action test showing how recover mode can work to recover a chain, with explanatory notes about how operators should not use span batches. This can be developed into a better runbook for recovering from a sequencing window expiry incident.

Run env.RunFaultProofProgram after computing l2SafeHead (l2SafeHead.Number.Uint64()/2) and replace the FromGenesis call with RunFaultProofProgram. Fix minor comment typos and wrap a long log line.

codecov · 2025-12-09T14:53:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.58%. Comparing base (e355cc4) to head (e1bfc22).
⚠️ Report is 5 commits behind head on develop.

❗ There is a different number of reports uploaded between BASE (e355cc4) and HEAD (e1bfc22).

HEAD has 8 fewer uploads than BASE.

| Flag | BASE (e355cc4) | HEAD (e1bfc22) |
|---|---|---|
| cannon-go-tests-64 | 3 | 1 |
| contracts-bedrock-tests | 6 | 0 |

Additional details and impacted files

@@             Coverage Diff             @@
##           develop   #18556      +/-   ##
===========================================
- Coverage    75.34%   66.58%   -8.77%     
===========================================
  Files          189       55     -134     
  Lines        11228     4031    -7197     
===========================================
- Hits          8460     2684    -5776     
+ Misses        2622     1203    -1419     
+ Partials       146      144       -2

| Flag | Coverage Δ | |
|---|---|---|
| cannon-go-tests-64 | `66.58% <ø> (-0.82%)` | ⬇️ |
| contracts-bedrock-tests | `?` | |

Flags with carried forward coverage won't be shown.

See 139 files with indirect coverage changes.

geoknee · 2025-12-09T15:01:51Z

op-node/rollup/sequencing/origin_selector.go

+		if errors.Is(err, ethereum.NotFound) {
+			// If the next origin is not found, it means we are at the end of the chain.
+			// In this case, we set the next origin to an empty block reference.
+			los.log.Error("next L1 origin not found, recover mode likely brought L1 origin up to the tip of the chain", "error", err)
+			los.nextOrigin = eth.L1BlockRef{}
+			return los.currentOrigin, los.nextOrigin, nil
+		} else if err != nil {


The action test added in this PR fails without this patch.

The failure is due to this error:

    l2_sequencer.go:111:
        Error Trace: /Users/georgeknee/code/ethereum-optimism/optimism/op-e2e/actions/helpers/l2_sequencer.go:111
                     /Users/georgeknee/code/ethereum-optimism/optimism/op-e2e/actions/proofs/sequence_window_expiry_test.go:90
                     /Users/georgeknee/code/ethereum-optimism/optimism/op-e2e/actions/proofs/helpers/matrix.go:47
        Error:       Received unexpected error: EOF
        Test:        Test_ProgramAction_SequenceWindowExpired/HonestClaim-jovian
        Messages:    failed to start block building

I'm not sure what this would mean outside of the action test environment, but I think it is probably related to what we saw with #18350.

I think this is simply because the action test helper being called requires block building to succeed, and it does not succeed because the next L1 origin is unavailable.

So it does not really tally with the sequencer drift violation, which I haven't yet managed to reproduce. Still, it is better to avoid the temporary errors and continue block building with the old origin when we get into this situation.

I modified the test so that it does not get stuck on this: if it can't build a block, it just waits until more L1 blocks are available. This means the test actually passes without the patch, but I added an acceptance test which still fails without the patch.
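
To illustrate the fallback described above, here is a minimal, self-contained sketch of the idea. It is not the op-node code: `errNotFound` stands in for `ethereum.NotFound`, and `blockRef`, `l1Chain` and `selectOrigins` are hypothetical names invented for this example. The point is only that a missing next origin is treated as "we are at the L1 tip" rather than as an error, so the caller can keep building on the current origin.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a stand-in for ethereum.NotFound (assumption for this sketch).
var errNotFound = errors.New("not found")

// blockRef is a hypothetical, trimmed-down stand-in for eth.L1BlockRef.
type blockRef struct {
	Number uint64
}

// l1Chain is a toy L1 chain that contains blocks 0..tip.
type l1Chain struct {
	tip uint64
}

// blockByNumber returns errNotFound for any block beyond the tip.
func (c l1Chain) blockByNumber(n uint64) (blockRef, error) {
	if n > c.tip {
		return blockRef{}, errNotFound
	}
	return blockRef{Number: n}, nil
}

// selectOrigins returns the current origin and the next origin candidate.
// If the next origin does not exist yet, it returns an empty next origin
// and no error, so block building can continue on the current origin.
func selectOrigins(c l1Chain, current blockRef) (blockRef, blockRef, error) {
	next, err := c.blockByNumber(current.Number + 1)
	if errors.Is(err, errNotFound) {
		return current, blockRef{}, nil
	} else if err != nil {
		return blockRef{}, blockRef{}, err
	}
	return current, next, nil
}

func main() {
	chain := l1Chain{tip: 10}

	// Origin already at the tip: next origin is empty, no error is returned.
	cur, next, err := selectOrigins(chain, blockRef{Number: 10})
	fmt.Println(cur.Number, next == (blockRef{}), err)

	// Origin behind the tip: the next origin is available as usual.
	cur, next, err = selectOrigins(chain, blockRef{Number: 4})
	fmt.Println(cur.Number, next.Number, err)
}
```

In this toy model the caller distinguishes "no next origin yet" from a real failure by checking for the empty `blockRef`, which mirrors the `eth.L1BlockRef{}` assignment in the diff.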

geoknee · 2025-12-09T15:02:22Z

op-e2e/actions/proofs/sequence_window_expiry_test.go

+	// Set recover mode on the sequencer:
+	env.Sequencer.ActSetRecoverMode(t, true)


I manually confirmed that the test fails without recover mode enabled.

geoknee · 2025-12-09T15:02:49Z

op-e2e/actions/proofs/sequence_window_expiry_test.go

+	// It seems more difficult (almost impossible) to recover from sequencing window expiry with span batches,
+	// since the singular batches within are invalidated _atomically_.
+	// That is to say, if the oldest batch in the span batch fails the sequencing window check
+	// (l1 origin + seq window < l1 inclusion),
+	// all following batches are invalidated / dropped as well.
+	// https://github.com/ethereum-optimism/optimism/blob/73339162d78a1ebf2daadab01736382eed6f4527/op-node/rollup/derive/batches.go#L96-L100
+	//
+	// If the same blocks were batched with singular batches, the validation rules are different:
+	// https://github.com/ethereum-optimism/optimism/blob/73339162d78a1ebf2daadab01736382eed6f4527/op-node/rollup/derive/batches.go#L83-L86
+	// In the case of recover mode, the noTxPool=true condition means auto-derivation actually fills
+	// the gap with identical blocks anyway, meaning the following batches are actually still valid.
+	bc := helpers.NewBatcherCfg()
+	bc.ForceSubmitSingularBatch = true


I manually confirmed the test fails without forcing singular batches.
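
The difference between the two batch types can be sketched with a toy model. This is a deliberately simplified illustration and not the logic in `batches.go`: it models only the sequencing window check quoted in the comment (`l1 origin + seq window < l1 inclusion`) and ignores parent hash, timestamp and every other validity rule. The `batch` type and the `windowExpired`, `acceptSingular` and `acceptSpan` functions are hypothetical names made up for this example.

```go
package main

import "fmt"

// batch is a hypothetical, minimal view of a singular batch:
// only the L1 origin number matters for this sketch.
type batch struct {
	L1Origin uint64
}

// windowExpired applies the check quoted in the test comment:
// l1 origin + seq window < l1 inclusion.
func windowExpired(b batch, seqWindow, l1Inclusion uint64) bool {
	return b.L1Origin+seqWindow < l1Inclusion
}

// acceptSingular evaluates every batch on its own and keeps the ones
// that are still inside the sequencing window.
func acceptSingular(batches []batch, seqWindow, l1Inclusion uint64) []batch {
	var ok []batch
	for _, b := range batches {
		if !windowExpired(b, seqWindow, l1Inclusion) {
			ok = append(ok, b)
		}
	}
	return ok
}

// acceptSpan treats the batches as one atomic unit: if the oldest one
// is outside the window, everything is dropped.
func acceptSpan(batches []batch, seqWindow, l1Inclusion uint64) []batch {
	if len(batches) == 0 || windowExpired(batches[0], seqWindow, l1Inclusion) {
		return nil
	}
	return batches
}

func main() {
	// Illustrative numbers only: window of 50 L1 blocks, inclusion at L1 block 100.
	batches := []batch{{L1Origin: 40}, {L1Origin: 45}, {L1Origin: 55}, {L1Origin: 60}}

	fmt.Println("singular accepted:", len(acceptSingular(batches, 50, 100)))
	fmt.Println("span accepted:", len(acceptSpan(batches, 50, 100)))
}
```

With these numbers the two oldest batches are outside the window. Submitted individually, the two newer batches survive the check; submitted as one span, all four are dropped together, which is why the test forces singular batches.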

Replace manual error assertions around FindL1Origin with requireL1OriginAt and remove the now-unused derive import.

sysgo only

Set default L1 block time to 12s to match action helper assumptions. Increase sequencer window size in the test to 50. Compute drift from UnsafeL2 headers (use UnsafeL2.L1Origin). Simplify L1 mining to always BatchMineAndSync and remove the extra numL1Blocks > 10 lag guard.

geoknee · 2025-12-11T10:34:30Z

op-e2e/actions/proofs/sequence_window_expiry_test.go

+	switch {
+	case drift == 0:
+		t.Fatal("drift is zero, this implies the unsafe l2 head is pinned to the l1 head")
+	case drift > int(tp.MaxSequencerDrift):


Note that tp.MaxSequencerDrift is probably a complete red herring here, since this was changed to a protocol constant with Fjord: https://specs.optimism.io/protocol/fjord/derivation.html?highlight=drift#constant-maximum-sequencer-drift.
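
As a rough sketch of what a check against the protocol constant could look like, the example below hard-codes the Fjord value of 1800 seconds from the linked spec instead of reading a configurable parameter. It assumes drift is measured in seconds as the L2 block timestamp minus its L1 origin timestamp; the test in this PR may compute drift differently. `fjordMaxSequencerDrift`, `sequencerDrift` and `classifyDrift` are hypothetical names for illustration only.

```go
package main

import "fmt"

// fjordMaxSequencerDrift is the constant maximum sequencer drift, in seconds,
// that the Fjord derivation spec defines in place of the configurable value.
const fjordMaxSequencerDrift uint64 = 1800

// sequencerDrift returns how far (in seconds) an L2 block's timestamp is
// ahead of its L1 origin's timestamp. It saturates at zero.
func sequencerDrift(l2Time, l1OriginTime uint64) uint64 {
	if l2Time < l1OriginTime {
		return 0
	}
	return l2Time - l1OriginTime
}

// classifyDrift mirrors the shape of the switch in the test:
// zero drift, drift above the maximum, or something in between.
func classifyDrift(drift uint64) string {
	switch {
	case drift == 0:
		return "zero drift"
	case drift > fjordMaxSequencerDrift:
		return "exceeds max drift"
	default:
		return "within max drift"
	}
}

func main() {
	// Illustrative timestamps only.
	l1OriginTime := uint64(1_700_000_000)

	fmt.Println(classifyDrift(sequencerDrift(l1OriginTime, l1OriginTime)))
	fmt.Println(classifyDrift(sequencerDrift(l1OriginTime+600, l1OriginTime)))
	fmt.Println(classifyDrift(sequencerDrift(l1OriginTime+2400, l1OriginTime)))
}
```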

geoknee · 2025-12-12T16:55:53Z

Closed in favour of #18589.

geoknee added 6 commits December 8, 2025 16:40

WIP

543df5b

wip

ea66bee

WIP

dbaa3ab

Treat NotFound next L1 origin as chain end

7333916

Use recover mode in sequence window expiry test

acc6c33

Invoke fault proof earlier and fix typos

e1bfc22

reduce diff

0aa6adc

geoknee marked this pull request as ready for review December 9, 2025 15:00

geoknee requested review from a team as code owners December 9, 2025 15:00

geoknee requested review from ajsutton, stevennevins and teddyknox December 9, 2025 15:00

geoknee commented Dec 9, 2025

Use requireL1OriginAt helper in test

9e95a84

geoknee marked this pull request as draft December 9, 2025 22:49

geoknee added 5 commits December 10, 2025 09:21

Introduce L2Sequencer.ActMaybeL2StartBlock

90abe3d

add TestRecoverModeWhenChainHealthy acceptance test

16f5198

sysgo only

Add SetSequencerRecoverMode and enable debug logs

dd9492e

restore stub

3b22b34

geoknee commented Dec 11, 2025

geoknee closed this Dec 12, 2025

geoknee wants to merge 13 commits into develop from gk/recover-mode