op-node/recover-mode: handle l1 origin close to tip gracefully#18556
op-node/recover-mode: handle l1 origin close to tip gracefully#18556
Conversation
Run env.RunFaultProofProgram after computing l2SafeHead (l2SafeHead.Number.Uint64()/2) and replace the FromGenesis call with RunFaultProofProgram. Fix minor comment typos and wrap a long log line.
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## develop #18556 +/- ##
===========================================
- Coverage 75.34% 66.58% -8.77%
===========================================
Files 189 55 -134
Lines 11228 4031 -7197
===========================================
- Hits 8460 2684 -5776
+ Misses 2622 1203 -1419
+ Partials 146 144 -2
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
| if errors.Is(err, ethereum.NotFound) { | ||
| // If the next origin is not found, it means we are at the end of the chain. | ||
| // In this case, we set the next origin to an empty block reference. | ||
| los.log.Error("next L1 origin not found, recover mode likely brought L1 origin up to the tip of the chain", "error", err) | ||
| los.nextOrigin = eth.L1BlockRef{} | ||
| return los.currentOrigin, los.nextOrigin, nil | ||
| } else if err != nil { |
There was a problem hiding this comment.
The action test added in this PR fails without this patch.
There was a problem hiding this comment.
The failure is due to this error
l2_sequencer.go:111:
Error Trace: /Users/georgeknee/code/ethereum-optimism/optimism/op-e2e/actions/helpers/l2_sequencer.go:111
/Users/georgeknee/code/ethereum-optimism/optimism/op-e2e/actions/proofs/sequence_window_expiry_test.go:90
/Users/georgeknee/code/ethereum-optimism/optimism/op-e2e/actions/proofs/helpers/matrix.go:47
Error: Received unexpected error:
EOF
Test: Test_ProgramAction_SequenceWindowExpired/HonestClaim-jovian
Messages: failed to start block building
I'm not sure what this would mean outside of the action test environment, but I think it is probably related to what we saw with #18350.
There was a problem hiding this comment.
I think this is just because the action test helper being called requires block building to succeed, and it doesn't because of the unavailability of the next l1 origin.
So it does not really tally with the sequencer drift violation, I haven't yet manage to repro that. But still, it is better to avoid the temporary errors and continue block building with the old origin when we get into this situation.
There was a problem hiding this comment.
Modified the test to not get stuck on this, if it can't build a block we just wait until more l1 blocks are available. This means the test actually passes without the patch. But I added an acceptance test which still fails without the patch.
| // Set recover mode on the sequencer: | ||
| env.Sequencer.ActSetRecoverMode(t, true) |
There was a problem hiding this comment.
I manually confirmed that the test fails without recover mode enabled.
| // It seems more difficult (almost impossible) to recover from sequencing window expiry with span batches, | ||
| // since the singular batches within are invalidated _atomically_. | ||
| // That is to say, if the oldest batch in the span batch fails the sequencing window check | ||
| // (l1 origin + seq window < l1 inclusion) | ||
| // All following batches are invalidated / dropped as well. | ||
| // https://github.com/ethereum-optimism/optimism/blob/73339162d78a1ebf2daadab01736382eed6f4527/op-node/rollup/derive/batches.go#L96-L100 | ||
| // | ||
| // If the same blocks were batched with singular batches, the validation rules are different | ||
| // https://github.com/ethereum-optimism/optimism/blob/73339162d78a1ebf2daadab01736382eed6f4527/op-node/rollup/derive/batches.go#L83-L86 | ||
| // In the case of recover mode, the noTxPool=true condition means autoderviation actually fills | ||
| // the gap with identical blocks anyway, meaning the following batches are actually still valid. | ||
| bc := helpers.NewBatcherCfg() | ||
| bc.ForceSubmitSingularBatch = true |
There was a problem hiding this comment.
I manually confirmed the test fails without forcing singular batches.
Replace manual error assertions around FindL1Origin with requireL1OriginAt and remove the now-unused derive import.
Set default L1 block time to 12s to match action helper assumptions. Increase sequencer window size in the test to 50. Compute drift from UnsafeL2 headers (use UnsafeL2.L1Origin). Simplify L1 mining to always BatchMineAndSync and remove the extra numL1Blocks > 10 lag guard.
| switch { | ||
| case drift == 0: | ||
| t.Fatal("drift is zero, this implies the unsafe l2 head is pinned to the l1 head") | ||
| case drift > int(tp.MaxSequencerDrift): |
There was a problem hiding this comment.
Note that tp.MaxSequencerDrift is probably a complete red herring her since this was changed to a protocol constant with Fjord https://specs.optimism.io/protocol/fjord/derivation.html?highlight=drift#constant-maximum-sequencer-drift.
|
In favour of #18589 |
Closes #18350
Adds an action test showing how recover mode can work to recover a chain, with explanatory notes about how operators should not use span batches. This can be developed into a better runbook for recovering from a sequencing window expiry incident.