op-acceptance-tests: add test to base gate for unsafe gossip / head progression #18031
| // Stop batcher so there is no sync from deriving from L1 | ||
| // This won't work on external networks. | ||
| sys.L2Batcher.Stop() |
If this test doesn't make sense to run on external networks, how does the acceptance test framework know to skip it? Is there some helper we should call, or preset we should use for that? @scharissis
You'd do this with presets.WithCompatibleTypes(compat.SysGo) in your TestMain.
Good call-out.
Can this be done per test or only per package?
I believe only per package currently.
OK, in that case I should probably pull this out into its own package, since I don't want to stop the neighbouring tests from running on external devnets.
What prevents it from being run on an external devnet?
It's the control of the batcher; see the other thread, #18031 (comment). I think we could potentially work towards running it on an external network, if we can be sure that it is safe (in case someone points it at a production network) and won't interfere with other tests.
| func TestMain(m *testing.M) { | ||
| presets.DoMain(m, | ||
| presets.WithMinimal(), | ||
| presets.WithSingleChainMultiNode(), |
Should we be using this for all base tests? Does it give us extra coverage in existing tests?
Quite possibly, yes!
Also, this is how you'd mark this test to be skipped on external devnets:
| presets.WithSingleChainMultiNode(), | |
| presets.WithSingleChainMultiNode(), | |
| presets.WithCompatibleTypes(compat.SysGo), |
| indexOfSequencer = i | ||
| break | ||
| } | ||
| } |
nit: Instead of doing that check manually, can't we add a special case in the preset for the L2ELSequencer and the L2ELValidator?
No need to do it in this PR; I don't think it's worth blocking this change on it.
match.WithSequencerActive+match.EngineFor is probably what we want here.
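For illustration only, here is a minimal sketch of what replacing the manual index loop with a small selection helper could look like. It does not use the devstack `match` package; `elNode` and `splitBySequencer` are hypothetical names made up for this example, and the real matchers may behave differently.

```go
package main

// elNode is a hypothetical, simplified node descriptor (not a devstack type).
type elNode struct {
	Name        string
	IsSequencer bool
}

// splitBySequencer returns a pointer to the first node flagged as sequencer
// and every other node as a verifier. It returns a nil sequencer when no
// node is flagged.
func splitBySequencer(nodes []elNode) (sequencer *elNode, verifiers []elNode) {
	for i := range nodes {
		if sequencer == nil && nodes[i].IsSequencer {
			sequencer = &nodes[i]
			continue
		}
		verifiers = append(verifiers, nodes[i])
	}
	return sequencer, verifiers
}
```

The test body then works with `sequencer` and `verifiers` directly instead of tracking `indexOfSequencer` and skipping it inside the loop.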
| verifierUnsafeHead := el.ChainSyncStatus(l2chainID, types.LocalUnsafe) | ||
| if verifierUnsafeHead.Number < initialUnsafeHeadNumber+NUM_UNSAFE_BLOCKS { | ||
| return false | ||
| } |
Can we use el.AdvancedFn or el.ReachedFn here, and wait in parallel using the CheckAll method?
Quite possibly! I can have a play with those utils.
| sys := presets.NewSingleChainMultiNode(t) | ||
| // Stop batcher so there is no sync from deriving from L1 | ||
| // This won't work on external networks. |
From what I can tell, it calls admin_stopBatcher. If we enable the admin namespace, can we run this on external networks?
In principle, yes -- but I think that admin_stopBatcher is quite destructive and the batcher would need to be manually restarted (to restore the chain to a state where it can run additional ATs) with the current implementation.
| const TIMEOUT = 5 * time.Second // time to wait for all verifier nodes to catch up to the sequencer | ||
| // In order for this test to be valid, we need at least 2 L2 EL nodes. | ||
| t.Require().Greater(len(sys.L2Chain.L2ELNodes()), 1, "expected at least 2 L2 EL nodes") |
This should already be guaranteed by the preset when DEVNET_EXPECT_PRECONDITIONS_MET is true (the default).
OK. I'm tempted to leave this in as a sanity check / protect against regressions though.
| continue | ||
| } | ||
| verifierUnsafeHead := el.ChainSyncStatus(l2chainID, types.LocalUnsafe) | ||
| if verifierUnsafeHead.Number < initialUnsafeHeadNumber+NUM_UNSAFE_BLOCKS { |
Should we check if the hash matches? Not sure if it matters in this case.
We have Matched and MatchedFn, which check both hash and number. We can just use those.
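To make the difference concrete: a number-only comparison still passes if the verifier sits at the right height on a different block, whereas comparing number and hash does not. The sketch below is illustrative only; `blockRef` and `matched` are hypothetical and are not the helpers referred to above.

```go
package main

import "fmt"

// blockRef is a hypothetical, simplified block reference.
type blockRef struct {
	Number uint64
	Hash   [32]byte
}

// matched returns an error unless both references agree on number and hash.
func matched(want, got blockRef) error {
	if want.Number != got.Number {
		return fmt.Errorf("number mismatch: want %d, got %d", want.Number, got.Number)
	}
	if want.Hash != got.Hash {
		return fmt.Errorf("hash mismatch at block %d: want %x, got %x",
			want.Number, want.Hash, got.Hash)
	}
	return nil
}
```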
This pr has been automatically marked as stale and will be closed in 5 days if no updates
We recently realised that acceptance tests do not give us good coverage for unsafe head progression via p2p gossip. See, for example, #17940 (a bug which wasn't caught by ATs) as well as a recent op-reth bug we only caught on devnet.
This test adds that coverage, and simply asserts that a) there are some non-sequencer verifier nodes and b) they catch up to the sequencer within a timeout period.
The test disables the sysgo batcher, because otherwise safe chain derivation from L1 would let the verifiers catch up anyway, and the test would pass even if gossip were broken. Because it stops the batcher, this test should not run against external networks.
I confirmed manually that introducing a bug, where op-node just rejects gossip payloads unconditionally, does indeed cause this test to fail.
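The reasoning for stopping the batcher can be shown with a toy model. This is a deliberately simplified sketch, not op-node code: `toyVerifier` and `runScenario` are hypothetical, and the model assumes only that a verifier's head can advance either from gossiped payloads or from blocks derived from L1 batches.

```go
package main

// toyVerifier is a simplified model of a verifier node.
type toyVerifier struct {
	gossipBroken bool   // when true, every gossiped payload is rejected
	unsafeHead   uint64 // highest block number the node has accepted
}

// onGossip applies a block number received over p2p gossip.
func (v *toyVerifier) onGossip(number uint64) {
	if v.gossipBroken {
		return
	}
	if number > v.unsafeHead {
		v.unsafeHead = number
	}
}

// onDerived applies a block number derived from batches posted to L1.
func (v *toyVerifier) onDerived(number uint64) {
	if number > v.unsafeHead {
		v.unsafeHead = number
	}
}

// runScenario feeds blocks 1..n to a fresh verifier and returns its final
// head. Derived blocks are only delivered while the batcher is running.
func runScenario(gossipBroken, batcherRunning bool, n uint64) uint64 {
	v := &toyVerifier{gossipBroken: gossipBroken}
	for i := uint64(1); i <= n; i++ {
		v.onGossip(i)
		if batcherRunning {
			v.onDerived(i)
		}
	}
	return v.unsafeHead
}
```

In this model, broken gossip with the batcher running still ends at block `n`, so a catch-up assertion would pass for the wrong reason. With the batcher stopped, the same broken node stays at 0 and the assertion fails, which is the behaviour the test relies on.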