op-supernode: Block Replacement#19091
Conversation
Wiz Scan Summary
To detect these findings earlier in the dev lifecycle, try using Wiz Code VS Code Extension. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## supernode/messageValidation #19091 +/- ##
==============================================================
- Coverage 76.0% 76.0% -0.1%
==============================================================
Files 188 188
Lines 10943 10943
==============================================================
- Hits 8326 8323 -3
- Misses 2473 2476 +3
Partials 144 144
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
fb2860d to
309ed88
Compare
d889adf to
31ea948
Compare
347794a to
b6cdd73
Compare
- Add DenyList using bbolt for persistent storage of invalid block hashes - Add InvalidateBlock method to ChainContainer interface - InvalidateBlock adds hash to denylist and triggers rewind if current block matches - Add IsDenied helper to check if a block is on the deny list - Add acceptance test skeleton for block replacement flow
- Add table-driven tests for DenyList (Add, Contains, Persistence, GetDeniedHashes) - Add tests for InvalidateBlock (rewind triggers, no-rewind cases, timestamp calculation) - Add tests for IsDenied helper - Fix OpenDenyList to create parent directories - Add InvalidateBlock/IsDenied stubs to mock ChainContainers in existing tests
When the interop activity detects an invalid executing message, it now calls InvalidateBlock on the chain container. This: 1. Adds the block to the chain's denylist (persisted via bbolt) 2. Checks if the chain is currently using that block 3. Triggers a rewind to the prior timestamp if so The acceptance test is renamed to TestSupernodeInteropInvalidMessageReset and temporarily skipped due to pre-existing logsDB consistency issues blocking interop progression.
- Use eth.Unsafe label in BlockAtTimestamp call (was using empty string) - Make acceptance test resilient to block not existing after rewind - Test now passes: detects reset when block 8 is rewound away
Introduce SuperAuthority interface that allows external authority (like op-supernode) to deny payloads before they are inserted into the engine. - Define SuperAuthority interface in op-node/node/node.go with IsDenied method - Add SuperAuthority to InitializationOverrides for injection - Wire SuperAuthority through Driver to EngineController - Check IsDenied before NewPayload in payload_process.go - If denied during Holocene derivation, request deposits-only replacement - Update tests and op-program to pass nil for SuperAuthority
Add ResetOn(chainID, timestamp) to Activity interface so activities can clean up cached state when a chain container rewinds. - Activity.ResetOn called by Supernode when any chain resets - Supernode.onChainReset distributes reset to all activities - Heartbeat and Superroot implement no-op ResetOn (no cached state) - Update test mocks to implement ResetOn
Add reset callback mechanism to ChainContainer so it can notify the supernode when a chain rewinds due to block invalidation. - Add ResetCallback type and onReset field to simpleChainContainer - Add SetResetCallback to ChainContainer interface - Call onReset after successful RewindEngine in InvalidateBlock - Update test mocks to implement SetResetCallback
When a chain rewinds, the Interop activity must clean up its cached state: - Rewind logsDB for the chain to before the reset timestamp - Rewind verifiedDB to remove entries at or after reset timestamp - Log error if verified results were deleted (unexpected) Also adds: - Rewind/Clear methods to LogsDB interface - RewindTo method to VerifiedDB - noopInvalidator for calling logsDB.Rewind - Update test mocks for new interface methods
Add assertions to verify block replacement behavior: - Replacement block hash differs from invalid block hash - Invalid transaction is NOT present in replacement block These checks confirm the deposits-only replacement mechanism works correctly after an invalid executing message is detected.
Add comprehensive unit tests for block invalidation sub-features: - TestDenyList_ConcurrentAccess: Verify concurrent Add/Contains operations - TestInvalidateBlock: Test invalidateBlock wiring from Interop to ChainContainer - TestResetOn: Test Interop activity reset behavior on chain rewind - TestVerifiedDB_RewindTo: Test VerifiedDB rewind functionality - TestSuperAuthority_*: Test payload denial in op-node engine controller Also includes BlockInvalidation_Feature.md diary documenting all sub-features.
- Add guard check rejecting InvalidateBlock for height=0 (genesis block) - Document SetResetCallback must only be called during initialization - Add test case for genesis block invalidation error
- Rename halt/ package to reorg/ (reflects actual behavior) - Move invalid_message_replacement_test.go into reorg package - Delete obsolete invalid_message_halt_test.go (superseded by reorg test) The halt test tested old behavior where invalid messages caused the chain to halt. With block invalidation, the chain now rewinds and replaces the invalid block with a deposits-only block.
Add test-only control methods for the interop activity to support precise timing control in acceptance tests. Specification: - PauseInterop(ts): When called, interop activity returns early when it would process the given timestamp, without making progress - ResumeInterop(): Clears the pause, allowing normal processing - Zero value indicates 'not paused' (always process all values) - Atomic read/write for concurrent safety Implementation layers: - Interop: pauseAtTimestamp atomic.Uint64 field + check in progressInterop() - Supernode: delegates to interop activity by type assertion - sysgo.SuperNode: exposes methods, delegates to supernode.Supernode - stack.InteropTestControl: interface defining the test control contract - Orchestrator: InteropTestControl(id) method to get control for a supernode - DSL Supernode: NewSupernodeWithTestControl() + wrapper methods - Preset: wires up test control in NewTwoL2SupernodeInterop() This enables acceptance tests to: sys.Supernode.PauseInterop(targetTimestamp + 1) // ... perform test setup ... sys.Supernode.ResumeInterop()
- Consolidate SuperAuthority interface to op-node/rollup/iface.go (remove duplicate definitions from node.go, engine_controller.go, resources/) - Move onReset callback invocation from InvalidateBlock to RewindEngine (fires on any engine reset, not just block invalidation) - Move helper methods (SetResetCallback, blockNumberToTimestamp, IsDenied) from invalidation.go to chain_container.go - Delete unused op-supernode/supernode/resources/super_authority.go
84b876d to
0a814ac
Compare
axelKingsley
left a comment
There was a problem hiding this comment.
Doing a cycle of self-review. These comments are made while editing, meaning they likely are immediately outdated for the better.
op-acceptance-tests/tests/supernode/interop/halt/invalid_message_halt_test.go
Outdated
Show resolved
Hide resolved
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Show resolved
Hide resolved
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Outdated
Show resolved
Hide resolved
ajsutton
left a comment
There was a problem hiding this comment.
Nice to see the behaviour being validated with an AT (which is the only code I really read...). Some non-blocking suggestions to help build up a library of utility method that will make writing future ATs easier and keep things maintainable.
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Outdated
Show resolved
Hide resolved
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Outdated
Show resolved
Hide resolved
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Outdated
Show resolved
Hide resolved
Plan expiredYour subscription has expired. Please renew your subscription to continue using CI/CD integration and other features. |
geoknee
left a comment
There was a problem hiding this comment.
Review Part 1, Part 2 to follow tomorrow.
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Outdated
Show resolved
Hide resolved
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Outdated
Show resolved
Hide resolved
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Outdated
Show resolved
Hide resolved
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Show resolved
Hide resolved
op-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.go
Show resolved
Hide resolved
DSL changes: - Add TimestampForBlockNum helper to L2Network - Simplify SendInvalidExecMessage to just increment log index Test changes: - invalid_message_reorg_test: use TimestampForBlockNum, check eth.NotFound - Move SuperAuthority tests to super_authority_deny_test.go - Fix denied payload test to expect DepositsOnlyPayloadAttributesRequestEvent - Make denyBlock private in test mock Interop activity: - Extract resetLogsDB and resetVerifiedDB helper functions
… (superAuthority) (#19110) * op-supernode: add SuperAuthority interface and wire into op-node - Add SuperAuthority interface in supernode resources package - Add SupernodeAuthority implementation (empty for now) - Add SuperAuthority field to InitializationOverrides in op-node - Attach SuperAuthority to OpNode struct during initialization - Wire SupernodeAuthority through supernode to all chain containers * Replace `safeHead` with dynamic method `SafeL2Head()` * WIP * WIP * Move SuperAuthority interface and related code to chain_container package * WIP * Implement SafeL2Head method for chain container. Establish a new method LatestVerifiedTimestamp() (uint64, bool) on the verifier interface: * Update LatestVerifiedTimestamp method to LatestVerifiedL2Block * Update SuperAuthority interface method name Rename SafeL2Head to FullyVerifiedL2Head in SuperAuthority interface * Fix SafeL2Head to use superAuthority field add TODOs * Refactor BlockAtTimestamp to use TargetBlockNumber and L2BlockRefByNumber * Rename BlockAtTimestamp to LocalSafeBlockAtTimestamp * Remove BlockLabel parameter from LocalSafeBlockAtTimestamp * remove accidental ressurected code from rebasing * Refactor BlockAtTimestamp method to LocalSafeBlockAtTimestamp * Update sync status logging to use LocalSafeL2 fields * Update sync status struct field names in timestamp progression test * Remove unused mock struct fields in chain container test * Replace SafeL2 with LocalSafeL2 in Superroot activity * Add VerifiedSafeBlockAtTimestamp method to chain container The new method computes and returns the safe L2 block at a given timestamp, with additional checks against the current safe head. * Replace VerifiedSafeBlockAtTimestamp with LocalSafeBlockAtTimestamp * enhance error msg * remove unused getter * Add local safe head tracking for SafeDB in interop mode * simplify Simplify safe head reset logic in sync deriver * Update comments to clarify local safe head references * Update test comments to clarify "safe head" terminology * Remove local safe update handling in SyncDeriver * Add block time to test chain config to prevent division by zero * Add deprecated safe head field to EngineController * Remove commented-out test functions * Add additional assertions for L2 block references * Move check on safe head to a test which actually advances it and conditionalize the check on the safe head being nonzero * move again * fix test * Change FullyVerifiedL2Head to return BlockID instead of L2BlockRef * Updatee Safe Heamd Retrioeval with Context aned Improved Panic Message * Add tests for SafeL2Head in engine controller * Add tests for LocalSafeBlockAtTimestamp method * Update FullyVerifiedL2Head to track earliest verified block * Add safe head progression tests for supernode interop * Add test coverage for FullyVerifiedL2Head method update implementation to panic if verifiers disagree on block hash for the same timestamp * Add test mocks for engine controller forkchoice update * adopt superAuthority wiring from #19091 * fix compiler errors after merging with main branch * Gracefully Handle SuperAuthority L2 Head Ahead of Local Safe Head Modify SafeL2Head() to log a debug message and return the local safe head when the SuperAuthority's verified head is ahead of the local safe head, instead of panicking. This change prevents unexpected crashes and provides a fallback mechanism that maintains node stability while logging the discrepancy for investigation. * Add safe head progression test for supernode interop This commit updates the safe head progression test in op-acceptance-tests to validate cross-chain safe head synchronization behavior. Key changes include: - Pause and resume interop verification to test safe head stalling - Add assertions for cross-safe and local-safe head progression - Check execution layer safe label consistency - Simplify test structure for better readability and reliability * speed up test by 2x * DSL: Coordinate Interop Activity Pause for Acceptance Testing * self review fixes * finish moving code * Remove BlockAtTimestamp method from EngineController interface * Fix chain container test temp directory config * Add robust pause timestamp calculation for supernode interop test allows the test to work on chains with different block times * Refactor Safe Head Progression Test for Conciseness * Update safe head progression test comment for clarity * Remove unnecessary imports and batcher configuration in supernode init test * add safety-labels.md * Add safe head validation check in supernode interop test * Update MockEngineForInvalidation to Simplify Block Retrieval * Refactor EnsureInteropPaused to simplify timestamp selection * Refactor Safe Head Progression Test for Dynamic Timestamp Handling * Increase timeout for supernode timestamp validation Extends the context timeout in `AwaitValidatedTimestamp` from `DefaultTimeout` to `5*DefaultTimeout` to provide more time for timestamp validation, reducing potential race conditions. Removes verbose commentary in safety-labels.md, simplifying the explanation while preserving the core concept of avoiding multiple stateful controllers. * Update safety-labels.md * Handle logsDB reset by early return when timestamp not sealed * op-supernode: pass invalidated BlockRef through Reset callback chain This change modifies the Reset callback to pass the invalidated BlockRef directly to activities, allowing resetLogsDB to use the block info without making RPC calls to LocalSafeBlockAtTimestamp during reset. The problem was that during a reset, LocalSafeBlockAtTimestamp could fail if the chain container's safe head hadn't yet caught up after the rewind, causing the logsDB to be cleared entirely instead of rewound to the correct position. Changes: - Activity.Reset now takes invalidatedBlock eth.BlockRef parameter - ResetCallback type updated to include invalidatedBlock - RewindEngine accepts invalidatedBlock and passes to callback - InvalidateBlock constructs BlockRef from the invalidated block - Interop.resetLogsDB uses BlockRef directly (Number-1, ParentHash) - verifiedDB.RewindAfter keeps reset timestamp, removes only after - All test mocks updated for new signatures * Make Proofs wait on LocalSafe instead of Safe * Adjust 4 More Proofs Tests to LocalSafe --------- Co-authored-by: axelKingsley <axel.kingsley@gmail.com>
… (superAuthority) (#19110) * op-supernode: add SuperAuthority interface and wire into op-node - Add SuperAuthority interface in supernode resources package - Add SupernodeAuthority implementation (empty for now) - Add SuperAuthority field to InitializationOverrides in op-node - Attach SuperAuthority to OpNode struct during initialization - Wire SupernodeAuthority through supernode to all chain containers * Replace `safeHead` with dynamic method `SafeL2Head()` * WIP * WIP * Move SuperAuthority interface and related code to chain_container package * WIP * Implement SafeL2Head method for chain container. Establish a new method LatestVerifiedTimestamp() (uint64, bool) on the verifier interface: * Update LatestVerifiedTimestamp method to LatestVerifiedL2Block * Update SuperAuthority interface method name Rename SafeL2Head to FullyVerifiedL2Head in SuperAuthority interface * Fix SafeL2Head to use superAuthority field add TODOs * Refactor BlockAtTimestamp to use TargetBlockNumber and L2BlockRefByNumber * Rename BlockAtTimestamp to LocalSafeBlockAtTimestamp * Remove BlockLabel parameter from LocalSafeBlockAtTimestamp * remove accidental ressurected code from rebasing * Refactor BlockAtTimestamp method to LocalSafeBlockAtTimestamp * Update sync status logging to use LocalSafeL2 fields * Update sync status struct field names in timestamp progression test * Remove unused mock struct fields in chain container test * Replace SafeL2 with LocalSafeL2 in Superroot activity * Add VerifiedSafeBlockAtTimestamp method to chain container The new method computes and returns the safe L2 block at a given timestamp, with additional checks against the current safe head. * Replace VerifiedSafeBlockAtTimestamp with LocalSafeBlockAtTimestamp * enhance error msg * remove unused getter * Add local safe head tracking for SafeDB in interop mode * simplify Simplify safe head reset logic in sync deriver * Update comments to clarify local safe head references * Update test comments to clarify "safe head" terminology * Remove local safe update handling in SyncDeriver * Add block time to test chain config to prevent division by zero * Add deprecated safe head field to EngineController * Remove commented-out test functions * Add additional assertions for L2 block references * Move check on safe head to a test which actually advances it and conditionalize the check on the safe head being nonzero * move again * fix test * Change FullyVerifiedL2Head to return BlockID instead of L2BlockRef * Updatee Safe Heamd Retrioeval with Context aned Improved Panic Message * Add tests for SafeL2Head in engine controller * Add tests for LocalSafeBlockAtTimestamp method * Update FullyVerifiedL2Head to track earliest verified block * Add safe head progression tests for supernode interop * Add test coverage for FullyVerifiedL2Head method update implementation to panic if verifiers disagree on block hash for the same timestamp * Add test mocks for engine controller forkchoice update * adopt superAuthority wiring from #19091 * fix compiler errors after merging with main branch * Gracefully Handle SuperAuthority L2 Head Ahead of Local Safe Head Modify SafeL2Head() to log a debug message and return the local safe head when the SuperAuthority's verified head is ahead of the local safe head, instead of panicking. This change prevents unexpected crashes and provides a fallback mechanism that maintains node stability while logging the discrepancy for investigation. * Add safe head progression test for supernode interop This commit updates the safe head progression test in op-acceptance-tests to validate cross-chain safe head synchronization behavior. Key changes include: - Pause and resume interop verification to test safe head stalling - Add assertions for cross-safe and local-safe head progression - Check execution layer safe label consistency - Simplify test structure for better readability and reliability * speed up test by 2x * DSL: Coordinate Interop Activity Pause for Acceptance Testing * self review fixes * finish moving code * Remove BlockAtTimestamp method from EngineController interface * Fix chain container test temp directory config * Add robust pause timestamp calculation for supernode interop test allows the test to work on chains with different block times * Refactor Safe Head Progression Test for Conciseness * Update safe head progression test comment for clarity * Remove unnecessary imports and batcher configuration in supernode init test * add safety-labels.md * Add safe head validation check in supernode interop test * Update MockEngineForInvalidation to Simplify Block Retrieval * Refactor EnsureInteropPaused to simplify timestamp selection * Refactor Safe Head Progression Test for Dynamic Timestamp Handling * Increase timeout for supernode timestamp validation Extends the context timeout in `AwaitValidatedTimestamp` from `DefaultTimeout` to `5*DefaultTimeout` to provide more time for timestamp validation, reducing potential race conditions. Removes verbose commentary in safety-labels.md, simplifying the explanation while preserving the core concept of avoiding multiple stateful controllers. * Update safety-labels.md * Handle logsDB reset by early return when timestamp not sealed * op-supernode: pass invalidated BlockRef through Reset callback chain This change modifies the Reset callback to pass the invalidated BlockRef directly to activities, allowing resetLogsDB to use the block info without making RPC calls to LocalSafeBlockAtTimestamp during reset. The problem was that during a reset, LocalSafeBlockAtTimestamp could fail if the chain container's safe head hadn't yet caught up after the rewind, causing the logsDB to be cleared entirely instead of rewound to the correct position. Changes: - Activity.Reset now takes invalidatedBlock eth.BlockRef parameter - ResetCallback type updated to include invalidatedBlock - RewindEngine accepts invalidatedBlock and passes to callback - InvalidateBlock constructs BlockRef from the invalidated block - Interop.resetLogsDB uses BlockRef directly (Number-1, ParentHash) - verifiedDB.RewindAfter keeps reset timestamp, removes only after - All test mocks updated for new signatures * Make Proofs wait on LocalSafe instead of Safe * Adjust 4 More Proofs Tests to LocalSafe * Revert "Adjust 4 More Proofs Tests to LocalSafe" This reverts commit 0b28c12. * Revert "Make Proofs wait on LocalSafe instead of Safe" This reverts commit cbdb215. * Improve fallback mechanism for fully verified L2 head Enhance `FullyVerifiedL2Head()` to handle scenarios with no verifiers or incomplete verification: - Add explicit fallback to local-safe head when no verifiers are registered - Improve logging for better traceability of fallback mechanism - Gracefully handle cases where rollup client or sync status retrieval fails - Prevent potential nil pointer dereference * tidy * Update SuperAuthority interface to support local-safe fallback * Remove unnecessary nil checks on logger --------- Co-authored-by: axelKingsley <axel.kingsley@gmail.com>
Tool Use Notice
This code was generated with a TDD pattern, iterated and then ultimately rewritten through self-review by the author. Tests remain largely generative. All architecture patterns and directions were previously well defined, so none of the solutions in this PR are themselves AI generated. Wherever possible the "jump" from point A to point B by the model was as small as possible.
A "feature.md" diary file can be found in the PR presently, which can be read to understand the tool use better. I only started the diary part way though this work so some of it is backfill.
Block Invalidation Feature
Overview
This PR implements block invalidation for the op-supernode. When the Interop activity detects an invalid cross-chain executing message, it invalidates the containing block, triggers a chain rewind, and rebuilds a replacement block using Holocene's deposits-only mechanism.
Data Flow: Invalid Message Detection → Block Replacement
Changes
Work Breakdown
Because this work was done generatively, I am able to produce good record of the order and ways in which each piece was created.
Milestone 1: DenyList + InvalidateBlock
Synopsis
A bbolt-backed persistence layer for tracking invalid block payload hashes, plus the mechanism to trigger chain rewinds when an invalid block is currently in use. The DenyList stores hashes keyed by block height, allowing multiple denied hashes per height. When
InvalidateBlockis called, the hash is persisted and the chain container checks if the current chain tip matches—if so, a rewind is triggered.Specified Behavior
(true, nil)(false, nil)Tests Introduced
TestDenyList_AddAndContainsTestDenyList_PersistenceTestDenyList_GetDeniedHashesTestDenyList_ConcurrentAccessTestInvalidateBlockTestIsDeniedMilestone 2: Wire Interop → ChainContainer
Synopsis
Connects the Interop activity's invalid message detection to the ChainContainer's invalidation mechanism created in the previous milestone. When the Interop activity identifies an executing message that references a non-existent initiating message, it calls
invalidateBlockwhich looks up the appropriate chain container and delegates to itsInvalidateBlockmethod.Specified Behavior
"chain %s not found""chain rewound due to invalid block""block added to denylist"Tests Introduced
TestInvalidateBlock(interop)Milestone 3: SuperAuthority Injection
Synopsis
Allows the op-supernode to inject a "SuperAuthority" into the op-node's engine controller. This authority is consulted before payload insertion—if it denies a payload, the engine controller either emits a
PayloadInvalidEvent(pre-Holocene) or requests a deposits-only replacement block (Holocene, the actual expected behavior). This prevents denied blocks from being re-inserted after a rewind.Specified Behavior
IsDeniedreturns(true, nil)emitDepositsOnlyPayloadAttributesRequestcalledPayloadInvalidEventemittedIsDeniedreturns(false, nil)IsDeniedreturns errorTests Introduced
TestSuperAuthority_DeniedPayload_EmitsInvalidEventTestSuperAuthority_AllowedPayload_ProceedsTestSuperAuthority_Error_ProceedsWithPayloadTestSuperAuthority_NilAuthority_ProceedsMilestone 4: Activity Reset Notification Chain
Synopsis
When a chain container rewinds, it notifies the supernode via a callback, which then broadcasts
ResetOn(chainID, timestamp)to all activities. This allows activities to clear any cached state that may have become stale. The Interop activity specifically rewinds its logsDB (clearing logs at or after the timestamp) and verifiedDB (clearing verification results at or after the timestamp).Specified Behavior
logsDB.Rewind(prevBlockID)logsDB.Clear()logsDB.Clear()Tests Introduced
TestVerifiedDB_RewindToTestResetOnAcceptance Test
Synopsis
End-to-end test with two L2 chains and a supernode. Sends a valid initiating message on chain A, then sends an invalid executing message on chain B (referencing a non-existent log index). Observes that the supernode detects the invalid message, rewinds chain B, and builds a replacement block without the invalid transaction.
Notably, this test utilizes a test-only pause functionality on the interop activity, which allows it to stall at a specific timestamp. this allows for Acceptance tests to positively witness the original block and the replacement separately.
TestSupernodeInteropInvalidMessageReplacementop-acceptance-tests/tests/supernode/interop/reorg/invalid_message_reorg_test.goAssertions
resetDetected == truereplacementDetected == truereplacementHash ≠ invalidHashinvalidTx NOT in replacementTxsverified == true