fix(supervisor/core): l1_exhaust and reorg race condition #2779
dhyaniarun1993 merged 27 commits into main
Claude finished @dhyaniarun1993's task — PR Review: Race Condition Fix
Review Summary: This PR fixes a race condition between L1 exhaustion and reorganization events in the supervisor core. The implementation adds defensive checks to ensure that when a parent hash mismatch is detected (indicating a potential reorg), the system validates whether the reorg has been properly handled before proceeding.
emhane left a comment:
Ideally we only want info log messages in the main program loop, and anywhere else only trace, debug, warn, or error. See how it's done for sigp/discv5.
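As a rough illustration of the convention described above (a stdlib-only sketch; the actual code uses `tracing` macros like `info!`/`debug!`/`warn!`, and the `Level`, `log`, and `check_parent_hash` names here are hypothetical stand-ins):

```rust
// Minimal sketch: info stays in the main program loop, helpers emit
// debug/warn only. Stdlib only; no external logging crate.

#[derive(Debug)]
enum Level {
    Debug,
    Info,
    Warn,
}

fn log(level: Level, msg: &str) {
    eprintln!("{:?}: {}", level, msg);
}

/// Helper outside the main loop: emits debug, never info. (This PR
/// similarly downgraded a parent-hash-mismatch message from warn to debug.)
fn check_parent_hash(expected: &str, actual: &str) -> bool {
    if expected != actual {
        log(Level::Debug, "parent hash mismatch, possible reorg");
        return false;
    }
    true
}

fn main() {
    // Info-level messages belong in the main program loop only.
    log(Level::Info, "supervisor main loop started");
    if !check_parent_hash("0xaa", "0xbb") {
        log(Level::Warn, "withholding reset until reorg is handled");
    }
}
```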
Pull Request Overview
This PR fixes a race condition between l1_exhaust and reorg events by adding canonical block validation to prevent resets with non-canonical source blocks.
Key Changes
- Added canonical block validation to the reset process using the L1 provider
- Changed log level from `warn` to `debug` for parent hash mismatches
- Updated test infrastructure to include L1 provider mocking
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/supervisor/core/src/syncnode/resetter.rs | Added L1 provider field and canonical block validation logic with comprehensive test coverage |
| crates/supervisor/core/src/syncnode/node.rs | Updated constructor to pass L1 provider to resetter and adjusted logging levels |
```rust
#[derive(Debug)]
pub(super) struct Resetter<DB, C> {
    client: Arc<C>,
    l1_provider: RootProvider<Ethereum>,
```
Isn't l1_provider essentially an abstraction over db_provider? Wondering if we can make the field naming here more helpful, or add some docs.
Nope, those are two different data providers:
db_provider - interface to interact with the supervisor database
l1_provider - interface to interact with the L1 layer
ah, i see. but the db also has l1 blocks right? perhaps rename l1_provider to l1_rpc_client?
Yep, the database also has L1 blocks, but they are more like a derivation mapping.
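To make the distinction concrete, here is a hypothetical sketch (the struct and method names are illustrative, not the crate's real APIs): the DB answers "which L1 source block did this derived block come from", while the L1 client answers "what does the L1 chain report right now".

```rust
use std::collections::HashMap;

/// Supervisor database view: holds the derivation mapping
/// (derived block -> L1 source block), not live chain state.
struct DbProvider {
    derived_to_source: HashMap<u64, u64>,
}

impl DbProvider {
    fn derived_to_source(&self, derived: u64) -> Option<u64> {
        self.derived_to_source.get(&derived).copied()
    }
}

/// L1 RPC view: reports what the L1 chain currently considers canonical.
struct L1Provider {
    canonical_hash_by_number: HashMap<u64, &'static str>,
}

impl L1Provider {
    fn block_hash(&self, number: u64) -> Option<&'static str> {
        self.canonical_hash_by_number.get(&number).copied()
    }
}

fn main() {
    let db = DbProvider { derived_to_source: HashMap::from([(200, 100)]) };
    let l1 = L1Provider { canonical_hash_by_number: HashMap::from([(100, "0xabc")]) };
    // The DB maps derived block 200 back to L1 source block 100...
    let source = db.derived_to_source(200).unwrap();
    // ...and the L1 client reports the current canonical hash at that height.
    println!("source {} canonical hash: {:?}", source, l1.block_hash(source));
}
```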
emhane left a comment:
Was meant to be a comment, not an explicit request for changes, my bad.
```rust
// check if the source of valid local_safe is canonical
let source = self.db_provider.derived_to_source(local_safe.id())?;
if !self.is_canonical(chain_id, source.id()).await? {
    warn!(target: "supervisor::syncnode_resetter", %chain_id, %source, "Source block for the valid local safe is not canonical");
```
Can you explain what issue was happening, and how erroring during the reset resolves it?
Since we are dependent on L1 polling for detecting the reorg, there is a scenario where an L1 reorg might have occurred and is not yet known to the supervisor. Adding the canonical check during the reset makes sure that we only reset the node to a valid canonical state.
The race condition between l1_reorg and l1_exhaust (or an op-node reset) is handled by withholding the reset information until the supervisor has processed the L1 reorg and its state has been rewound.
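A hedged sketch of that mechanism, using simplified stand-in types (`Db`, `L1`, and `validate_reset` are hypothetical; the real resetter uses the DB's derived_to_source mapping plus an async is_canonical check against the L1 RPC): the reset errors out whenever the DB's recorded source hash no longer matches what L1 currently reports, so a stale pre-reorg reset never lands.

```rust
use std::collections::HashMap;

/// Stand-in DB: derived block id -> (source block number, source block hash).
struct Db {
    derived_to_source: HashMap<u64, (u64, &'static str)>,
}

/// Stand-in L1 client: block number -> hash L1 currently reports as canonical.
struct L1 {
    canonical: HashMap<u64, &'static str>,
}

/// The recorded source block is canonical only if L1 still reports
/// the same hash at that height.
fn is_canonical(l1: &L1, number: u64, hash: &str) -> bool {
    l1.canonical.get(&number).copied() == Some(hash)
}

/// Validate a reset target: err out (withhold the reset) when the
/// local_safe's source block is no longer canonical, i.e. an L1 reorg
/// happened that the supervisor has not yet processed.
fn validate_reset(db: &Db, l1: &L1, local_safe: u64) -> Result<(), &'static str> {
    let (number, hash) = *db
        .derived_to_source
        .get(&local_safe)
        .ok_or("unknown local_safe block")?;
    if !is_canonical(l1, number, hash) {
        return Err("source block for the valid local safe is not canonical");
    }
    Ok(())
}

fn main() {
    let db = Db { derived_to_source: HashMap::from([(200, (100, "0xaaa"))]) };
    // An L1 reorg replaced block 100, but the supervisor has not processed it yet.
    let l1 = L1 { canonical: HashMap::from([(100, "0xbbb")]) };
    // The reset is withheld instead of landing on a non-canonical state.
    assert!(validate_reset(&db, &l1, 200).is_err());
    println!("reset withheld: source block not canonical");
}
```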
Closes #2777