feat: replace SharedDomainType with SharedForkLifecycle for fork-aware peer selection (#824)
Conversation
Add a new lifecycle module to the fork crate that models fork transition states as a single enum with three variants:
- Normal: operating on a single fork (pre-fork or post-grace-period)
- WarmUp: preparing for an upcoming fork (dual-subscribing to new topics)
- GracePeriod: fork activated but keeping old subscriptions for late messages

SharedForkLifecycle wraps this in Arc<RwLock<T>> for cross-component sharing, following the same pattern as the SharedDomainType it will replace. Each variant carries domain_type, making invalid states unrepresentable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
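A minimal sketch of the enum and shared handle described by this commit. The variant field names (`current`, `upcoming`, `previous`, `domain_type`) come from the commit messages in this PR; `DomainType` and `Fork` are simplified stand-ins for the real fork-crate types:

```rust
use std::sync::{Arc, RwLock};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DomainType(pub [u8; 4]);

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Fork { Alan, Boole }

/// Fork lifecycle state. Each variant carries the active domain type,
/// so a "lifecycle without a domain" is unrepresentable.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
    /// Operating on a single fork (pre-fork or post-grace-period).
    Normal { current: Fork, domain_type: DomainType },
    /// Preparing for an upcoming fork (dual-subscribing to new topics).
    WarmUp { current: Fork, upcoming: Fork, domain_type: DomainType },
    /// Fork activated, but old subscriptions kept for late messages.
    GracePeriod { current: Fork, previous: Fork, domain_type: DomainType },
}

impl ForkLifecycle {
    /// Every variant carries a domain type, so this is total.
    pub fn domain_type(&self) -> DomainType {
        match self {
            ForkLifecycle::Normal { domain_type, .. }
            | ForkLifecycle::WarmUp { domain_type, .. }
            | ForkLifecycle::GracePeriod { domain_type, .. } => *domain_type,
        }
    }
}

/// Single-writer (ForkMonitor), many-reader shared handle.
pub type SharedForkLifecycle = Arc<RwLock<ForkLifecycle>>;

fn main() {
    let lifecycle: SharedForkLifecycle = Arc::new(RwLock::new(ForkLifecycle::Normal {
        current: Fork::Alan,
        domain_type: DomainType([0, 0, 0, 1]),
    }));
    assert_eq!(lifecycle.read().unwrap().domain_type(), DomainType([0, 0, 0, 1]));
}
```

Because readers call `domain_type()` through the shared handle, consumers that only need the domain never match on the variant at all.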
ForkMonitor now accepts a SharedForkLifecycle and updates it right
before broadcasting each ForkPhase event:
- Preparing → WarmUp { current, upcoming, domain_type }
- Activated → GracePeriod { current, previous, domain_type }
- GracePeriodEnded → Normal { current, domain_type }
This makes ForkMonitor the single writer of fork lifecycle state.
Components that previously needed to react to events just to cache
derived values can now read the shared state directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
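A hedged sketch of the write-then-broadcast ordering this commit describes: the monitor updates the shared lifecycle before emitting each `ForkPhase` event, so any component woken by the event already observes the new state. The types and the `on_phase` method are illustrative, not the crate's actual API:

```rust
use std::sync::{mpsc, Arc, RwLock};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ForkPhase { Preparing, Activated, GracePeriodEnded }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ForkLifecycle { Normal, WarmUp, GracePeriod }

/// The mapping from the commit message:
/// Preparing -> WarmUp, Activated -> GracePeriod, GracePeriodEnded -> Normal.
fn next_lifecycle(phase: ForkPhase) -> ForkLifecycle {
    match phase {
        ForkPhase::Preparing => ForkLifecycle::WarmUp,
        ForkPhase::Activated => ForkLifecycle::GracePeriod,
        ForkPhase::GracePeriodEnded => ForkLifecycle::Normal,
    }
}

struct ForkMonitor {
    lifecycle: Arc<RwLock<ForkLifecycle>>,
    subscribers: Vec<mpsc::Sender<ForkPhase>>,
}

impl ForkMonitor {
    fn on_phase(&self, phase: ForkPhase) {
        // 1. Update the shared state first (single writer).
        *self.lifecycle.write().unwrap() = next_lifecycle(phase);
        // 2. Then broadcast, so readers triggered by the event see the
        //    already-updated lifecycle rather than the stale one.
        for tx in &self.subscribers {
            let _ = tx.send(phase);
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let monitor = ForkMonitor {
        lifecycle: Arc::new(RwLock::new(ForkLifecycle::Normal)),
        subscribers: vec![tx],
    };
    monitor.on_phase(ForkPhase::Activated);
    assert_eq!(rx.recv().unwrap(), ForkPhase::Activated);
    assert_eq!(*monitor.lifecycle.read().unwrap(), ForkLifecycle::GracePeriod);
}
```

The ordering matters: reversing the two steps would let an event handler read the lifecycle before the monitor has written it.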
Replace the SharedDomainType abstraction with SharedForkLifecycle across all network components, making peer selection fork-aware. Key changes:
- Remove SharedDomainType from the network crate entirely
- Discovery and Handshake read domain_type via fork_lifecycle
- Network::on_fork_phase simplified to only update the ENR (domain type updates are now handled by ForkMonitor via SharedForkLifecycle)
- ConnectionManager uses the fork lifecycle to filter peer subnets: in Normal state only the current fork's bitmap is returned; during WarmUp/GracePeriod both forks are aggregated
- Client creates SharedForkLifecycle and passes it to both ForkMonitor and Network

This prevents peers subscribed only to a defunct fork from appearing useful for subnet coverage after the grace period ends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
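The filtering rule this commit describes can be sketched as follows. This is illustrative only: `peer_subnet_bitmap` is a hypothetical helper, a `u128` stands in for the real `Bitfield<Fixed<U128>>`, and the lifecycle variants carry only the fields the rule needs:

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Fork { Alan, Boole }

#[allow(dead_code)]
enum ForkLifecycle {
    Normal { current: Fork },
    WarmUp { current: Fork, upcoming: Fork },
    GracePeriod { current: Fork, previous: Fork },
}

type Bitmap = u128; // stand-in for Bitfield<Fixed<U128>>

/// Normal: only the current fork's bitmap counts.
/// WarmUp/GracePeriod: both relevant forks' bitmaps are OR-ed together.
fn peer_subnet_bitmap(
    fork_map: &BTreeMap<Fork, Bitmap>,
    lifecycle: &ForkLifecycle,
) -> Option<Bitmap> {
    match lifecycle {
        ForkLifecycle::Normal { current } => fork_map.get(current).copied(),
        ForkLifecycle::WarmUp { current, upcoming: other }
        | ForkLifecycle::GracePeriod { current, previous: other } => {
            let a = fork_map.get(current).copied().unwrap_or(0);
            let b = fork_map.get(other).copied().unwrap_or(0);
            Some(a | b)
        }
    }
}

fn main() {
    let mut map = BTreeMap::new();
    map.insert(Fork::Alan, 0b0011u128);
    map.insert(Fork::Boole, 0b0110u128);

    // After the grace period, only Boole subnets count: Alan-only
    // subscriptions no longer make a peer look useful.
    let normal = ForkLifecycle::Normal { current: Fork::Boole };
    assert_eq!(peer_subnet_bitmap(&map, &normal), Some(0b0110));

    // During the transition, subscriptions on either fork are useful.
    let grace = ForkLifecycle::GracePeriod { current: Fork::Boole, previous: Fork::Alan };
    assert_eq!(peer_subnet_bitmap(&map, &grace), Some(0b0111));
}
```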
Force-pushed from 47555e3 to 50378e6.
@claude review this PR
I'll analyze this and get back to you.
///
/// Both forks' contexts are relevant — peers subscribed to either
/// the current or upcoming fork are useful.
WarmUp {
What about collapsing WarmUp and GracePeriod variants into something like
Transitioning {
current,
domain_type
}
Just an idea, since consumers don't seem to care whether they're in WarmUp or GracePeriod, only that current and domain_type are correct. And the upcoming/previous fields seem unused.
) -> Option<Bitfield<Fixed<U128>>> {
    let fork_map = self.observed_peer_subnets.get(peer)?;
    Some(Self::aggregate_fork_bitmaps(fork_map))
    match lifecycle {
When fork_map is empty (newly connected peer, no gossipsub data yet), Normal returns None (triggering the ENR fallback) but WarmUp/GracePeriod return Some(all-zeros) (skipping the ENR fallback). Adding an early if fork_map.is_empty() { return None } would make the behavior consistent.
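The inconsistency and the suggested guard can be sketched like this (hypothetical helper and simplified types; a `u128` stands in for the real bitfield):

```rust
use std::collections::BTreeMap;

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Fork { Alan, Boole }

#[allow(dead_code)]
enum ForkLifecycle { Normal(Fork), Transition(Fork, Fork) }

type Bitmap = u128;

fn observed_bitmap(
    fork_map: &BTreeMap<Fork, Bitmap>,
    lifecycle: &ForkLifecycle,
) -> Option<Bitmap> {
    // Newly connected peer with no gossipsub data yet. Without this
    // guard, the Transition arm below would return Some(0) and the
    // caller would skip the ENR fallback, while the Normal arm would
    // return None and fall back; the guard makes both consistent.
    if fork_map.is_empty() {
        return None;
    }
    match lifecycle {
        ForkLifecycle::Normal(current) => fork_map.get(current).copied(),
        ForkLifecycle::Transition(a, b) => Some(
            fork_map.get(a).copied().unwrap_or(0)
                | fork_map.get(b).copied().unwrap_or(0),
        ),
    }
}

fn main() {
    let empty: BTreeMap<Fork, Bitmap> = BTreeMap::new();
    let transition = ForkLifecycle::Transition(Fork::Boole, Fork::Alan);
    // Empty map now yields None in every lifecycle state.
    assert_eq!(observed_bitmap(&empty, &transition), None);
    assert_eq!(observed_bitmap(&empty, &ForkLifecycle::Normal(Fork::Boole)), None);
}
```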
/// Fork lifecycle state. Updated only by ForkMonitor.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
Thinking of ways we could simplify this further, @dknopik. One way would be to convert this enum to a flat struct:

pub struct ForkState {
    pub domain_type: DomainType,
    pub current_fork: Fork,
    pub in_transition: bool,
}

since network components seem to care only about the current domain type and whether the network is in a fork transition period (warm-up/grace period) versus normal operation.
Another idea would be to extend ForkSchedule with a current field:

pub struct ForkSchedule {
    configs: BTreeMap<Fork, ForkConfig>,
    network_name: String,
    // Runtime: updated by ForkMonitor, read by network components
    current: RwLock<ActiveFork>,
}

struct ActiveFork {
    fork: Fork,
    in_transition: bool,
}

The benefit is that it kills the need for a new ForkState module; the con is that it mixes config (the current ForkSchedule struct) with runtime state (the new current field).

Or perhaps you wanted to take a look and come up with something yourself. If so, let me know if you want me to try out whatever you're thinking.
Good call on simplification. I'd keep runtime state separate from ForkSchedule (it's config-only today), so I don't think adding a mutable current field there is the right direction.

I'm fine simplifying the consumer API with an is_transition() helper, but I'd keep ForkLifecycle as an enum for now so we retain the WarmUp vs GracePeriod semantics if/when we need them.
I think we can simplify this - I am just not sure how exactly. Here are a few possibilities:
Right now, this is the rough order of my preference. Happy to discuss all of these approaches!
@shane-moore it seems claude review stopped working when mentioned |
**Review: 1 issue found (2 related inline comments)**

Subnet service lifecycle transitions don't trigger topic subscribe/unsubscribe events — The old code called …
interesting! will look into this |
**Test coverage gap in subnet subscription flow**

The monitor internals (…) However, the subnet subscription flow and the network-side ENR update path have zero test coverage. To be fair, these paths weren't tested before the refactor either. But the refactor changed their behavior (see inline comments on …). At minimum, a test that sends a lifecycle transition through the …
If a node restarts during an already-activated fork's grace window, ForkMonitor now correctly emits GracePeriod as the initial lifecycle instead of Normal. The main transition loop only considers future forks (fork_epoch > current_epoch), so this adds an explicit check before the loop. Extracts the logic into detect_grace_period_restart for clarity. Adds boundary tests at activation, activation+1, and grace end.
Add prev_lifecycle field to Network to compare domain types across lifecycle transitions. Only update the ENR when the domain type actually changes, avoiding redundant updates during WarmUp or other transitions that don't change the active domain. Transition logging is owned by the fork monitor; Network only logs when it acts (ENR update).
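The "only update the ENR when the domain type actually changes" logic from this commit can be sketched as below. All names are illustrative: the real Network compares full lifecycle values and mutates the discv5 ENR; here a counter stands in for the ENR update:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct DomainType([u8; 4]);

struct Network {
    /// Domain type from the previously observed lifecycle, if any.
    prev_domain: Option<DomainType>,
    /// Stand-in for the actual discv5 ENR mutation (and its log line).
    enr_updates: usize,
}

impl Network {
    fn on_fork_phase(&mut self, new_domain: DomainType) {
        // WarmUp and other transitions may fire without changing the
        // active domain; skip the redundant ENR update in that case.
        if self.prev_domain == Some(new_domain) {
            return;
        }
        self.prev_domain = Some(new_domain);
        self.enr_updates += 1; // Network logs only when it acts, i.e. here.
    }
}

fn main() {
    let mut net = Network { prev_domain: None, enr_updates: 0 };
    net.on_fork_phase(DomainType([0, 0, 0, 1])); // first sighting: update
    net.on_fork_phase(DomainType([0, 0, 0, 1])); // e.g. WarmUp, same domain: skip
    net.on_fork_phase(DomainType([0, 0, 0, 2])); // real change: update
    assert_eq!(net.enr_updates, 2);
}
```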
Generate all transitions for every non-genesis fork unconditionally, sort by slot, then split at current_slot using partition_point. The last past transition becomes the initial lifecycle; remaining are future transitions for the run loop. This removes detect_grace_period_restart and the warmup special-case branch — both are now handled uniformly by the split.
Add step-by-step comments explaining the generate-sort-split algorithm and a prep_slot boundary test to complete restart scenario coverage.
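The generate-sort-split algorithm from the two commits above can be sketched like this. The phase names, tuple shape, and `split_transitions` helper are illustrative; the key idea, taken from the commit message, is that `partition_point` splits past from future and the last past transition becomes the initial lifecycle:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Phase { WarmUp, Activated, GraceEnd }

/// Generate-sort-split: given every transition for every non-genesis
/// fork (in any order), return the initial phase to resume in and the
/// remaining future transitions for the run loop.
fn split_transitions(
    mut transitions: Vec<(u64, Phase)>,
    current_slot: u64,
) -> (Option<Phase>, Vec<(u64, Phase)>) {
    // 1. Sort all transitions by slot.
    transitions.sort_by_key(|(slot, _)| *slot);
    // 2. partition_point finds the index of the first future transition.
    let idx = transitions.partition_point(|(slot, _)| *slot <= current_slot);
    // 3. The last past transition (if any) is the initial lifecycle; the
    //    rest feed the run loop. Grace-period restarts need no special case.
    let initial = transitions[..idx].last().map(|(_, p)| *p);
    (initial, transitions[idx..].to_vec())
}

fn main() {
    // Hypothetical schedule: warm-up at slot 90, activation at 100,
    // grace period ends at 200 (deliberately unsorted input).
    let transitions =
        vec![(100, Phase::Activated), (200, Phase::GraceEnd), (90, Phase::WarmUp)];
    // Restart at slot 150: inside the grace window.
    let (initial, future) = split_transitions(transitions, 150);
    assert_eq!(initial, Some(Phase::Activated)); // resume in GracePeriod
    assert_eq!(future, vec![(200, Phase::GraceEnd)]);
}
```

Because the split handles every restart point uniformly, boundary slots (activation, activation+1, grace end, prep slot) are just different inputs to the same function, which is what the added tests exercise.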
**Merge Queue Status**

This pull request spent 11 minutes 17 seconds in the queue, including 9 minutes 28 seconds running CI.

**Issue Addressed**

Follows up on PR #820 (fork-aware `observed_peer_subnets`). While #820 correctly stores per-fork bitmaps, all query methods still aggregate across forks with a blind `union()`. After the Boole grace period ends, peers subscribed only to Alan topics appear useful for Boole subnets — leading to incorrect peer selection.

**Proposed Changes**

- `ForkLifecycle` enum (`Normal`, `WarmUp`, `GracePeriod`) and `SharedForkLifecycle` shared state in the fork crate, replacing `SharedDomainType`
- `ForkMonitor` updates the shared lifecycle state before broadcasting each `ForkPhase` event
- `ConnectionManager` now consults the lifecycle: in `Normal` state only the current fork's bitmap is considered; during `WarmUp`/`GracePeriod` both forks are aggregated
- Readers of `SharedDomainType` (Discovery, Handshake) now read `SharedForkLifecycle::domain_type()` instead
- `Network::on_fork_phase` simplified — no longer sets the domain type (the monitor handles it), only updates the ENR

**Additional Info**

- `SharedForkLifecycle` follows the same `Arc<RwLock<T>>` single-writer pattern as the `SharedDomainType` it replaces
- ENR updates remain in `Network::on_fork_phase` because they require explicit discv5 mutation
- `count_observed_peers_for_subnets` hoists the lifecycle read outside the per-peer loop to avoid repeated lock acquisition
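The lock-hoisting note in Additional Info can be illustrated with a small sketch. The function name echoes `count_observed_peers_for_subnets`, but the signature, the field-less lifecycle, and the `u128` bitmaps are all simplified stand-ins:

```rust
use std::sync::{Arc, RwLock};

#[allow(dead_code)]
#[derive(Clone)]
enum ForkLifecycle { Normal, Transition }

/// Count peers whose observed bitmap covers `subnet`.
fn count_observed_peers_for_subnet(
    peer_bitmaps: &[u128],
    subnet: u32,
    lifecycle: &Arc<RwLock<ForkLifecycle>>,
) -> usize {
    // One read-lock acquisition for the whole query, hoisted out of the
    // per-peer loop, instead of one acquisition per peer.
    let lifecycle = lifecycle.read().unwrap().clone();
    peer_bitmaps
        .iter()
        .filter(|bitmap| {
            // In the real code the lifecycle selects which fork's bitmap
            // to consult per peer; here both arms use the same mask.
            let mask = match lifecycle {
                ForkLifecycle::Normal | ForkLifecycle::Transition => 1u128 << subnet,
            };
            **bitmap & mask != 0
        })
        .count()
}

fn main() {
    let lifecycle = Arc::new(RwLock::new(ForkLifecycle::Normal));
    let peers = [0b0010u128, 0b0001, 0b0011];
    // Subnet 1 is covered by the first and third peers.
    assert_eq!(count_observed_peers_for_subnet(&peers, 1, &lifecycle), 2);
}
```

Cloning the lifecycle once also keeps the read guard from being held across the whole iteration, so the single writer (ForkMonitor) is never blocked by a long peer scan.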