
feat: replace SharedDomainType with SharedForkLifecycle for fork-aware peer selection#824

Merged
mergify[bot] merged 20 commits into sigp:unstable from diegomrsantos:feat/shared-fork-lifecycle
Feb 27, 2026

Conversation

@diegomrsantos
Member

Issue Addressed

Follows up on PR #820 (fork-aware observed_peer_subnets). While #820 correctly stores per-fork bitmaps, all query methods still aggregate across forks with a blind union(). After the Boole grace period ends, peers subscribed only to Alan topics still appear useful for Boole subnets, leading to incorrect peer selection.

Proposed Changes

  • Introduce ForkLifecycle enum (Normal, WarmUp, GracePeriod) and SharedForkLifecycle shared state in the fork crate, replacing SharedDomainType
  • ForkMonitor updates the shared lifecycle state before broadcasting each ForkPhase event
  • Peer selection queries in ConnectionManager now consult the lifecycle: in Normal state only the current fork's bitmap is considered; during WarmUp/GracePeriod both forks are aggregated
  • All components that previously read SharedDomainType (Discovery, Handshake) now read SharedForkLifecycle::domain_type() instead
  • Network::on_fork_phase simplified — no longer sets domain type (monitor handles it), only updates ENR

Additional Info


diegomrsantos and others added 4 commits February 11, 2026 13:13
Add a new lifecycle module to the fork crate that models fork transition
states as a single enum with three variants:

- Normal: operating on a single fork (pre-fork or post-grace-period)
- WarmUp: preparing for an upcoming fork (dual-subscribing to new topics)
- GracePeriod: fork activated but keeping old subscriptions for late messages

SharedForkLifecycle wraps this in Arc<RwLock<T>> for cross-component
sharing, following the same pattern as the SharedDomainType it will
replace. Each variant carries domain_type, making invalid states
unrepresentable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
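Based on the commit message above, the lifecycle module might look roughly like this sketch. The enum and method names come from the PR; the Fork and DomainType payload types are placeholders, not the crate's real definitions:

```rust
use std::sync::{Arc, RwLock};

// Placeholder payload types; the real crate defines its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DomainType(pub [u8; 4]);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fork(pub u8);

/// Fork lifecycle state. Each variant carries domain_type, making
/// invalid states unrepresentable.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
    /// Operating on a single fork (pre-fork or post-grace-period).
    Normal { current: Fork, domain_type: DomainType },
    /// Preparing for an upcoming fork (dual-subscribing to new topics).
    WarmUp { current: Fork, upcoming: Fork, domain_type: DomainType },
    /// Fork activated but keeping old subscriptions for late messages.
    GracePeriod { current: Fork, previous: Fork, domain_type: DomainType },
}

impl ForkLifecycle {
    pub fn domain_type(&self) -> DomainType {
        match self {
            ForkLifecycle::Normal { domain_type, .. }
            | ForkLifecycle::WarmUp { domain_type, .. }
            | ForkLifecycle::GracePeriod { domain_type, .. } => *domain_type,
        }
    }
}

/// Shared handle: single writer (ForkMonitor), many readers.
#[derive(Clone)]
pub struct SharedForkLifecycle(Arc<RwLock<ForkLifecycle>>);

impl SharedForkLifecycle {
    pub fn new(initial: ForkLifecycle) -> Self {
        Self(Arc::new(RwLock::new(initial)))
    }
    pub fn get(&self) -> ForkLifecycle {
        self.0.read().unwrap().clone()
    }
    pub fn set(&self, next: ForkLifecycle) {
        *self.0.write().unwrap() = next;
    }
    pub fn domain_type(&self) -> DomainType {
        self.get().domain_type()
    }
}
```

Readers clone the current state out of the lock rather than holding a guard, which keeps the read path cheap and avoids lock contention with the monitor's writes.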
ForkMonitor now accepts a SharedForkLifecycle and updates it right
before broadcasting each ForkPhase event:

- Preparing  → WarmUp { current, upcoming, domain_type }
- Activated  → GracePeriod { current, previous, domain_type }
- GracePeriodEnded → Normal { current, domain_type }

This makes ForkMonitor the single writer of fork lifecycle state.
Components that previously needed to react to events just to cache
derived values can now read the shared state directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
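The write-before-broadcast ordering described above can be sketched as follows. This is a simplified stand-in (unit variants, std mpsc instead of the real broadcast channel), not the PR's implementation:

```rust
use std::sync::{mpsc, Arc, RwLock};

// Simplified stand-ins; the real types carry fork and domain-type fields.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ForkPhase { Preparing, Activated, GracePeriodEnded }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Lifecycle { Normal, WarmUp, GracePeriod }

struct ForkMonitor {
    shared: Arc<RwLock<Lifecycle>>,
    event_tx: mpsc::Sender<ForkPhase>, // stand-in for the broadcast channel
}

impl ForkMonitor {
    /// Update the shared state *before* broadcasting, so any component
    /// woken by the event already observes the new lifecycle when it reads.
    fn emit(&self, phase: ForkPhase) {
        let next = match phase {
            ForkPhase::Preparing => Lifecycle::WarmUp,
            ForkPhase::Activated => Lifecycle::GracePeriod,
            ForkPhase::GracePeriodEnded => Lifecycle::Normal,
        };
        *self.shared.write().unwrap() = next;
        let _ = self.event_tx.send(phase);
    }
}
```

The ordering matters: if the event were sent first, a subscriber could wake up, read the shared state, and still see the previous lifecycle.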
Replace the SharedDomainType abstraction with SharedForkLifecycle
across all network components, making peer selection fork-aware.

Key changes:
- Remove SharedDomainType from network crate entirely
- Discovery and Handshake read domain_type via fork_lifecycle
- Network::on_fork_phase simplified to only update ENR (domain type
  updates are now handled by ForkMonitor via SharedForkLifecycle)
- ConnectionManager uses fork lifecycle to filter peer subnets:
  in Normal state only the current fork's bitmap is returned;
  during WarmUp/GracePeriod both forks are aggregated
- Client creates SharedForkLifecycle and passes to both ForkMonitor
  and Network

This prevents peers subscribed only to a defunct fork from appearing
useful for subnet coverage after the grace period ends.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
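The fork-aware filtering in ConnectionManager might look like this sketch, with u128 standing in for the real Bitfield<Fixed<U128>> and the fork and lifecycle types heavily simplified:

```rust
use std::collections::HashMap;

// u128 as a stand-in for the 128-subnet bitfield.
type SubnetBitmap = u128;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum Fork { Alan, Boole }

enum Lifecycle {
    Normal { current: Fork },
    // WarmUp and GracePeriod behave identically for peer selection.
    Transition { current: Fork, other: Fork },
}

/// Fork-aware view of a peer's observed subnets: only the current fork's
/// bitmap in Normal; the union of both forks during a transition.
fn peer_subnets(
    fork_map: &HashMap<Fork, SubnetBitmap>,
    lifecycle: &Lifecycle,
) -> Option<SubnetBitmap> {
    match lifecycle {
        Lifecycle::Normal { current } => fork_map.get(current).copied(),
        Lifecycle::Transition { current, other } => Some(
            fork_map.get(current).copied().unwrap_or(0)
                | fork_map.get(other).copied().unwrap_or(0),
        ),
    }
}
```

In Normal state, a peer whose fork_map only has an Alan entry yields no bitmap for Boole, so its stale subscriptions stop counting toward subnet coverage.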
@diegomrsantos diegomrsantos force-pushed the feat/shared-fork-lifecycle branch from 47555e3 to 50378e6 on February 11, 2026 16:14
@diegomrsantos diegomrsantos marked this pull request as draft February 11, 2026 16:14
@diegomrsantos diegomrsantos marked this pull request as ready for review February 11, 2026 16:14
@diegomrsantos diegomrsantos self-assigned this Feb 11, 2026
@diegomrsantos diegomrsantos marked this pull request as draft February 11, 2026 16:21
@diegomrsantos diegomrsantos marked this pull request as ready for review February 11, 2026 16:21
@diegomrsantos
Member Author

@claude review this PR


@diegomrsantos diegomrsantos marked this pull request as draft February 11, 2026 19:15
@diegomrsantos diegomrsantos marked this pull request as ready for review February 11, 2026 19:15

@diegomrsantos diegomrsantos marked this pull request as draft February 11, 2026 19:46
@diegomrsantos diegomrsantos marked this pull request as ready for review February 11, 2026 19:46

///
/// Both forks' contexts are relevant — peers subscribed to either
/// the current or upcoming fork are useful.
WarmUp {
Member

What about collapsing WarmUp and GracePeriod variants into something like

Transitioning {
   current,
   domain_type
}

just an idea since consumers don't seem to care whether they're in WarmUp or GracePeriod, just that current and domain_type are correct. and upcoming/previous fields seem unused

) -> Option<Bitfield<Fixed<U128>>> {
let fork_map = self.observed_peer_subnets.get(peer)?;
Some(Self::aggregate_fork_bitmaps(fork_map))
match lifecycle {
Member

when fork_map is empty (newly connected peer, no gossipsub yet), Normal returns None (triggering ENR fallback) but WarmUp/GracePeriod returns Some(all-zeros) (skipping ENR fallback). Adding an early if fork_map.is_empty() { return None } would make the behavior consistent
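The suggested guard can be sketched like this. The function signature and the in_transition flag are simplifications of the real query method, with u128 standing in for the bitfield and a flat map keyed by an arbitrary fork id:

```rust
use std::collections::HashMap;

type SubnetBitmap = u128; // stand-in for Bitfield<Fixed<U128>>

/// Sketch of the suggested fix: bail out before matching on the lifecycle,
/// so a freshly connected peer (empty fork_map) always falls back to its
/// ENR, in Normal and WarmUp/GracePeriod alike.
fn observed_subnets(
    fork_map: &HashMap<u8, SubnetBitmap>,
    in_transition: bool,
) -> Option<SubnetBitmap> {
    if fork_map.is_empty() {
        return None; // no gossipsub observations yet: trigger ENR fallback
    }
    if in_transition {
        // Union across all forks during WarmUp/GracePeriod.
        Some(fork_map.values().fold(0, |acc, b| acc | b))
    } else {
        // Stand-in for "current fork's bitmap only".
        fork_map.values().next().copied()
    }
}
```

Without the guard, the transition arm would return Some(0) for an empty map (the fold's identity), silently skipping the ENR fallback.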


/// Fork lifecycle state. Updated only by ForkMonitor.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ForkLifecycle {
Member

thinking of ways we could simplify this further @dknopik. I think one way would be to convert this enum to a flat struct:

pub struct ForkState {
    pub domain_type: DomainType,
    pub current_fork: Fork,
    pub in_transition: bool,
}

since network components seem to just care about what is the current domain type and whether the network is in a fork transition period (warm up/grace period) vs if normal operation

another idea would just be to extend ForkSchedule with a current field:

pub struct ForkSchedule {
    configs: BTreeMap<Fork, ForkConfig>,
    network_name: String,
    // Runtime: updated by ForkMonitor, read by network components
    current: RwLock<ActiveFork>,
}

struct ActiveFork {
    fork: Fork,
    in_transition: bool,
}

benefit of this is killing need for new ForkState module. con is that it mixes config (current ForkSchedule struct) with runtime characteristics (new current field)

or perhaps you wanted to take a look and come up with something. if so, feel free to lmk if you want me to try out whatever you're thinking

Member Author

@diegomrsantos diegomrsantos Feb 23, 2026

Good call on simplification. I'd keep runtime state separate from ForkSchedule (it's config-only today), so I don't think adding a mutable current field there is the right direction.
I'm fine simplifying the consumer API with an is_transition() helper, but I'd keep ForkLifecycle as an enum for now so we retain WarmUp vs GracePeriod semantics if/when we need them.
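The proposed helper might look like this (per-variant fields elided from the sketch; the real enum carries fork and domain-type data):

```rust
// Simplified stand-in for the PR's ForkLifecycle.
#[derive(Clone, Debug, PartialEq, Eq)]
enum ForkLifecycle { Normal, WarmUp, GracePeriod }

impl ForkLifecycle {
    /// Consumer-facing helper: WarmUp and GracePeriod both count as a
    /// transition, while the enum retains the finer-grained semantics.
    fn is_transition(&self) -> bool {
        matches!(self, ForkLifecycle::WarmUp | ForkLifecycle::GracePeriod)
    }
}
```

Consumers that only care about "transition vs normal" match on the helper; consumers that need warm-up vs grace-period semantics can still match on the variants.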


@dknopik
Member

dknopik commented Feb 20, 2026

I think we can simplify this - I am just not sure how exactly. Here are a few possibilities:

  1. Roll back #814 (fix: share domain type across network components to fix handshake after fork activation). Considering the original motivation was a bugfix for keeping the current domain type updated, this has grown quite complex: hundreds of lines and a lot of code to test. Instead, I think the original approach is way simpler and at this point less error-prone.
  2. Kind of this approach, but less complex: I think having a RwLock for the current ForkConfig is simple enough. Any parts of the code interested in the lifecycle can use the existing broadcast channel.
  3. Finally, avoid the shared state, but instead pass around the ForkSchedule to parts of the code. Or maybe a new wrapper that brings along a SlotClock for easier querying of the current fork.

Right now, this is the rough order of my preference. Happy to discuss all of these approaches!

@diegomrsantos
Member Author

@claude review the current PR

@shane-moore it seems claude review stopped working when mentioned

@diegomrsantos diegomrsantos marked this pull request as draft February 26, 2026 01:22
@diegomrsantos diegomrsantos marked this pull request as ready for review February 26, 2026 01:23
@claude-code-actions-sigp

Review: 1 issue found (2 related inline comments)

Subnet service lifecycle transitions don't trigger topic subscribe/unsubscribe events — The old code called handle_subnet_changes after every ForkPhase event and explicitly called send_unsubscribes when removing a fork. The new lifecycle_rx.changed() handler only updates ServiceState via on_lifecycle_transition but never emits the actual gossipsub Join/Leave events. Additionally, set_subscribed_forks silently drops removed forks' currently_subscribed data via retain, losing the information needed to unsubscribe from old topics.

@shane-moore
Member

@claude review the current PR

@shane-moore it seems claude review stopped working when mentioned

interesting! will look into this

@diegomrsantos
Member Author

Test coverage gap in subnet subscription flow

The monitor internals (ForkMonitor::new, run) have solid test coverage — the pre-computed transition model is well verified.

However, the subnet subscription flow and network-side ENR update path have zero test coverage. To be fair, these paths weren't tested before the refactor either. But the refactor changed their behavior (see inline comments on subscriptions.rs:100 about the missing handle_subnet_changes call and the retain data loss), which makes the lack of coverage riskier now — there's no safety net to catch the regressions.

At minimum, a test that sends a lifecycle transition through the watch channel and asserts on the TopicEvents coming out of the mpsc receiver would catch both issues. The setup is straightforward since all inputs (db, lifecycle_rx, slot_clock) are already injectable.

If a node restarts during an already-activated fork's grace window,
ForkMonitor now correctly emits GracePeriod as the initial lifecycle
instead of Normal. The main transition loop only considers future
forks (fork_epoch > current_epoch), so this adds an explicit check
before the loop.

Extracts the logic into detect_grace_period_restart for clarity.
Adds boundary tests at activation, activation+1, and grace end.

Add prev_lifecycle field to Network to compare domain types across
lifecycle transitions. Only update the ENR when the domain type
actually changes, avoiding redundant updates during WarmUp or
other transitions that don't change the active domain.

Transition logging is owned by the fork monitor; Network only logs
when it acts (ENR update).
@diegomrsantos diegomrsantos marked this pull request as draft February 26, 2026 19:23
@diegomrsantos diegomrsantos marked this pull request as ready for review February 26, 2026 19:23
Generate all transitions for every non-genesis fork unconditionally,
sort by slot, then split at current_slot using partition_point. The
last past transition becomes the initial lifecycle; remaining are
future transitions for the run loop.

This removes detect_grace_period_restart and the warmup special-case
branch — both are now handled uniformly by the split.
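The generate-sort-split algorithm can be sketched as below. The Transition type here is hypothetical (a slot plus a label); the real code builds full lifecycle values:

```rust
/// Hypothetical transition record; the real code carries lifecycle data.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Transition { slot: u64, lifecycle: &'static str }

/// Generate-sort-split: sort every transition by slot, split at the
/// current slot with partition_point. The last past transition becomes
/// the initial lifecycle; the remainder feeds the run loop.
fn split_transitions(
    mut all: Vec<Transition>,
    current_slot: u64,
) -> (Option<Transition>, Vec<Transition>) {
    all.sort_by_key(|t| t.slot);
    // Index of the first strictly-future transition (the predicate is
    // partitioned because the vec is sorted by slot).
    let idx = all.partition_point(|t| t.slot <= current_slot);
    let future = all.split_off(idx);
    (all.pop(), future)
}
```

Because every fork's transitions are generated unconditionally, a restart mid-grace-window is no longer a special case: the grace-period transition simply falls on the "past" side of the split and seeds the initial lifecycle.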

Add step-by-step comments explaining the generate-sort-split
algorithm and a prep_slot boundary test to complete restart
scenario coverage.
Member

@dknopik dknopik left a comment


LGTM, thanks!

@dknopik dknopik added ready-for-merge v2.0.0 The release shipping the next network upgrade labels Feb 27, 2026
@mergify mergify bot added the queued label Feb 27, 2026
@mergify

mergify bot commented Feb 27, 2026

Merge Queue Status

Rule: default


This pull request spent 11 minutes 17 seconds in the queue, including 9 minutes 28 seconds running CI.


mergify bot added a commit that referenced this pull request Feb 27, 2026
@mergify mergify bot merged commit 683c197 into sigp:unstable Feb 27, 2026
22 checks passed
@mergify mergify bot removed the queued label Feb 27, 2026

Labels

ready-for-merge v2.0.0 The release shipping the next network upgrade
