Skip to content

fix(slasher): handle multiple blocks per slot in epoch prune watcher#20088

Merged
PhilWindle merged 1 commit intonextfrom
palla/epoch-prune-checkpoints
Feb 4, 2026
Merged

fix(slasher): handle multiple blocks per slot in epoch prune watcher#20088
PhilWindle merged 1 commit intonextfrom
palla/epoch-prune-checkpoints

Conversation

@spalladino
Copy link
Contributor

Update to e2e test pending.

@spalladino spalladino added the ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure label Jan 30, 2026
@PhilWindle PhilWindle added this pull request to the merge queue Feb 4, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 4, 2026
@PhilWindle PhilWindle added this pull request to the merge queue Feb 4, 2026
Merged via the queue into next with commit de84de5 Feb 4, 2026
23 of 25 checks passed
@PhilWindle PhilWindle deleted the palla/epoch-prune-checkpoints branch February 4, 2026 23:05
github-actions bot pushed a commit that referenced this pull request Feb 4, 2026
ludamad added a commit that referenced this pull request Feb 23, 2026
Slide 19 (§4 insights · PR correlation): two-column layout showing which
PRs caused each weekly flake spike and which fixes produced each recovery:

Spikes:
- W02 (2,647 flakes): Santiago refactors #19532/#19509/#19564 exposed
  timing races across p2p/epoch simultaneously
- W04 (935 flakes): PhilWindle #19982 added cross-chain mbps tests
  without pre-deflaking — valid_epoch_pruned_slash 0→346 events
- W06 (850 flakes): three high-risk PRs merged same day (#20047 peer
  scoring, #20241 max checkpoints→32, #20257 hash constants)

Fixes:
- W03 recovery: Santiago #19914 — checkpointed chain tip for PXE
  (root fix; PXE was using latest not checkpointed block)
- W05 recovery: Santiago #20088 slasher multi-block fix + #20140
  discv5 deflake + GCP step-down (−6 testbed namespaces)
- W07 improvement: Santiago #20351 mbps fix (p2p_client 311→0),
  #20462 remove hardcoded 10s timeout, ludamad #20613 CI parallelism

Also: correct three factual errors spotted during full review —
- Summary: next P50 is growing (+10% in 3 weeks), not stable
- Flake trend W07 note: e2e-p2p-epoch-flakes dropped 373×, not just
  "251 flakes lowest since December"
- Gaps slide: replaced stale "ci_phases broken" card with GCP egress
  costs gap (bc→awk fix is deployed; egress attribution is the gap now)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants