
fix: Deflake discv5 test #20140

Merged
AztecBot merged 1 commit into next from pw/fix-discv5-flake on Feb 3, 2026

Conversation

@PhilWindle
Collaborator

This PR attempts to deflake the discv5 test.

 * This is more resilient than fixed iteration loops as it adapts to varying DHT propagation times.
 */
const runDiscoveryUntil = async (nodes: DiscV5Service[], condition: () => boolean, timeout = 60, interval = 0.2) => {
  await sleep(DISCV5_START_DELAY_MS);
Contributor

Is this needed though? Since we're now running this in a retryUntil, we'll just run a couple more iterations of this, right?
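
For context, the polling pattern under discussion can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `pollUntil` is a hypothetical stand-in for a `retryUntil`-style helper, its signature is an assumption, and treating the timeout and interval as seconds is inferred only from the default values (`60`, `0.2`) in the excerpt above.

```typescript
// Hypothetical sketch: names, signature, and units are assumptions,
// not the helper actually used in the PR.
const sleep = (ms: number): Promise<void> => new Promise(resolve => setTimeout(resolve, ms));

/**
 * Re-checks `condition` every `intervalSec` seconds until it returns true.
 * Throws if the condition is still false once `timeoutSec` seconds have elapsed.
 * Returns the number of checks that were needed.
 */
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  timeoutSec = 60,
  intervalSec = 0.2,
): Promise<number> {
  const deadline = Date.now() + timeoutSec * 1000;
  let attempts = 0;
  while (true) {
    attempts++;
    if (await condition()) {
      return attempts;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutSec}s (${attempts} attempts)`);
    }
    await sleep(intervalSec * 1000);
  }
}
```

Under these assumptions, a fixed start-up delay before the loop is not required for correctness: if the nodes are not ready yet, the condition simply fails for a few more iterations before succeeding, which appears to be the point the reviewer is raising.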

@AztecBot force-pushed the pw/fix-discv5-flake branch from c6ecac1 to e51cb7f on February 3, 2026 17:59
@AztecBot added this pull request to the merge queue on Feb 3, 2026
Merged via the queue into next with commit 2a9dd27 on Feb 3, 2026
17 checks passed
@AztecBot deleted the pw/fix-discv5-flake branch on February 3, 2026 18:48
ludamad added a commit that referenced this pull request Feb 23, 2026
Slide 19 (§4 insights · PR correlation): two-column layout showing which
PRs caused each weekly flake spike and which fixes produced each recovery:

Spikes:
- W02 (2,647 flakes): Santiago refactors #19532/#19509/#19564 exposed
  timing races across p2p/epoch simultaneously
- W04 (935 flakes): PhilWindle #19982 added cross-chain mbps tests
  without pre-deflaking — valid_epoch_pruned_slash 0→346 events
- W06 (850 flakes): three high-risk PRs merged same day (#20047 peer
  scoring, #20241 max checkpoints→32, #20257 hash constants)

Fixes:
- W03 recovery: Santiago #19914 — checkpointed chain tip for PXE
  (root fix; PXE was using latest not checkpointed block)
- W05 recovery: Santiago #20088 slasher multi-block fix + #20140
  discv5 deflake + GCP step-down (−6 testbed namespaces)
- W07 improvement: Santiago #20351 mbps fix (p2p_client 311→0),
  #20462 remove hardcoded 10s timeout, ludamad #20613 CI parallelism

Also: correct three factual errors spotted during full review —
- Summary: next P50 is growing (+10% in 3 weeks), not stable
- Flake trend W07 note: e2e-p2p-epoch-flakes dropped 373×, not just
  "251 flakes lowest since December"
- Gaps slide: replaced stale "ci_phases broken" card with GCP egress
  costs gap (bc→awk fix is deployed; egress attribution is the gap now)

3 participants