fix(e2e): use checkpointed chain tip for PXE in epochs and p2p tests#19914

Merged
spalladino merged 1 commit into next from palla/fix-e2e-pxe-sync-checkpointed on Jan 23, 2026

Conversation

@spalladino
Contributor

Summary

Fixes flaky test failures in e2e_p2p and e2e_epochs tests caused by transactions being lost when blocks are pruned due to failed checkpoint proposals.

Problem

The CI test e2e_p2p_valid_epoch_pruned_slash (and potentially other epochs/p2p tests) was failing with a timeout error:

TimeoutError: Timeout awaiting isMined
  at retryUntil (../../foundation/dest/retry/index.js:90:19)
  at DeploySentTx.waitForReceipt (../../aztec.js/dest/contract/sent_tx.js:78:16)

Root Cause

The failure sequence was:

  1. Block 2 was built at slot 16 with a deployment transaction
  2. The checkpoint proposal failed L1 validation with HeaderLib__InvalidSlotNumber(17,16): by the time the proposal was submitted, L1 had already moved to slot 17
  3. Block 2 was pruned when the archiver detected slot 16 wasn't checkpointed:
    WARN: archiver:l1-sync - Pruning blocks after block 1 due to slot 16 not being checkpointed
    
  4. The deployment transaction was lost because it was in the pruned block
  5. PXE was syncing to the "proposed" chain tip (default behavior), so waitForReceipt kept waiting for a transaction that would never be mined
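
The sequence above can be reproduced in a toy model. This is a minimal sketch, not the Aztec archiver: `ToyArchiver`, `ToyBlock`, the `0xdeploy` hash, and the slot assigned to block 1 are all hypothetical names and values chosen for illustration. It only shows why a receipt lookup can never succeed once the block holding the transaction has been pruned.

```typescript
// Toy model of the failure sequence. All names are hypothetical, not the Aztec API.
type ToyBlock = { number: number; slot: number; txHashes: string[] };

class ToyArchiver {
  private blocks: ToyBlock[] = [];
  private checkpointedUpTo = 0; // highest block number accepted on L1

  addProposed(block: ToyBlock): void {
    this.blocks.push(block);
  }

  checkpoint(blockNumber: number): void {
    this.checkpointedUpTo = Math.max(this.checkpointedUpTo, blockNumber);
  }

  // Drop every non-checkpointed block whose slot has already passed on L1.
  pruneUncheckpointed(currentL1Slot: number): number[] {
    const pruned = this.blocks
      .filter(b => b.number > this.checkpointedUpTo && b.slot < currentL1Slot)
      .map(b => b.number);
    this.blocks = this.blocks.filter(b => !pruned.includes(b.number));
    return pruned;
  }

  findTx(txHash: string): ToyBlock | undefined {
    return this.blocks.find(b => b.txHashes.includes(txHash));
  }
}

const archiver = new ToyArchiver();
archiver.addProposed({ number: 1, slot: 15, txHashes: [] }); // slot 15 is an assumed value
archiver.checkpoint(1);

// Step 1: block 2 is built at slot 16 with the deployment transaction.
archiver.addProposed({ number: 2, slot: 16, txHashes: ["0xdeploy"] });
const seenBeforePrune = archiver.findTx("0xdeploy")?.number;

// Steps 2-3: the checkpoint never lands, L1 reaches slot 17, block 2 is pruned.
const prunedBlocks = archiver.pruneUncheckpointed(17);

// Steps 4-5: the transaction is gone, so polling for its receipt cannot succeed.
const seenAfterPrune = archiver.findTx("0xdeploy");
```

In this model a client following the proposed tip sees the transaction in block 2 and then loses it, while a client that only reads checkpointed blocks never sees block 2 at all.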

Solution

Configure PXE to sync to the checkpointed chain tip instead of the proposed one in both test contexts:

  • P2PNetworkTest.setup() - used by all e2e_p2p tests
  • EpochsTestContext.setup() - used by all e2e_epochs tests

This uses the newly added syncChainTip PXE config option:

{ syncChainTip: 'checkpointed' }
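
As a rough illustration of how such an override might combine with a default, here is a sketch under stated assumptions. `ToyPxeConfig`, `SyncChainTipValue`, and `resolvePxeConfig` are hypothetical; the real PXE config type and the signature of `setup()` are not shown in this PR, and the option may accept values other than the two mentioned here.

```typescript
// Hypothetical shapes for illustration only; the real PXE config has its own type.
type SyncChainTipValue = "proposed" | "checkpointed";

interface ToyPxeConfig {
  syncChainTip: SyncChainTipValue;
}

// The PR describes "proposed" as the default behavior.
const defaultPxeConfig: ToyPxeConfig = { syncChainTip: "proposed" };

// Merge caller overrides on top of the defaults.
function resolvePxeConfig(overrides: Partial<ToyPxeConfig> = {}): ToyPxeConfig {
  return { ...defaultPxeConfig, ...overrides };
}

const untouchedConfig = resolvePxeConfig();
const e2eTestConfig = resolvePxeConfig({ syncChainTip: "checkpointed" });
```

Passing the override in the shared test setup rather than in each test keeps the behavior consistent across every e2e_p2p and e2e_epochs test.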

Why This Fixes the Issue

With syncChainTip: 'checkpointed':

  • PXE only considers blocks that have been successfully submitted to L1 (checkpointed)
  • Blocks that are proposed but not yet checkpointed are not visible to PXE
  • When a checkpoint fails and blocks are pruned, PXE is unaffected because it was never tracking those blocks
  • Transactions sent through PXE will use anchor blocks from the checkpointed chain, which are stable and won't be pruned

This provides a more stable view of the chain for tests that involve block pruning scenarios.
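
The stability argument can be sketched as a comparison of anchor blocks before and after a prune. This is a simplified model, assuming the chain exposes a proposed tip and a checkpointed tip as block numbers; `TipMode`, `ChainTips`, `visibleTip`, and `isBlockVisible` are hypothetical helpers, not PXE functions.

```typescript
// Simplified model of tip selection; names are hypothetical, not the PXE API.
type TipMode = "proposed" | "checkpointed";

interface ChainTips {
  proposed: number; // latest block built by the sequencer
  checkpointed: number; // latest block accepted on L1
}

// The block a client in the given mode would use as its anchor.
function visibleTip(tips: ChainTips, mode: TipMode): number {
  return mode === "checkpointed" ? tips.checkpointed : tips.proposed;
}

function isBlockVisible(blockNumber: number, tips: ChainTips, mode: TipMode): boolean {
  return blockNumber <= visibleTip(tips, mode);
}

// Block 2 is proposed but not checkpointed, then pruned.
const tipsBeforePrune: ChainTips = { proposed: 2, checkpointed: 1 };
const tipsAfterPrune: ChainTips = { proposed: 1, checkpointed: 1 };

const proposedAnchor = visibleTip(tipsBeforePrune, "proposed");
const checkpointedAnchor = visibleTip(tipsBeforePrune, "checkpointed");

// Does each anchor still exist once the prune has happened?
const proposedAnchorSurvives = isBlockVisible(proposedAnchor, tipsAfterPrune, "proposed");
const checkpointedAnchorSurvives = isBlockVisible(checkpointedAnchor, tipsAfterPrune, "checkpointed");
```

The trade-off in this model is latency: the checkpointed anchor lags the proposed one by however many blocks are still waiting on L1, which is acceptable for tests that deliberately exercise pruning.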

Changes

  • end-to-end/src/e2e_p2p/p2p_network.ts: Pass { syncChainTip: 'checkpointed' } to setup()
  • end-to-end/src/e2e_epochs/epochs_test.ts: Pass { syncChainTip: 'checkpointed' } to setup()

🤖 Generated with Claude Code

Fixes flaky test failures in e2e_p2p and e2e_epochs tests caused by
transactions being lost when blocks are pruned due to failed checkpoint
proposals.

Root cause:
- Block N is built for slot S and includes a transaction
- The checkpoint proposal fails L1 validation (e.g., HeaderLib__InvalidSlotNumber)
  because by the time it's submitted, L1 has already moved to slot S+1
- When L1 moves forward and slot S isn't checkpointed, the archiver
  prunes block N
- The transaction in block N is lost, causing waitForReceipt to time out

Fix:
Configure PXE to sync to the checkpointed chain tip instead of the
proposed one in both P2PNetworkTest and EpochsTestContext. This ensures
the PXE only sees blocks that have been successfully submitted to L1,
providing a stable view of the chain unaffected by pruned blocks.

This uses the newly added syncChainTip PXE config option.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@spalladino spalladino enabled auto-merge January 23, 2026 22:34
@spalladino spalladino added this pull request to the merge queue Jan 23, 2026
Merged via the queue into next with commit 338868f Jan 23, 2026
16 checks passed
@spalladino spalladino deleted the palla/fix-e2e-pxe-sync-checkpointed branch January 23, 2026 23:14
ludamad added a commit that referenced this pull request Feb 23, 2026
Slide 19 (§4 insights · PR correlation): two-column layout showing which
PRs caused each weekly flake spike and which fixes produced each recovery:

Spikes:
- W02 (2,647 flakes): Santiago refactors #19532/#19509/#19564 exposed
  timing races across p2p/epoch simultaneously
- W04 (935 flakes): PhilWindle #19982 added cross-chain mbps tests
  without pre-deflaking — valid_epoch_pruned_slash 0→346 events
- W06 (850 flakes): three high-risk PRs merged same day (#20047 peer
  scoring, #20241 max checkpoints→32, #20257 hash constants)

Fixes:
- W03 recovery: Santiago #19914 — checkpointed chain tip for PXE
  (root fix; PXE was using latest not checkpointed block)
- W05 recovery: Santiago #20088 slasher multi-block fix + #20140
  discv5 deflake + GCP step-down (−6 testbed namespaces)
- W07 improvement: Santiago #20351 mbps fix (p2p_client 311→0),
  #20462 remove hardcoded 10s timeout, ludamad #20613 CI parallelism

Also: correct three factual errors spotted during full review —
- Summary: next P50 is growing (+10% in 3 weeks), not stable
- Flake trend W07 note: e2e-p2p-epoch-flakes dropped 373×, not just
  "251 flakes lowest since December"
- Gaps slide: replaced stale "ci_phases broken" card with GCP egress
  costs gap (bc→awk fix is deployed; egress attribution is the gap now)

2 participants