fix(e2e): use checkpointed chain tip for PXE in epochs and p2p tests by spalladino · Pull Request #19914 · AztecProtocol/aztec-packages

spalladino · 2026-01-23T22:18:06Z

Summary

Fixes flaky test failures in e2e_p2p and e2e_epochs tests caused by transactions being lost when blocks are pruned due to failed checkpoint proposals.

Problem

The CI test e2e_p2p_valid_epoch_pruned_slash (and potentially other epochs/p2p tests) was failing with a timeout error:

TimeoutError: Timeout awaiting isMined
  at retryUntil (../../foundation/dest/retry/index.js:90:19)
  at DeploySentTx.waitForReceipt (../../aztec.js/dest/contract/sent_tx.js:78:16)

Root Cause

The failure sequence was:

1. Block 2 was built at slot 16 with a deployment transaction.
2. The checkpoint proposal failed L1 validation with HeaderLib__InvalidSlotNumber(17,16): by the time the proposal was submitted, L1 had already moved to slot 17.
3. Block 2 was pruned when the archiver detected slot 16 wasn't checkpointed:

   WARN: archiver:l1-sync - Pruning blocks after block 1 due to slot 16 not being checkpointed

4. The deployment transaction was lost because it was in the pruned block.
5. PXE was syncing to the "proposed" chain tip (default behavior), so waitForReceipt kept waiting for a transaction that would never be mined.
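
The failure sequence above can be reproduced in a toy model. This is only a sketch under stated assumptions, not the archiver's actual code: the `Block` shape, the `pruneUncheckpointed` helper, the transaction hash, and the slot assigned to block 1 are all hypothetical. Only block 2, slot 16, and L1 slot 17 come from the description.

```typescript
// Toy model of the failure sequence. All names are hypothetical illustrations.
interface Block {
  number: number;
  slot: number;
  checkpointed: boolean;
  txHashes: string[];
}

// Once L1 has moved past a slot that was never checkpointed, drop that block
// and every block built on top of it.
function pruneUncheckpointed(
  blocks: Block[],
  currentL1Slot: number,
): { kept: Block[]; pruned: Block[] } {
  for (let i = 0; i < blocks.length; i++) {
    const block = blocks[i];
    if (!block.checkpointed && block.slot < currentL1Slot) {
      return { kept: blocks.slice(0, i), pruned: blocks.slice(i) };
    }
  }
  return { kept: blocks, pruned: [] };
}

const chain: Block[] = [
  // Slot 15 for block 1 is an assumed value; the PR does not state it.
  { number: 1, slot: 15, checkpointed: true, txHashes: [] },
  // Block 2 carries the deployment tx, but its checkpoint proposal was rejected.
  { number: 2, slot: 16, checkpointed: false, txHashes: ['0xdeploy'] },
];

// L1 is already at slot 17 when the archiver syncs.
const { kept, pruned } = pruneUncheckpointed(chain, 17);

// Collect the transactions that disappeared with the pruned blocks.
const lostTxs: string[] = [];
for (const block of pruned) {
  lostTxs.push(...block.txHashes);
}

console.log('kept blocks:', kept.map(b => b.number), 'lost txs:', lostTxs);
```

In this model the deployment transaction ends up in `lostTxs`, so a caller polling for its receipt has nothing left to find unless the transaction is included again in a later block.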

Solution

Configure PXE to sync to the checkpointed chain tip instead of the proposed one in both test contexts:

P2PNetworkTest.setup() - used by all e2e_p2p tests
EpochsTestContext.setup() - used by all e2e_epochs tests

This uses the newly added syncChainTip PXE config option:

{ syncChainTip: 'checkpointed' }
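
As a rough illustration of how such an override might be layered over a default, here is a minimal sketch. The `PxeTestConfig` type, `defaultPxeTestConfig`, and `buildPxeTestConfig` are hypothetical names invented for this example; only the `syncChainTip` option and the `'checkpointed'` value appear in this PR, and the literal `'proposed'` is assumed from the description of the default behavior.

```typescript
// Hypothetical config shape; the real PXE config has more fields than this.
type SyncChainTip = 'proposed' | 'checkpointed';

interface PxeTestConfig {
  syncChainTip: SyncChainTip;
}

// Assumed default, based on the PR describing "proposed" as the default behavior.
const defaultPxeTestConfig: PxeTestConfig = { syncChainTip: 'proposed' };

// Merge per-suite overrides on top of the defaults.
function buildPxeTestConfig(overrides: Partial<PxeTestConfig> = {}): PxeTestConfig {
  return { ...defaultPxeTestConfig, ...overrides };
}

// A suite that exercises pruning opts into the checkpointed tip.
const pruningSuiteConfig = buildPxeTestConfig({ syncChainTip: 'checkpointed' });

console.log(pruningSuiteConfig);
```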

Why This Fixes the Issue

With syncChainTip: 'checkpointed':

PXE only considers blocks that have been successfully submitted to L1 (checkpointed)
Blocks that are proposed but not yet checkpointed are not visible to PXE
When a checkpoint fails and blocks are pruned, PXE already wasn't tracking those blocks
Transactions sent through PXE will use anchor blocks from the checkpointed chain, which are stable and won't be pruned

This provides a more stable view of the chain for tests that involve block pruning scenarios.
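
The reasoning above can be checked with a small model of anchor block selection. This is a sketch under assumptions, not PXE's implementation: `TipMode`, `ChainBlock`, `selectAnchor`, and `anchorSurvives` are hypothetical names.

```typescript
// Hypothetical model of which block PXE would anchor to in each sync mode.
type TipMode = 'proposed' | 'checkpointed';

interface ChainBlock {
  number: number;
  checkpointed: boolean;
}

// Return the highest block visible in the given mode, or undefined if none is.
function selectAnchor(blocks: ChainBlock[], mode: TipMode): ChainBlock | undefined {
  const visible = mode === 'checkpointed' ? blocks.filter(b => b.checkpointed) : blocks;
  return visible[visible.length - 1];
}

const beforePrune: ChainBlock[] = [
  { number: 1, checkpointed: true },
  { number: 2, checkpointed: false }, // proposed, checkpoint later fails
];

// After the failed checkpoint, only checkpointed blocks remain.
const afterPrune: ChainBlock[] = beforePrune.filter(b => b.checkpointed);

// An anchor is stable if the block it points to still exists after the prune.
function anchorSurvives(anchor: ChainBlock | undefined): boolean {
  return anchor !== undefined && afterPrune.some(b => b.number === anchor.number);
}

const proposedAnchor = selectAnchor(beforePrune, 'proposed');
const checkpointedAnchor = selectAnchor(beforePrune, 'checkpointed');

console.log('proposed anchor survives:', anchorSurvives(proposedAnchor));
console.log('checkpointed anchor survives:', anchorSurvives(checkpointedAnchor));
```

In this model the `'proposed'` anchor points at block 2, which disappears in the prune, while the `'checkpointed'` anchor points at block 1 and stays valid.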

Changes

end-to-end/src/e2e_p2p/p2p_network.ts: Pass { syncChainTip: 'checkpointed' } to setup()
end-to-end/src/e2e_epochs/epochs_test.ts: Pass { syncChainTip: 'checkpointed' } to setup()

🤖 Generated with Claude Code

Fixes flaky test failures in e2e_p2p and e2e_epochs tests caused by transactions being lost when blocks are pruned due to failed checkpoint proposals.

Root cause:
- Block N is built for slot S and includes a transaction
- The checkpoint proposal fails L1 validation (e.g., HeaderLib__InvalidSlotNumber) because by the time it's submitted, L1 has already moved to slot S+1
- When L1 moves forward and slot S isn't checkpointed, the archiver prunes block N
- The transaction in block N is lost, causing waitForReceipt to time out

Fix: Configure PXE to sync to the checkpointed chain tip instead of the proposed one in both P2PNetworkTest and EpochsTestContext. This ensures the PXE only sees blocks that have been successfully submitted to L1, providing a stable view of the chain unaffected by pruned blocks.

This uses the newly added syncChainTip PXE config option.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Slide 19 (§4 insights · PR correlation): two-column layout showing which PRs caused each weekly flake spike and which fixes produced each recovery:

Spikes:
- W02 (2,647 flakes): Santiago refactors #19532/#19509/#19564 exposed timing races across p2p/epoch simultaneously
- W04 (935 flakes): PhilWindle #19982 added cross-chain mbps tests without pre-deflaking; valid_epoch_pruned_slash 0→346 events
- W06 (850 flakes): three high-risk PRs merged same day (#20047 peer scoring, #20241 max checkpoints→32, #20257 hash constants)

Fixes:
- W03 recovery: Santiago #19914, checkpointed chain tip for PXE (root fix; PXE was using latest not checkpointed block)
- W05 recovery: Santiago #20088 slasher multi-block fix + #20140 discv5 deflake + GCP step-down (−6 testbed namespaces)
- W07 improvement: Santiago #20351 mbps fix (p2p_client 311→0), #20462 remove hardcoded 10s timeout, ludamad #20613 CI parallelism

Also: correct three factual errors spotted during full review:
- Summary: next P50 is growing (+10% in 3 weeks), not stable
- Flake trend W07 note: e2e-p2p-epoch-flakes dropped 373×, not just "251 flakes lowest since December"
- Gaps slide: replaced stale "ci_phases broken" card with GCP egress costs gap (bc→awk fix is deployed; egress attribution is the gap now)

ludamad approved these changes Jan 23, 2026

spalladino enabled auto-merge January 23, 2026 22:34

spalladino added this pull request to the merge queue Jan 23, 2026

Merged via the queue into next with commit 338868f Jan 23, 2026
16 checks passed

spalladino deleted the palla/fix-e2e-pxe-sync-checkpointed branch January 23, 2026 23:14

spalladino merged 1 commit into next from palla/fix-e2e-pxe-sync-checkpointed
