
chore: port P2P mesh topic deflake fix to v4-next#21825

Merged
ludamad merged 1 commit into backport-to-v4-next-staging from claudebox/fix-flaky-duplicate-attestation-slash
Mar 20, 2026

Conversation

@AztecBot (Collaborator) commented Mar 19, 2026

Summary

Ports the P2P mesh connectivity fix from next to v4-next for the duplicate_attestation_slash and duplicate_proposal_slash e2e tests.

Cherry-picked commit: 8680abcca7 — chore: deflake duplicate proposals and attestations (#20990)

Root Cause

waitForP2PMeshConnectivity only waited for the tx GossipSub topic mesh to form. The slash tests also need block_proposal and checkpoint_proposal meshes ready before sequencers start proposing, otherwise proposals get dropped and offenses are never detected.

Fix

  • Added a topics parameter to waitForP2PMeshConnectivity (defaults to [TopicType.tx] for backward compatibility)
  • Slash tests now wait for all 3 relevant topics before proceeding
  • Also added advanceToEpochBeforeProposer helper so sequencers start before the target epoch arrives
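
As a rough illustration of the topics parameter, here is a minimal polling sketch. It is not the actual waitForP2PMeshConnectivity implementation: the `MeshView` interface, `getMeshPeerCount`, `topicsNotReady`, `waitForMeshOnTopics`, and the timeout values are all hypothetical names and numbers chosen for this example, and the `TopicType` enum is a stand-in for the real one.

```typescript
// Hypothetical stand-in for the real TopicType enum, which may differ.
enum TopicType {
  tx = 'tx',
  block_proposal = 'block_proposal',
  checkpoint_proposal = 'checkpoint_proposal',
}

// Assumed minimal view of a node: how many mesh peers it has for a topic.
interface MeshView {
  getMeshPeerCount(topic: TopicType): number;
}

// Returns the topics for which at least one node has fewer than minPeers mesh peers.
function topicsNotReady(nodes: MeshView[], topics: TopicType[], minPeers: number): TopicType[] {
  return topics.filter(topic => nodes.some(node => node.getMeshPeerCount(topic) < minPeers));
}

// Polls until every requested topic has a formed mesh on every node,
// or throws once the timeout expires.
async function waitForMeshOnTopics(
  nodes: MeshView[],
  topics: TopicType[] = [TopicType.tx], // default keeps the old tx-only behaviour
  minPeers = 1,
  timeoutMs = 30000,
  intervalMs = 200,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const pending = topicsNotReady(nodes, topics, minPeers);
    if (pending.length === 0) {
      return;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for mesh on topics: ${pending.join(', ')}`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

Under these assumptions, a slash test would pass all three topics explicitly, while existing callers that omit the argument keep waiting on tx only.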

@AztecBot added the ci-draft and claudebox labels Mar 19, 2026
@AztecBot changed the title from "chore: port duplicate attestation/proposal slash test deflakes to v4-next" to "chore: port P2P mesh topic deflake fix to v4-next" Mar 19, 2026
@AztecBot force-pushed the claudebox/fix-flaky-duplicate-attestation-slash branch from 2bd13d3 to 3be84bd March 19, 2026 23:12
@spalladino marked this pull request as ready for review March 20, 2026 01:34
@spalladino requested a review from nventuro as a code owner March 20, 2026 01:34
Fix flakiness in duplicate_proposal_slash and
duplicate_attestation_slash e2e tests.

Both tests flake at a 3-13% rate because the malicious proposer (1 of 4
validators) is never selected within the timeout window. Three root
causes are addressed:

1. **GossipSub mesh checked only for TX topic** — The tests need block
proposals and checkpoint proposals to propagate, but
`waitForP2PMeshConnectivity` only verified the `tx` topic mesh. Added a
`topics` parameter so callers can specify which topics to wait on. Both
slash tests now wait on `tx`, `block_proposal`, and
`checkpoint_proposal`.

2. **Proposer selection is probabilistic** — With 4 validators and 2
slots/epoch, the malicious proposer has ~25% chance per slot. Added
`awaitEpochWithProposer` helper that advances epochs (via L1 time warp)
until the target proposer is deterministically selected for at least one
slot in the current epoch.

3. **Race between node startup and first proposal** — Nodes started
sequencing immediately upon creation, potentially proposing before the
P2P mesh was ready. Now all nodes are created with `dontStartSequencer:
true`, and sequencers are started simultaneously only after mesh
formation, committee existence, and epoch advancement are confirmed.
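
To make the second root cause concrete, the sketch below works through the selection odds and shows one way to scan for an epoch in which the target proposes. It is an illustration only: `chanceSelectedInEpoch`, `findEpochWithProposer`, and the `ProposerLookup` type are hypothetical names, the probability assumes each slot picks uniformly and independently among validators (an assumption not stated in the source), and the real helper advances epochs via an L1 time warp rather than scanning a lookup function.

```typescript
// Probability that a target validator proposes at least once in an epoch,
// ASSUMING each slot picks uniformly and independently among the validators.
function chanceSelectedInEpoch(validators: number, slotsPerEpoch: number): number {
  return 1 - Math.pow(1 - 1 / validators, slotsPerEpoch);
}

// Hypothetical lookup: which validator proposes in a given slot.
type ProposerLookup = (slot: number) => string;

// Scans forward from startEpoch and returns the first epoch in which `target`
// proposes at least one slot, or undefined if none is found within maxEpochs.
function findEpochWithProposer(
  proposerAt: ProposerLookup,
  target: string,
  startEpoch: number,
  slotsPerEpoch: number,
  maxEpochs: number,
): number | undefined {
  for (let epoch = startEpoch; epoch < startEpoch + maxEpochs; epoch++) {
    const firstSlot = epoch * slotsPerEpoch;
    for (let slot = firstSlot; slot < firstSlot + slotsPerEpoch; slot++) {
      if (proposerAt(slot) === target) {
        return epoch;
      }
    }
  }
  return undefined;
}
```

With 4 validators and 2 slots per epoch, the independence assumption gives 1 - 0.75^2 = 0.4375, so the malicious proposer would be absent from more than half of all epochs. That is why waiting passively can exceed the timeout, and why advancing to a known-good epoch first removes the dependence on chance.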

Fixes A-593
Fixes A-595
@AztecBot force-pushed the claudebox/fix-flaky-duplicate-attestation-slash branch from 3be84bd to 3b502b3 March 20, 2026 02:13
@AztecBot (Collaborator, Author) commented

Flaky Tests

🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/bc2f82ea4a7624fb): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_l1_reorgs.parallel.test.ts "updates L1 to L2 messages changed due to an L1 reorg" (67s) (code: 0) group:e2e-p2p-epoch-flakes

@ludamad merged commit c9513dd into backport-to-v4-next-staging Mar 20, 2026
9 checks passed
@ludamad deleted the claudebox/fix-flaky-duplicate-attestation-slash branch March 20, 2026 02:56
AztecBot added a commit that referenced this pull request Mar 21, 2026
BEGIN_COMMIT_OVERRIDE
chore: backport #21754 (feat!: make isContractInitialized a tri-state
enum) to v4-next (#21792)
fix(stdlib): zero-pad bufferFromFields when declared length exceeds
payload (#21802)
test(protocol-contracts): verify max-size bytecode fits in contract
class log (#21818)
chore: port P2P mesh topic deflake fix to v4-next (#21825)
fix(archiver): throw on duplicate contract class or instance additions
(#21799)
feat: sync poseidon in the browser (#21833)
chore: backport #21824 (fix(aztec-up): add sensible defaults to
installer y/n prompts) to v4-next (#21844)
fix(sequencer): backport wall-clock time for slot estimation to v4-next
(#21769) (#21847)
chore: backport PR #21788 (feat(p2p): add tx validation for contract
class id verification) to v4-next (#21852)
feat: sync poseidon browser (#21851)
END_COMMIT_OVERRIDE