| Topic | Expected/Slot | Decay Window | Notes |
|-------|---------------|--------------|-------|
| `tx` | Unpredictable | N/A | P3/P3b disabled |
| `block_proposal` | N-1 | 3 slots | N = blocks per slot (MBPS mode) |
| `checkpoint_proposal` | 1 | 5 slots | One per slot |
| `checkpoint_attestation` | C (~48) | 2 slots | C = committee size |
Could this expectation be too high? I'm just thinking that if a percentage of validators is non-responsive, we would penalize honest peers through no fault of their own.
These are just the numbers expressing the ideal scenario - more on penalization for under-delivery can be found here: https://github.com/AztecProtocol/aztec-packages/blob/feature/peer-scoring/yarn-project/p2p/src/services/gossipsub/README.md#how-p3-handles-under-delivery
And the main impact for under-delivering peers is that they will be pruned from the mesh.
And one more remark: to actually get penalized for under-delivery, the score for the topic needs to be below the threshold: https://github.com/AztecProtocol/aztec-packages/blob/feature/peer-scoring/yarn-project/p2p/src/services/gossipsub/README.md#threshold-calculation
Currently that is 30% of the expected score, calculated over 5 slots.
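For illustration (numbers made up): if a topic's delivery counter converges to ~20 at steady state over those 5 slots, the threshold is 0.3 × 20 = 6, so a peer only starts accruing under-delivery penalties once its decayed counter drops below 6.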
`yarn-project/p2p/src/config.ts` (outdated)
```ts
    'Whether to run in fisherman mode: validates all proposals and attestations but does not broadcast attestations or participate in consensus.',
  ...booleanConfigHelper(false),
},
blockDurationMs: {
```
I don't think this should be duplicated here. Can this env var mapping be moved from `SequencerClientConfig` to `SequencerConfig` (in stdlib) and then `Pick<>` into this config?
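A sketch of the suggested refactor (the shapes are hypothetical, not the actual types):

```ts
// SequencerConfig in stdlib owns the env-var-backed option once...
interface SequencerConfig {
  blockDurationMs: number;
  // ...other sequencer options
}

// ...and the P2P config picks the shared field instead of redefining it:
type P2PConfig = Pick<SequencerConfig, 'blockDurationMs'> & {
  fishermanMode: boolean;
  // ...other p2p options
};
```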
alexghr left a comment: Looks good to me. I'll let @PhilWindle do the final approval.
`yarn-project/archiver/src/factory.ts` (outdated)
```diff
@@ -100,6 +100,7 @@ export async function createArchiver(
   slotDuration,
   ethereumSlotDuration,
   proofSubmissionEpochs: Number(proofSubmissionEpochs),
+  targetCommitteeSize: config.aztecTargetCommitteeSize,
```
I think this should be read from the rollup. See the code block below:
```ts
const [l1StartBlock, l1GenesisTime, proofSubmissionEpochs, genesisArchiveRoot, slashingProposerAddress] =
  await Promise.all([
    rollup.getL1StartBlock(),
    rollup.getL1GenesisTime(),
    rollup.getProofSubmissionEpochs(),
    rollup.getGenesisArchiveTreeRoot(),
    rollup.getSlashingProposerAddress(),
  ] as const);
```
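i.e. something along these lines (the getter name is a guess for illustration, not a confirmed rollup method):

```ts
const [l1StartBlock, l1GenesisTime, proofSubmissionEpochs, genesisArchiveRoot, slashingProposerAddress, targetCommitteeSize] =
  await Promise.all([
    rollup.getL1StartBlock(),
    rollup.getL1GenesisTime(),
    rollup.getProofSubmissionEpochs(),
    rollup.getGenesisArchiveTreeRoot(),
    rollup.getSlashingProposerAddress(),
    rollup.getTargetCommitteeSize(), // hypothetical getter name
  ] as const);
```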
good catch! fixed
# Gossipsub Peer Scoring

## Summary

This PR implements comprehensive gossipsub peer scoring improvements for the Aztec P2P network:

- Balanced P1/P2/P3 configuration following Lodestar's approach (P3 > P1+P2 for mesh pruning)
- Dynamic per-topic scoring parameters based on expected message rates
- Tightened gossipsub score thresholds aligned with application-level scoring
- Documented application score weight for gossipsub integration
- Reviewed and documented application-level penalties
- Network outage analysis showing non-contributing peers are pruned but not disconnected

## Motivation

Previously, all gossipsub topics used identical hardcoded scoring parameters. This doesn't account for the vastly different message frequencies across topics:

- **Transactions**: Unpredictable rate
- **Block proposals**: N-1 per slot (where N = blocks per slot in MBPS mode)
- **Checkpoint proposals**: 1 per slot
- **Checkpoint attestations**: ~48 per slot (committee size)

Additionally, the gossipsub thresholds were borrowed from Lighthouse (Ethereum beacon chain) and were too lax for our scoring system. A banned peer (app score -100) only contributed -1000 to gossipsub, far above the -4000 gossipThreshold, so **banned peers still received gossip**.

## Changes

### New Shared Module: `@aztec/stdlib/timetable`

Created a shared timetable constants module that both `p2p` and `sequencer-client` import from:

- `CHECKPOINT_INITIALIZATION_TIME` (1s)
- `CHECKPOINT_ASSEMBLE_TIME` (1s)
- `DEFAULT_P2P_PROPAGATION_TIME` (2s)
- `DEFAULT_L1_PUBLISHING_TIME` (12s)
- `MIN_EXECUTION_TIME` (2s)
- `calculateMaxBlocksPerSlot()` - shared calculation for blocks per slot (see the sketch below)
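A minimal sketch of the shared helper, assuming the constants are exported in milliseconds and that the finalization reserve is assemble + propagation + publishing time (that composition is an assumption for illustration; the real module may differ):

```ts
// Hypothetical sketch of @aztec/stdlib/timetable's shared calculation.
// Mirrors the "Blocks Per Slot" formula in Technical Details below.
const CHECKPOINT_INITIALIZATION_TIME = 1_000; // ms
const CHECKPOINT_ASSEMBLE_TIME = 1_000; // ms
const DEFAULT_P2P_PROPAGATION_TIME = 2_000; // ms
const DEFAULT_L1_PUBLISHING_TIME = 12_000; // ms

export function calculateMaxBlocksPerSlot(slotDurationMs: number, blockDurationMs: number): number {
  // Time reserved at the end of the slot to assemble, propagate and publish
  // the checkpoint (composition assumed for this sketch).
  const finalizationTime =
    CHECKPOINT_ASSEMBLE_TIME + DEFAULT_P2P_PROPAGATION_TIME + DEFAULT_L1_PUBLISHING_TIME;
  const timeAvailable =
    slotDurationMs - CHECKPOINT_INITIALIZATION_TIME - blockDurationMs - finalizationTime;
  return Math.floor(timeAvailable / blockDurationMs);
}

// Illustrative numbers only: a 36s slot with 4s blocks leaves
// 36 - 1 - 4 - 15 = 16s of room, i.e. 4 more blocks:
// calculateMaxBlocksPerSlot(36_000, 4_000) === 4
```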
### Added `targetCommitteeSize` to `L1RollupConstants`

The committee size is needed to calculate expected attestation rates. Added to:

- `L1RollupConstants` type and schema
- `EpochCache.create()` to fetch from rollup contract
- `EpochCacheInterface.getL1Constants()` method

### New Topic Scoring Module: `@aztec/p2p/services/gossipsub/topic_score_params.ts`

Implements dynamic scoring parameter calculation with balanced P1/P2/P3 configuration following Lodestar's approach:

| Parameter | Max Score | Configuration |
|-----------|-----------|---------------|
| **P1: timeInMesh** | +8 per topic | Slot-based, caps at 1 hour |
| **P2: firstMessageDeliveries** | +25 per topic | Convergence-based, fast decay |
| **P3: meshMessageDeliveries** | -34 per topic | Must exceed P1+P2 for pruning |
| **P3b: meshFailurePenalty** | -34 per topic | Sticky penalty after pruning |
| **P4: invalidMessageDeliveries** | -20 per message | Attack detection |

| Topic | Expected/Slot | Decay Window | P1/P2/P3 |
|-------|--------------|--------------|----------|
| `tx` | Unpredictable | N/A | **Disabled** (only P4) |
| `block_proposal` | N-1 (MBPS) | 3 slots | Enabled |
| `checkpoint_proposal` | 1 | 5 slots | Enabled |
| `checkpoint_attestation` | ~48 | 2 slots | Enabled |

Key features:

- **Score balance for mesh pruning**: P3 max (-34) exceeds P1+P2 max (+33), ensuring non-contributors get pruned
- **No free positive scores**: tx topic has P1/P2 disabled to prevent offsetting penalties from other topics
- **P3b total**: -102 across 3 topics (well above -500 gossipThreshold, so network issues don't cause disconnection)
- **Multi-slot decay windows**: Low-frequency topics decay over more slots to accumulate meaningful counter values
- **Conservative thresholds**: Set at 30% of convergence to avoid penalizing honest peers
- **5-second delivery window**: Balanced for the TypeScript runtime (between Go implementations at 2s and Lodestar at 12s); accounts for JavaScript I/O latency while limiting replay attacks
- **5× activation multiplier**: Extra grace period during network bootstrap (activation timer starts at mesh join, not first message)

### Tightened Gossipsub Thresholds

Updated `scoring.ts` with thresholds aligned to application-level scoring:

| Threshold | Old Value | New Value | Alignment |
|-----------|-----------|-----------|-----------|
| gossipThreshold | -4000 | -500 | Matches Disconnect state (-50 × 10) |
| publishThreshold | -8000 | -1000 | Matches Ban state (-100 × 10) |
| graylistThreshold | -16000 | -2000 | For severe attacks (ban + topic penalties) |

The 1:2:4 ratio follows Lodestar's approach and gossipsub spec recommendations.

### Application Score Weight

Verified `appSpecificWeight = 10` creates perfect alignment:

- Disconnect (-50) × 10 = -500 = gossipThreshold
- Ban (-100) × 10 = -1000 = publishThreshold

Added documentation in `libp2p_service.ts` explaining this alignment.

### Application Penalties

The existing penalties are well-designed and unchanged:

| Severity | Points | Errors to Disconnect | Errors to Ban |
|----------|--------|----------------------|---------------|
| HighToleranceError | 2 | 25 | 50 |
| MidToleranceError | 10 | 5 | 10 |
| LowToleranceError | 50 | 1 | 2 |

Added documentation in `peer_scoring.ts` explaining the alignment with gossipsub thresholds.
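As a sanity check, the alignment above reduces to simple arithmetic. A runnable sketch (constant names are illustrative; the real values live in `peer_scoring.ts` and `scoring.ts`):

```ts
// Illustrative check of the app-score / gossipsub-threshold alignment.
const appSpecificWeight = 10;
const disconnectScore = -50; // app-level Disconnect state
const banScore = -100; // app-level Ban state
const gossipThreshold = -500;
const publishThreshold = -1000;

console.log(disconnectScore * appSpecificWeight === gossipThreshold); // true
console.log(banScore * appSpecificWeight === publishThreshold); // true

// Errors of each severity needed to reach Disconnect (-50 app score):
const severities = { HighToleranceError: 2, MidToleranceError: 10, LowToleranceError: 50 };
for (const [name, points] of Object.entries(severities)) {
  console.log(`${name}: ${Math.ceil(-disconnectScore / points)} errors`); // 25, 5, 1
}
```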
## How the Systems Work Together

### Score Flow

```
Total Gossipsub Score = TopicScore + (AppScore × 10) + IPColocationPenalty
```

### Peer State Alignment

| App Score State | App Score | Gossipsub Contribution | Effect |
|-----------------|-----------|------------------------|--------|
| Healthy | 0 to -49 | 0 to -490 | Full participation |
| Disconnect | -50 | -500 | Stops receiving gossip |
| Ban | -100 | -1000 | Cannot publish |
| Attack | -100 + P4 | -2000+ | Graylisted |

### Topic Score Contribution

Topic scores are balanced for mesh pruning while allowing recovery from network issues:

| Parameter | Per Topic | Total (3 topics) | Notes |
|-----------|-----------|------------------|-------|
| P1 (timeInMesh) | +8 max | +24 | Caps at 1 hour, resets on mesh leave |
| P2 (firstMessageDeliveries) | +25 max | +75 | Fast decay, negligible after mesh leave |
| P3 (under-delivery) | -34 max | -102 | Must exceed P1+P2 (+33) for pruning |
| P4 (invalid messages) | -20 each | Unlimited | Can spike to -2000+ during attacks |

**Key insight**: P3 max (-34) > P1+P2 max (+33), so non-contributors are always pruned regardless of how long they've been in mesh.

**After pruning**: P3b = -102 total, which is well above gossipThreshold (-500), so network issues don't cause disconnection.

### Example Scenarios

1. **Honest peer**: Score ~0, full participation
2. **Validation failures**: Gets LowToleranceError → app score -50 → stops receiving gossip
3. **Banned peer**: App score -100 → cannot publish messages
4. **Active attack**: Banned + 10 invalid messages → -3000+ → graylisted

## Technical Details

### Decay Calculation

Counters decay to ~1% over the decay window:

```
heartbeatsPerSlot = slotDurationMs / heartbeatIntervalMs
heartbeatsInWindow = heartbeatsPerSlot * decayWindowSlots
decay = 0.01^(1 / heartbeatsInWindow)
```

### Convergence and Threshold

Steady-state counter value and conservative threshold:

```
messagesPerHeartbeat = expectedPerSlot * (heartbeatMs / slotDurationMs)
convergence = messagesPerHeartbeat / (1 - decay)
threshold = convergence * 0.3 // 30% conservative factor
```

### Blocks Per Slot

Calculated from timetable constants (same formula used by the sequencer):

```
timeAvailable = slotDuration - initOffset - blockDuration - finalizationTime
blocksPerSlot = floor(timeAvailable / blockDuration)
```
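Putting the decay and convergence formulas together as runnable TypeScript (function names mirror those in the Testing section below, but the exact signatures in `topic_score_params.ts` may differ):

```ts
// Sketch of the decay/convergence/threshold math above.
function computeDecay(slotDurationMs: number, heartbeatIntervalMs: number, decayWindowSlots: number): number {
  const heartbeatsInWindow = (slotDurationMs / heartbeatIntervalMs) * decayWindowSlots;
  return Math.pow(0.01, 1 / heartbeatsInWindow); // counter falls to ~1% over the window
}

function computeConvergence(
  expectedPerSlot: number,
  slotDurationMs: number,
  heartbeatIntervalMs: number,
  decay: number,
): number {
  const messagesPerHeartbeat = expectedPerSlot * (heartbeatIntervalMs / slotDurationMs);
  return messagesPerHeartbeat / (1 - decay); // geometric-series steady state
}

function computeThreshold(convergence: number): number {
  return convergence * 0.3; // 30% conservative factor
}

// Example with hypothetical numbers: checkpoint_attestation at ~48 msgs per
// 36s slot, 700ms heartbeat, 2-slot decay window.
const decay = computeDecay(36_000, 700, 2);
const convergence = computeConvergence(48, 36_000, 700, decay);
console.log(computeThreshold(convergence)); // ≈6.4
```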
## Files Changed

### New Files

- `yarn-project/stdlib/src/timetable/index.ts` - Shared timetable constants
- `yarn-project/stdlib/src/config/sequencer-config.ts` - Shared sequencer config mappings (e.g., `blockDurationMs`)
- `yarn-project/p2p/src/services/gossipsub/topic_score_params.ts` - Topic scoring logic
- `yarn-project/p2p/src/services/gossipsub/topic_score_params.test.ts` - Unit tests for scoring params
- `yarn-project/p2p/src/services/gossipsub/index.ts` - Module exports
- `yarn-project/p2p/src/services/gossipsub/README.md` - Documentation

### Modified Files

- `yarn-project/stdlib/src/epoch-helpers/index.ts` - Added `targetCommitteeSize`
- `yarn-project/stdlib/package.json` - Added timetable export
- `yarn-project/epoch-cache/src/epoch_cache.ts` - Fetch committee size, add `getL1Constants()`
- `yarn-project/p2p/src/config.ts` - Added `blockDurationMs` to P2P config via `Pick<SequencerConfig, 'blockDurationMs'>` (uses shared mapping from `@aztec/stdlib/config`)
- `yarn-project/p2p/src/services/libp2p/libp2p_service.ts` - Use dynamic topic params, pass `blockDurationMs` from config, added appSpecificWeight documentation
- `yarn-project/p2p/src/services/gossipsub/scoring.ts` - Updated thresholds with documentation
- `yarn-project/p2p/src/services/peer-manager/peer_scoring.ts` - Added alignment documentation
- `yarn-project/sequencer-client/src/config.ts` - Import timetable constants and shared sequencer config mappings from stdlib
- `yarn-project/sequencer-client/src/sequencer/timetable.ts` - Import from stdlib
- `yarn-project/archiver/src/factory.ts` - Include `targetCommitteeSize`
- Test files updated with `targetCommitteeSize` and `getL1Constants` mocks

## Testing

- All existing tests pass
- Comprehensive unit tests for `topic_score_params.ts` (46 tests) verify:
  - `calculateBlocksPerSlot` - single block mode and MBPS mode
  - `getDecayWindowSlots` - frequency-based decay window selection
  - `computeDecay` - mathematical correctness (decays to ~1% over window)
  - `computeConvergence` - geometric series formula
  - `computeThreshold` - conservative threshold calculation
  - `getExpectedMessagesPerSlot` - per-topic expected rates
  - `TopicScoreParamsFactory` - shared value computation, per-topic params
  - Mathematical properties - decay, convergence, penalty calculations
  - Realistic network scenarios - checkpoint_proposal and checkpoint_attestation configs
  - **P1/P2/P3 score balance** - verifies max scores, non-contributor pruning, P3b limits

## Documentation

Added comprehensive README at `yarn-project/p2p/src/services/gossipsub/README.md` covering:

- Gossipsub scoring overview
- P1-P4 parameters explained with Lodestar-style normalization
- P1 slot-based configuration (caps at 1 hour)
- P2 convergence-based configuration (fast decay)
- P3 weight formula ensuring max penalty = -34 per topic
- Score balance: P3 (-34) > P1+P2 (+33) for mesh pruning
- Decay mechanics and multi-slot windows
- Threshold calculations
- Per-topic configuration rationale (tx topic has P1/P2/P3 disabled)
- Tuning guidelines
- Global score thresholds and their alignment with application scoring
- Non-contributing peers analysis (why they're not disconnected, mesh pruning behavior)
- **Network outage analysis** (what happens during connectivity loss, recovery timeline)
- Application-level penalties (what triggers each severity level)
- Score calculation examples (6 detailed scenarios from honest peer to attack recovery)

Fixes A-265
Flakey Tests 🤖 says: This CI run detected 2 tests that failed, but were tolerated due to a `.test_patterns.yml` entry.