chore: Backport of fixes into v2 by PhilWindle · Pull Request #17206 · AztecProtocol/aztec-packages

PhilWindle · 2025-09-22T18:04:58Z

This PR is a backport of the following into V2.

When deciding whether to slash committee members for an epoch, the epoch prune watcher tries reexecuting all blocks in the pruned epoch. It also uses that to decide whether to slash for data withholding, if not all data is available. However, the blocks being reexecuted (and the txs being gathered) were from ALL pruned epochs, which could be more than one if the proof submission window was long enough. So we ended up slashing committee members from the first epoch for data withholding offenses perpetrated by members in future epochs.
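A minimal sketch of the fix described above, assuming fixed-length epochs that start at slot 0. The `PrunedBlock` shape and both function names are hypothetical and are not taken from the watcher code.

```typescript
// Hypothetical shape of a pruned block; the real watcher types are not shown in this PR.
interface PrunedBlock {
  blockNumber: number;
  slot: number;
}

// Assumption: epochs have a fixed length in slots and epoch 0 starts at slot 0.
function epochOfSlot(slot: number, epochDuration: number): number {
  return Math.floor(slot / epochDuration);
}

// Keep only the blocks that belong to the epoch whose committee is being evaluated,
// so offenses in later pruned epochs are not attributed to this epoch's committee.
function blocksForEpoch(pruned: PrunedBlock[], epoch: number, epochDuration: number): PrunedBlock[] {
  return pruned.filter(block => epochOfSlot(block.slot, epochDuration) === epoch);
}
```

With a 32-slot epoch, blocks at slots 32 and 40 fall in epoch 1 and a block at slot 64 falls in epoch 2, so only the first two would be reexecuted when evaluating epoch 1.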

This PR rolls back uncommitted state upon a failed world state block sync.
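The rollback pattern can be illustrated with a small in-memory model. This is a sketch only: `PendingStore` and `syncBlock` are hypothetical names and do not reflect the world state API.

```typescript
// Minimal in-memory stand-in for a store with committed and uncommitted data.
// Class and method names are hypothetical, not the world state API.
class PendingStore {
  private committed = new Map<string, string>();
  private pending = new Map<string, string>();

  set(key: string, value: string): void {
    this.pending.set(key, value);
  }

  get(key: string): string | undefined {
    return this.pending.has(key) ? this.pending.get(key) : this.committed.get(key);
  }

  commit(): void {
    this.pending.forEach((value, key) => this.committed.set(key, value));
    this.pending.clear();
  }

  rollback(): void {
    this.pending.clear();
  }
}

// Applies a block's writes; on any failure, discards the uncommitted writes
// so the next sync attempt starts from the last committed state.
function syncBlock(store: PendingStore, writes: Array<[string, string]>, isValid: boolean): boolean {
  try {
    writes.forEach(([key, value]) => store.set(key, value));
    if (!isValid) {
      throw new Error("block sync failed");
    }
    store.commit();
    return true;
  } catch {
    store.rollback();
    return false;
  }
}
```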

@alexghr

Handles malformed proposals where the archive root is not properly derived from the other tree roots, aka _the @alexghr Friday attack_. Also checks that the block proposal is not for a block that already exists, since the proposer had complete freedom to set any past block number for the reexecution (though this would later fail in L1). Fixes https://linear.app/aztec-labs/issue/A-84/validate-full-block-proposal-before-attesting
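As a rough sketch of the two checks described above: the proposal shape, the result type, and `checkProposal` are hypothetical, and the recomputed archive root is assumed to be derived by the validator from the other tree roots (that derivation is not shown here).

```typescript
// Hypothetical proposal shape; only the fields relevant to the two checks are modelled.
interface BlockProposal {
  blockNumber: number;
  archiveRoot: string;
}

type ProposalCheck = { ok: true } | { ok: false; reason: string };

// recomputedArchiveRoot is assumed to be derived locally from the other tree roots.
function checkProposal(
  proposal: BlockProposal,
  latestBlockNumber: number,
  recomputedArchiveRoot: string,
): ProposalCheck {
  // Reject proposals for a block number that already exists.
  if (proposal.blockNumber <= latestBlockNumber) {
    return { ok: false, reason: "block already exists" };
  }
  // Reject proposals whose archive root does not match the locally derived one.
  if (proposal.archiveRoot !== recomputedArchiveRoot) {
    return { ok: false, reason: "archive root mismatch" };
  }
  return { ok: true };
}
```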

The sequencer publisher enforces a max size for a block, depending on the size it takes up in blobs. If the block exceeds that size, it's rejected by the publisher. https://github.com/AztecProtocol/aztec-packages/blob/bb87ea4a58a63771e61d551d105d8b52ba2014e6/yarn-project/stdlib/src/block/body.ts#L56-L70 This PR adds a check during block building to ensure that we don't go past that limit.
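One way to picture the block-building check, assuming each tx reports how many blob fields it occupies. `SizedTx`, `selectTxsWithinLimit`, and the `maxBlobFields` parameter are illustrative names, not the builder's actual API.

```typescript
// Hypothetical tx shape: blobFields is the space the tx takes up in blobs.
interface SizedTx {
  hash: string;
  blobFields: number;
}

// Adds txs in order, skipping any tx that would push the block past the limit.
function selectTxsWithinLimit(txs: SizedTx[], maxBlobFields: number): SizedTx[] {
  const selected: SizedTx[] = [];
  let used = 0;
  for (const tx of txs) {
    if (used + tx.blobFields > maxBlobFields) {
      continue; // this tx does not fit; a smaller one later still might
    }
    selected.push(tx);
    used += tx.blobFields;
  }
  return selected;
}
```

Skipping rather than stopping at the first tx that does not fit is a design choice of this sketch; the PR description does not say which behaviour the builder uses.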

Do not acknowledge an L1 to L2 message as synced until the rollup pending block number has caught up with the message block. The inbox block number may drift way ahead of the rollup block number in the event of a reorg or if there are too many L1 to L2 messages being inserted. Note that the existing approach used throughout the codebase of waiting for two blocks is flawed, since if there was an earlier reorg on the chain, then the inbox will have drifted and the message will require more blocks to become available. This PR does NOT remove the existing isL1ToL2MessageSynced call, since it's used all over the place, but rather flags it as deprecated. Instead, the node and pxe now expose a function that returns the block in which the message is to be available, and aztecjs provides a helper to wait until the block is reached. The bot factory is updated to use this new approach.
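The readiness condition can be sketched as a pure comparison between the block in which the message becomes available and the rollup pending block. Both function names below are hypothetical; they are not the functions exposed by the node, pxe, or aztecjs.

```typescript
// A message is usable once the rollup pending block has reached the message block.
function isMessageAvailable(messageBlock: number, rollupPendingBlock: number): boolean {
  return rollupPendingBlock >= messageBlock;
}

// Number of further blocks to wait; this can exceed two if the inbox has drifted ahead.
function blocksUntilAvailable(messageBlock: number, rollupPendingBlock: number): number {
  return Math.max(0, messageBlock - rollupPendingBlock);
}
```

For example, if the inbox has drifted so that a message lands in block 12 while the rollup is at block 7, five more blocks are needed, so a fixed two-block wait would return too early.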

Second attempt at #17207

The issue was that the `start` method on p2p client would overwrite all previously registered subprotocol handlers, hence the errors we were seeing on CI:

```
22:00:04 [22:00:04.697] WARN: p2p:4506:libp2p_service:4506:libp2p_service:4506:reqresp:4506 Unknown stream error while handling the stream, aborting {"protocol":"/aztec/req/auth/1.0.0"}
22:00:04     err: {
22:00:04       "type": "TypeError",
22:00:04       "message": "handler is not a function",
22:00:04       "stack":
22:00:04           TypeError: handler is not a function
22:00:04               at /home/aztec-dev/aztec-packages/yarn-project/p2p/dest/services/reqresp/reqresp.js:452:40
22:00:04               at processTicksAndRejections (node:internal/process/task_queues:105:5)
22:00:04               at duplex.sink (/home/aztec-dev/aztec-packages/yarn-project/node_modules/it-byte-stream/src/index.ts:86:22)
22:00:04     }
```

The test was also failing locally (it had passed before I submitted due to a build issue I missed), and since it was tagged as flake it was greenlighted by CI on the PR.
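The PR description does not show how the overwrite was fixed. One possible approach, sketched here with a hypothetical `SubProtocolRegistry` class, is for `start` to fill in defaults only for protocols that have no handler yet.

```typescript
type Handler = (payload: string) => string;

// Hypothetical registry; it only illustrates merging instead of overwriting on start.
class SubProtocolRegistry {
  private handlers = new Map<string, Handler>();

  register(protocol: string, handler: Handler): void {
    this.handlers.set(protocol, handler);
  }

  // Installs default handlers without replacing ones registered before start.
  start(defaults: Map<string, Handler>): void {
    defaults.forEach((handler, protocol) => {
      if (!this.handlers.has(protocol)) {
        this.handlers.set(protocol, handler);
      }
    });
  }

  handle(protocol: string, payload: string): string {
    const handler = this.handlers.get(protocol);
    if (!handler) {
      throw new Error(`no handler registered for ${protocol}`);
    }
    return handler(payload);
  }
}
```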

While we keep slashing rounds for a long time (lifetime in rounds is currently defined as 100), we only attempt to execute them during the first round they become executable. If for whatever reason they don't get executed, they are just forgotten and never actually triggered.

This PR adds a new config `slashExecuteRoundsLookBack` (defaults to 4) that sets how many execution rounds to look back from the latest executable round to see if any round is pending execution. Each round is checked in sequence, so setting this value too high can introduce performance issues. Setting this value to zero keeps the same behaviour as we have today.

This PR also fixes another issue: we were re-checking if a round was executable based on the isReadyToExecute flag returned from the contract. However, that flag was computed as of the time of the call, and not as of the time at which the tx would land. This meant that we always failed to execute the slash payload on the first slot of a round.

These two issues combined were triggering the flakes in the `inactivity-slash` tests. Since we had small rounds (each round was 4 slots), and the first slot was consistently missed, it was a matter of being unlucky enough such that the inactive validator was picked 3 slots in a row as a sequencer. Given we have 6 validators, this happened roughly once every 216 runs (6^3). See [this run](http://ci.aztec-labs.com/3bff0b862dd4156f) for an example.
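A sketch of which rounds a given look-back value covers. The function name and the oldest-first ordering are assumptions; only the `slashExecuteRoundsLookBack` config and its zero-means-current-behaviour semantics come from the description above.

```typescript
// Returns the rounds to check, from the oldest within the look-back window
// up to and including the latest executable round.
function roundsToCheck(latestExecutableRound: number, lookBack: number): number[] {
  const oldest = Math.max(0, latestExecutableRound - lookBack);
  const rounds: number[] = [];
  for (let round = oldest; round <= latestExecutableRound; round++) {
    rounds.push(round);
  }
  return rounds;
}
```

With a look-back of 0 only the latest executable round is checked, which matches the previous behaviour; with the default of 4, up to five rounds are checked on each pass.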

Adds a flag to always reexecute block proposals. If set, a validator node will always reexecute, even if not part of the committee, though they will not attest. If the node is not a validator, they will just log the result of the execution. Note that this does NOT affect p2p propagation, since the reexecution is done after the attestation is propagated, as it happens on a separate handler and not in a p2p-registered validator.

To handle reexecutions in a non-validator node, reexecution was moved to a block-proposal-handler class, which is instantiated instead of a validator client in non-validators.

This PR also causes validators to reexecute a proposal if they are not in the committee if there is a slash penalty defined for broadcasting invalid block proposals. Since this feature is not yet properly tested, I've disabled the default slash for these offenses for the time being (they were not working at the moment). See A-57 for more info.

Fixes A-54
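The rules above can be summarised as a small decision function. This is a sketch under assumptions: the field and function names are hypothetical, and it assumes that committee members both reexecute and attest, which the description implies but does not state.

```typescript
// Hypothetical inputs to the decision; names are not taken from the codebase.
interface NodeContext {
  isValidator: boolean;
  inCommittee: boolean;
  alwaysReexecute: boolean; // the new flag
  slashInvalidProposals: boolean; // a slash penalty is defined for invalid proposals
}

interface ProposalAction {
  reexecute: boolean;
  attest: boolean;
}

function decideProposalAction(ctx: NodeContext): ProposalAction {
  // Assumption: validators in the committee reexecute and attest as before.
  if (ctx.isValidator && ctx.inCommittee) {
    return { reexecute: true, attest: true };
  }
  // Outside the committee (or on a non-validator) nobody attests; reexecution happens
  // if the flag is set, or if a validator needs the result for slashing purposes.
  const reexecute = ctx.alwaysReexecute || (ctx.isValidator && ctx.slashInvalidProposals);
  return { reexecute, attest: false };
}
```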

As part of #17273 I had added a cleanup to the gossip network test to delete data dirs for the prover. However, the `stop` method on the prover failed to await all operations, so when the test finished successfully, it would still try to use the db (in particular, it seems to be for the proving broker database `getEpochDatabase`) and abort with a core dump. This reverts the folder cleanup.

(cherry picked from commit 339596a)

(cherry picked from commit 9a5cfa1)

(cherry picked from commit 2edc51a)

Fix A-43 (cherry picked from commit 61ad020)

# v2.0.3..v2.1.0-rc.1 Notes

## Significant L1 Changes

### 1. **Rollup Contract Interface Changes**

- **`propose()` function signature changed**: Now requires an additional `_attestationsAndSignersSignature` parameter
- **`validateHeaderWithAttestations()` function signature changed**: Also requires the new signature parameter
- This affects any code that directly calls these functions on the rollup contract

### 2. **New Required Configuration Parameters**

Several new configuration parameters are now required for deployment:

- `localEjectionThreshold`: Stricter ejection threshold local to specific rollup (default: 196,000 tokens)
- `slashingDisableDuration`: How long slashing can be disabled in seconds (default: 5 days)

### 3. **GSE Contract Changes**

- **New function**: `setProofOfPossessionGasLimit()` - allows governance to adjust gas limits for BLS proof validation
- **Gas-limited proof validation**: Proof of possession validation now has configurable gas limits (default: 200,000 gas)

### 4. **Validator Queue Management Changes**

- **`flushEntryQueue()` behavior changed**: Now has an overload accepting a `_toAdd` parameter to limit validator additions
- **New validator flush accounting**: System now tracks available validator flushes per epoch

## Significant Non-Breaking Changes

### 1. **Enhanced Slashing Controls**

- **Temporary slashing disable**: Vetoers can now temporarily disable slashing for the configured duration
- **New function**: `setSlashingEnabled(bool)` for controlling slashing state

### 2. **Improved Validator Selection**

- **Configurable lag period**: Validator sampling now uses configurable epoch lag instead of fixed 2-epoch delay
- **Better bootstrapping**: Enhanced validator set bootstrapping with improved flush size calculations

### 3. **Updated Default Values**

- **Coin issuer rate**: Updated to `25,000,000,000 tokens / year` (approximately 793 tokens per second)
- **Local ejection threshold**: Set to 196,000 tokens (stricter than global 50,000 threshold)

## Significant Node Changes

### Fixes

- Rollback world state on failed block sync – Prevents bad state persistence by rolling back uncommitted data if block sync fails. [(#17158)](github.com//pull/17158)
- Early rejection of duplicate nullifiers – Detects and rejects transactions with duplicate nullifiers before inclusion. [(#17157)](github.com//pull/17157)
- Watcher pruning fix – Watcher now re-executes only blocks from the relevant pruned epoch, avoiding cross-epoch slashing issues. [(#17145)](github.com//pull/17145)
- Improved proposal validation – Fully validates proposal headers (including archive root derivation) and blocks attempts to reuse existing block numbers. [(#17144)](github.com//pull/17144)
- L1 to L2 message sync reliability – Waits for rollup to reach the inbox block before marking L1→L2 messages as synced; adds helpers to track message readiness. [(#17132)](github.com//pull/17132)
- Slashing round recovery – Executes pending slashing rounds skipped during the first executable round; adds slashExecuteRoundsLookBack to control re-check depth. [(#17125)](github.com//pull/17125)
- Broker restart on rollup change – Ensures broker restarts when rollup chain changes to stay synchronized. [(#17120)](github.com//pull/17120)
- Remote signer readiness check – Verifies that a remote signer is available before use. [(#17119)](github.com//pull/17119)
- Orchestrator and agent retry improvements – Makes connections to the broker more robust under transient failures. [(#17117)](github.com//pull/17117)
- Telemetry cleanup – Fixes incorrect or spammy telemetry warnings. [(#17155)](github.com//pull/17155)

### Features

- Network configuration support – Introduces centralized configuration for network parameters. [(#17113)](github.com//pull/17113)

## Full Changelog

You can generate this yourself with `./scripts/commits v2.0.3..v2.1.0-rc.1 1000 -m -g`.

#### Fixes

- fix: use archiveAt(0) instead of getBlock to get genesis archive tree - backport v2 ([#17447](#17447)) – spypsy, 5 days ago
- fix: add keystoreDirectory option to sequencer ([#17265](#17265)) – spypsy, 13 days ago
- fix: testnet archival node - v2 ([#17142](#17142)) – Aztec Bot, 3 weeks ago

#### Chores

- chore: bump minor version – Mitch, 4 days ago – [dbc243f](dbc243f)
- chore: backport dependabot deps ([#17463](#17463)) – Aztec Bot, 5 days ago
- chore: Backport slack alerts ([#17460](#17460)) – PhilWindle, 5 days ago
- chore(backport-to-v2): chore: New salt for staging-ignition (#17453) ([#17453](#17453)) – Aztec Bot, 5 days ago
- chore(backport-to-v2): fix: improve libp2p connection limits for network discovery (#17425) ([#17425](#17425)) – Aztec Bot, 5 days ago
- chore(backport-to-v2): feat: add flushing rewarder (#17335) ([#17335](#17335)) – Aztec Bot, 6 days ago
- chore(backport-to-v2): feat: add date gated relayer (#17323) ([#17323](#17323)) – Aztec Bot, 6 days ago
- chore(backport-to-v2): feat: support using existing ERC20 token for fee and staking (#17413) ([#17413](#17413)) – Aztec Bot, 6 days ago
- chore: Delete contract addresses from chain l2 config ([#17430](#17430)) – PhilWindle, 6 days ago
- chore: More updated staging public config ([#17364](#17364)) – PhilWindle, 7 days ago
- chore(backport-to-V2): L1 backports ([#17365](#17365)) – Lasse Herskind, 7 days ago
- chore: Ensure DB map sizes are configured for networks ([#17383](#17383)) – PhilWindle, 7 days ago
- chore: Backport of fixes into v2 ([#17206](#17206)) – PhilWindle, 8 days ago
- chore: update zkpassport version ([#17339](#17339)) – saleel, 8 days ago
- chore: Backport of workflow fix ([#17333](#17333)) – PhilWindle, 11 days ago
- chore: Streamline staging deployments ([#17328](#17328)) – PhilWindle, 11 days ago
- chore(backport-to-v2): fix: avm gracefully handles shifts (shl) with huge bit sizes (#17171) ([#17171](#17171)) – Aztec Bot, 12 days ago
- chore(backport-to-v2): chore: remove unconstrained generics from trait impls (#17075) ([#17075](#17075)) – Aztec Bot, 12 days ago
- chore: Backport deployment refactor ([#17280](#17280)) – PhilWindle, 12 days ago
- chore(backport-to-v2): fix(docs): Update Counter contract tutorial imports and remove unnecessary sections (#17241) ([#17241](#17241)) – Aztec Bot, 13 days ago
- chore: remove ACCEPT_DISABLED_AVM_VK_TREE_ROOT ([#17238](#17238)) – Alex Gherghisan, 13 days ago
- chore: remove bad rollup-version default ([#17223](#17223)) – Alex Gherghisan, 2 weeks ago
- chore(docs): node docs to v2 ([#17205](#17205)) – esau, 2 weeks ago
- chore(backport-to-v2): chore(avm)!: Fix a misleading log in recursive verifier related to public input (#17184) ([#17184](#17184)) – Aztec Bot, 2 weeks ago
- chore: Backport of ignition fix attempt 2 ([#17201](#17201)) – PhilWindle, 2 weeks ago
- chore: turn on testnet compat test ([#17195](#17195)) – Alex Gherghisan, 2 weeks ago
- chore: Backport fix to staging-ignition to v2 ([#17159](#17159)) – PhilWindle, 3 weeks ago
- chore: kubectl ([#17140](#17140)) – Alex Gherghisan, 3 weeks ago

#### Other

- backport dependabots p2 ([#17488](#17488)) – mralj, 4 days ago

---------

Co-authored-by: AztecBot <tech@aztecprotocol.com>

alexghr and others added 4 commits September 22, 2025 17:56

fix: telemetry warnings

b5f2616

fix: Rejecting txs with duplicate nullifiers

d623839

fix: Rollback on failed blocks

5112f76

This PR rolls back uncommitted state upon a failed world state block sync.

PhilWindle requested review from dbanks12, fcarreiro and sirasistant as code owners September 22, 2025 18:04

fcarreiro removed their request for review September 23, 2025 08:34

spalladino approved these changes Sep 23, 2025


spalladino and others added 5 commits September 25, 2025 11:55

spalladino added the ci-no-squash and ci-no-fail-fast (sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) labels Sep 25, 2025

spalladino added 2 commits September 25, 2025 15:12

chore: Fix bad cherry-picks from backporting

7bf931e

spalladino force-pushed the pw/bp-friday-fixes branch from f30d434 to 7bf931e Compare September 25, 2025 18:13

fix: orchestrator and agents retry more when connecting to the broker

9ad621c

(cherry picked from commit 339596a)

alexghr mentioned this pull request Sep 26, 2025

chore(backport-to-v2): fix: orchestrator and agents retry more when connecting to the broker (#17285) #17302

Closed

feat: check remote-signer is available

96d38f9

(cherry picked from commit 9a5cfa1)

alexghr mentioned this pull request Sep 26, 2025

chore(backport-to-v2): feat: check remote-signer is available (#17225) #17296

Closed

fix: sequencer metrics

83d1380

(cherry picked from commit 2edc51a)

alexghr mentioned this pull request Sep 26, 2025

chore(backport-to-v2): fix: sequencer metrics (#17192) #17286

Closed

fix: broker restarts on rollup change

a5d996e

Fix A-43 (cherry picked from commit 61ad020)

alexghr mentioned this pull request Sep 26, 2025

chore(backport-to-v2): fix: broker restarts on rollup change (#17194) #17274

Closed

alexghr added 2 commits September 26, 2025 13:15

feat: optionally disable publishing L1 txs

30c33e9

feat: add network config

594bfdf

alexghr requested a review from charlielye as a code owner September 26, 2025 13:16

chore: lint

66441d9

alexghr enabled auto-merge (squash) September 29, 2025 15:58

alexghr approved these changes Sep 29, 2025


alexghr disabled auto-merge September 29, 2025 15:59

PhilWindle enabled auto-merge (squash) September 29, 2025 15:59

PhilWindle removed request for charlielye, dbanks12 and sirasistant September 29, 2025 16:00

PhilWindle disabled auto-merge September 29, 2025 16:00

PhilWindle merged commit 5f821fd into v2 Sep 29, 2025
6 checks passed

PhilWindle deleted the pw/bp-friday-fixes branch September 29, 2025 16:01

AztecBot mentioned this pull request Sep 29, 2025

chore(v2): release 2.1.1 #17145

Merged

AztecBot mentioned this pull request Nov 5, 2025

chore(v2): release 2.1.2 #18226

Merged

PhilWindle merged 19 commits into v2 from pw/bp-friday-fixes

PhilWindle commented Sep 22, 2025 (edited by spalladino)


3 participants
