Merged
When deciding whether to slash committee members for an epoch, the epoch prune watcher tries reexecuting all blocks in the pruned epoch. It also uses that to decide whether to slash for data withholding, if not all data is available. However, the blocks being reexecuted (and the txs being gathered) were from ALL pruned epochs, which could be more than one if the proof submission window was long enough. So we ended up slashing committee members from the first epoch for data withholding offenses perpetrated by members in future epochs.
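A minimal sketch of the idea behind the fix, assuming a hypothetical `PrunedBlock` shape and fixed-size epochs measured in slots (these names are illustrative and not taken from the codebase): before reexecuting, keep only the blocks whose slot falls in the epoch being evaluated.

```typescript
// Hypothetical block shape; the real watcher works with full L2 blocks.
interface PrunedBlock {
  number: number;
  slot: number;
}

// Assumes epochs are consecutive, fixed-size ranges of slots.
function epochOfSlot(slot: number, slotsPerEpoch: number): number {
  return Math.floor(slot / slotsPerEpoch);
}

// Keep only the pruned blocks that belong to the epoch under evaluation.
function blocksInEpoch(pruned: PrunedBlock[], epoch: number, slotsPerEpoch: number): PrunedBlock[] {
  return pruned.filter(block => epochOfSlot(block.slot, slotsPerEpoch) === epoch);
}

// Two epochs were pruned together; only epoch 0 should be judged here.
const pruned: PrunedBlock[] = [
  { number: 1, slot: 1 },
  { number: 2, slot: 3 },
  { number: 3, slot: 5 },
];
const epochZeroBlocks = blocksInEpoch(pruned, 0, 4);
console.log(epochZeroBlocks.map(block => block.number)); // [ 1, 2 ]
```

With this filter in place, block 3 (epoch 1) no longer influences the data withholding decision for the epoch 0 committee.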
This PR rolls back uncommitted state upon a failed world state block sync.
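As a rough illustration of the pattern, here is a sketch using a hypothetical in-memory `SketchState` class as a stand-in for world state (the real world state API is not shown here): writes are staged, and any failure during sync discards them instead of leaving them for the next attempt.

```typescript
// Hypothetical in-memory stand-in for world state with uncommitted writes.
class SketchState {
  private committed = new Map<string, string>();
  private uncommitted = new Map<string, string>();

  set(key: string, value: string): void {
    this.uncommitted.set(key, value);
  }

  get(key: string): string | undefined {
    return this.uncommitted.has(key) ? this.uncommitted.get(key) : this.committed.get(key);
  }

  commit(): void {
    this.uncommitted.forEach((value, key) => this.committed.set(key, value));
    this.uncommitted.clear();
  }

  rollback(): void {
    this.uncommitted.clear();
  }
}

// Applies a block's writes. `isValid` stands in for whatever check can fail mid-sync.
// On failure the staged writes are discarded so they cannot leak into the next sync.
function syncBlock(state: SketchState, writes: [string, string][], isValid: boolean): boolean {
  try {
    for (const [key, value] of writes) {
      state.set(key, value);
    }
    if (!isValid) {
      throw new Error('block failed validation after applying writes');
    }
    state.commit();
    return true;
  } catch {
    state.rollback();
    return false;
  }
}

const state = new SketchState();
syncBlock(state, [['a', '1']], true);
const secondSyncOk = syncBlock(state, [['a', '2'], ['b', '3']], false);
```

After the failed second sync, `state.get('a')` still returns the value from the first block and `b` was never persisted.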
Handles malformed proposals where the archive root is not properly derived from the other tree roots, aka _the @alexghr Friday attack_. Also checks that the block proposal is not for a block that already exists, since the proposer had complete freedom to set any past block number for the reexecution (though this would later fail in L1). Fixes https://linear.app/aztec-labs/issue/A-84/validate-full-block-proposal-before-attesting
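The two checks can be sketched as follows. The `ProposalSketch` shape, the `validateProposal` function, and the `deriveArchiveRoot` callback are all hypothetical names for illustration, and the toy derivation below is not the protocol's actual hashing scheme.

```typescript
// Hypothetical proposal shape; field names are illustrative only.
interface ProposalSketch {
  blockNumber: number;
  archiveRoot: string;
  treeRoots: string[];
}

type ValidationResult = { valid: true } | { valid: false; reason: string };

function validateProposal(
  proposal: ProposalSketch,
  latestBlockNumber: number,
  deriveArchiveRoot: (treeRoots: string[]) => string,
): ValidationResult {
  // Reject proposals for a block number that already exists.
  if (proposal.blockNumber <= latestBlockNumber) {
    return { valid: false, reason: 'block already exists' };
  }
  // Reject proposals whose archive root does not follow from the other tree roots.
  if (deriveArchiveRoot(proposal.treeRoots) !== proposal.archiveRoot) {
    return { valid: false, reason: 'archive root mismatch' };
  }
  return { valid: true };
}

// Toy derivation used only for this example.
const toyDerive = (roots: string[]): string => roots.join('|');

const goodProposal: ProposalSketch = { blockNumber: 11, archiveRoot: 'x|y', treeRoots: ['x', 'y'] };
const staleProposal: ProposalSketch = { blockNumber: 7, archiveRoot: 'x|y', treeRoots: ['x', 'y'] };
const forgedProposal: ProposalSketch = { blockNumber: 11, archiveRoot: 'forged', treeRoots: ['x', 'y'] };
```

Both checks run before any reexecution, so a malformed proposal is rejected cheaply.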
spalladino
approved these changes
Sep 23, 2025
The sequencer publisher enforces a max size for a block, depending on the size it takes up in blobs. If the block exceeds that size, it's rejected by the publisher. https://github.com/AztecProtocol/aztec-packages/blob/bb87ea4a58a63771e61d551d105d8b52ba2014e6/yarn-project/stdlib/src/block/body.ts#L56-L70 This PR adds a check during block building to ensure that we don't go past that limit.
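A sketch of what such a check could look like during block building, assuming each tx reports how much blob space it needs. `TxSketch`, `sizeInFields`, and `selectTxsWithinLimit` are hypothetical names, and skipping an oversized tx rather than stopping the block is a choice made for this example only.

```typescript
// Hypothetical tx shape: sizeInFields is how much blob space the tx's effects need.
interface TxSketch {
  id: string;
  sizeInFields: number;
}

// Adds txs in order, skipping any tx that would push the block past the limit.
function selectTxsWithinLimit(
  txs: TxSketch[],
  maxFields: number,
): { included: TxSketch[]; skipped: TxSketch[] } {
  const included: TxSketch[] = [];
  const skipped: TxSketch[] = [];
  let usedFields = 0;
  for (const tx of txs) {
    if (usedFields + tx.sizeInFields > maxFields) {
      skipped.push(tx);
      continue;
    }
    included.push(tx);
    usedFields += tx.sizeInFields;
  }
  return { included, skipped };
}

const selection = selectTxsWithinLimit(
  [
    { id: 'a', sizeInFields: 3 },
    { id: 'b', sizeInFields: 4 },
    { id: 'c', sizeInFields: 2 },
  ],
  6,
);
console.log(selection.included.map(tx => tx.id)); // [ 'a', 'c' ]
```

Enforcing the limit at build time means the publisher never sees a block it would have to reject.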
Do not acknowledge an L1 to L2 message as synced until the rollup pending block number has caught up with the message block. The inbox block number may drift way ahead of the rollup block number in the event of a reorg or if there are too many L1 to L2 messages being inserted. Note that the existing approach used throughout the codebase of waiting for two blocks is flawed: if there was an earlier reorg on the chain, the inbox will have drifted and the message will require more blocks to become available. This PR does NOT remove the existing isL1ToL2MessageSynced call, since it's used all over the place, but rather flags it as deprecated. Instead, the node and pxe now expose a function that returns the block in which the message is to be available, and aztecjs provides a helper to wait until the block is reached. The bot factory is updated to use this new approach.
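The new approach can be sketched as below. `isMessageReady` and `waitForMessageBlock` are hypothetical names written for this example and are not the helpers exposed by the node, pxe, or aztecjs; the block number source is passed in as a callback so the sketch stays self-contained.

```typescript
// A message is usable once the rollup's pending block has reached the message's block.
function isMessageReady(messageBlock: number, pendingBlock: number): boolean {
  return pendingBlock >= messageBlock;
}

// Polls a caller-supplied block number source until the target block is reached
// or the attempts run out. Returns whether the message became available.
async function waitForMessageBlock(
  getPendingBlock: () => Promise<number>,
  messageBlock: number,
  maxAttempts: number,
  intervalMs: number,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (isMessageReady(messageBlock, await getPendingBlock())) {
      return true;
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

Unlike waiting a fixed number of blocks, this keeps waiting for as long as the inbox is ahead of the rollup, however far it has drifted.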
Second attempt at #17207.

The issue was that the `start` method on the p2p client would overwrite all previously registered subprotocol handlers, hence the errors we were seeing on CI:

```
22:00:04 [22:00:04.697] WARN: p2p:4506:libp2p_service:4506:libp2p_service:4506:reqresp:4506 Unknown stream error while handling the stream, aborting {"protocol":"/aztec/req/auth/1.0.0"}
22:00:04 err: {
22:00:04   "type": "TypeError",
22:00:04   "message": "handler is not a function",
22:00:04   "stack":
22:00:04     TypeError: handler is not a function
22:00:04       at /home/aztec-dev/aztec-packages/yarn-project/p2p/dest/services/reqresp/reqresp.js:452:40
22:00:04       at processTicksAndRejections (node:internal/process/task_queues:105:5)
22:00:04       at duplex.sink (/home/aztec-dev/aztec-packages/yarn-project/node_modules/it-byte-stream/src/index.ts:86:22)
22:00:04 }
```

The test was also failing locally (it had passed before I submitted due to a build issue I missed), and since it was tagged as a flake it was greenlighted by CI on the PR.
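One way to avoid this class of bug is to have `start` merge default handlers into the registry instead of replacing it. The `ReqRespSketch` class below is a hypothetical illustration of that pattern, not the actual reqresp service.

```typescript
type Handler = (payload: string) => string;

// Hypothetical registry; only the merge-on-start behaviour matters here.
class ReqRespSketch {
  private handlers = new Map<string, Handler>();

  addHandler(protocol: string, handler: Handler): void {
    this.handlers.set(protocol, handler);
  }

  // Fills in defaults without replacing handlers registered before start.
  start(defaults: Map<string, Handler>): void {
    defaults.forEach((handler, protocol) => {
      if (!this.handlers.has(protocol)) {
        this.handlers.set(protocol, handler);
      }
    });
  }

  handle(protocol: string, payload: string): string {
    const handler = this.handlers.get(protocol);
    if (!handler) {
      throw new Error(`no handler registered for ${protocol}`);
    }
    return handler(payload);
  }
}

const service = new ReqRespSketch();
service.addHandler('/aztec/req/auth/1.0.0', payload => `custom:${payload}`);
service.start(
  new Map<string, Handler>([
    ['/aztec/req/auth/1.0.0', payload => `default:${payload}`],
    ['/aztec/req/ping/1.0.0', payload => `default:${payload}`],
  ]),
);
```

The handler registered before `start` survives, and protocols without a custom handler fall back to the default.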
While we keep slashing rounds for a long time (lifetime in rounds is currently defined as 100), we only attempt to execute them during the first round they become executable. If for whatever reason they don't get executed, they are just forgotten and never actually triggered. This PR adds a new config `slashExecuteRoundsLookBack` (defaults to 4) that sets how many execution rounds to look back from the latest executable round to see if any round is still pending execution. Each round is checked in sequence, so setting this value too high can introduce performance issues. Setting this value to zero keeps the same behaviour as we have today.

This PR also fixes another issue: we were re-checking whether a round was executable based on the isReadyToExecute flag returned from the contract. However, that flag was computed as of the time of the call, and not as of the time at which the tx would land. This meant that we always failed to execute the slash payload on the first slot of a round.

These two issues combined were triggering the flakes in the `inactivity-slash` tests. Since we had small rounds (each round was 4 slots), and the first slot was consistently missed, it was a matter of being unlucky enough that the inactive validator was picked as sequencer for the remaining 3 slots in a row. Given we have 6 validators, this happened roughly once every 216 runs (1 in 6^3). See [this run](http://ci.aztec-labs.com/3bff0b862dd4156f) for an example.
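The look-back window and the flake odds can be sketched as follows. `roundsToCheck` is a hypothetical helper; it assumes the window covers the latest executable round plus `lookBack` earlier rounds and that rounds are visited oldest first, neither of which is confirmed by the description above.

```typescript
// Returns the rounds to check for pending execution, oldest first.
// A lookBack of 0 yields only the latest executable round (the previous behaviour).
function roundsToCheck(latestExecutableRound: number, lookBack: number): number[] {
  const rounds: number[] = [];
  const oldest = Math.max(0, latestExecutableRound - lookBack);
  for (let round = oldest; round <= latestExecutableRound; round++) {
    rounds.push(round);
  }
  return rounds;
}

console.log(roundsToCheck(10, 4)); // [ 6, 7, 8, 9, 10 ]
console.log(roundsToCheck(10, 0)); // [ 10 ]

// Chance that the same validator out of 6 is picked for 3 consecutive slots,
// assuming independent uniform selection per slot.
const flakeChance = Math.pow(1 / 6, 3); // 1/216, roughly 0.46%
```

Because each round in the window is checked in turn, the cost grows linearly with `lookBack`.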
Adds a flag to always reexecute block proposals. If set, a validator node will always reexecute, even if not part of the committee, though they will not attest. If the node is not a validator, they will just log the result of the execution. Note that this does NOT affect p2p propagation, since the reexecution is done after the attestation is propagated, as it happens on a separate handler and not in a p2p-registered validator. To handle reexecutions in a non-validator node, reexecution was moved to a block-proposal-handler class, which is instantiated instead of a validator client in non-validators. This PR also causes validators to reexecute a proposal if they are not in the committee if there is a slash penalty defined for broadcasting invalid block proposals. Since this feature is not yet properly tested, I've disabled the default slash for these offenses for the time being (they were not working at the moment). See A-57 for more info. Fixes A-54
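The resulting decision table can be sketched like this. All names (`NodeContext`, `decideProposalAction`, `slashInvalidProposalPenalty`) are hypothetical, and the sketch assumes committee members reexecute before attesting, which the description above does not state explicitly.

```typescript
// Hypothetical view of the node's role and config when a proposal arrives.
interface NodeContext {
  isValidator: boolean;
  inCommittee: boolean;
  alwaysReexecute: boolean;
  // Penalty configured for broadcasting invalid block proposals; 0 means disabled.
  slashInvalidProposalPenalty: number;
}

interface ProposalAction {
  reexecute: boolean;
  attest: boolean;
}

function decideProposalAction(ctx: NodeContext): ProposalAction {
  // Only validators that are part of the committee attest.
  const attest = ctx.isValidator && ctx.inCommittee;
  // Validators outside the committee still reexecute if an invalid proposal could be slashed.
  const watchForSlashing = ctx.isValidator && ctx.slashInvalidProposalPenalty > 0;
  const reexecute = attest || ctx.alwaysReexecute || watchForSlashing;
  return { reexecute, attest };
}

const fullNode = decideProposalAction({
  isValidator: false,
  inCommittee: false,
  alwaysReexecute: true,
  slashInvalidProposalPenalty: 0,
});
console.log(fullNode); // { reexecute: true, attest: false }
```

A non-validator with the flag set reexecutes and logs the result, but never attests.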
As part of #17273 I had added a cleanup to the gossip network test to delete data dirs for the prover. However, the `stop` method on the prover failed to await all pending operations, so when the test finished successfully, it would still try to use the db (in particular, it seems to be the proving broker database `getEpochDatabase`) and abort with a core dump. This reverts the folder cleanup.
f30d434 to
7bf931e
Compare
(cherry picked from commit 339596a)
(cherry picked from commit 9a5cfa1)
(cherry picked from commit 2edc51a)
Fix A-43 (cherry picked from commit 61ad020)
alexghr
approved these changes
Sep 29, 2025
alexghr
pushed a commit
that referenced
this pull request
Nov 5, 2025
# v2.0.3..v2.1.0-rc.1 Notes

## Significant L1 Changes

### 1. **Rollup Contract Interface Changes**

- **`propose()` function signature changed**: Now requires an additional `_attestationsAndSignersSignature` parameter
- **`validateHeaderWithAttestations()` function signature changed**: Also requires the new signature parameter
- This affects any code that directly calls these functions on the rollup contract

### 2. **New Required Configuration Parameters**

Several new configuration parameters are now required for deployment:

- `localEjectionThreshold`: Stricter ejection threshold local to specific rollup (default: 196,000 tokens)
- `slashingDisableDuration`: How long slashing can be disabled in seconds (default: 5 days)

### 3. **GSE Contract Changes**

- **New function**: `setProofOfPossessionGasLimit()` - allows governance to adjust gas limits for BLS proof validation
- **Gas-limited proof validation**: Proof of possession validation now has configurable gas limits (default: 200,000 gas)

### 4. **Validator Queue Management Changes**

- **`flushEntryQueue()` behavior changed**: Now has an overload accepting a `_toAdd` parameter to limit validator additions
- **New validator flush accounting**: System now tracks available validator flushes per epoch

## Significant Non-Breaking Changes

### 1. **Enhanced Slashing Controls**

- **Temporary slashing disable**: Vetoers can now temporarily disable slashing for the configured duration
- **New function**: `setSlashingEnabled(bool)` for controlling slashing state

### 2. **Improved Validator Selection**

- **Configurable lag period**: Validator sampling now uses configurable epoch lag instead of fixed 2-epoch delay
- **Better bootstrapping**: Enhanced validator set bootstrapping with improved flush size calculations

### 3. **Updated Default Values**

- **Coin issuer rate**: Updated to `25,000,000,000 tokens / year` (approximately 793 tokens per second)
- **Local ejection threshold**: Set to 196,000 tokens (stricter than global 50,000 threshold)

## Significant Node Changes

### Fixes

- Rollback world state on failed block sync – Prevents bad state persistence by rolling back uncommitted data if block sync fails. [(#17158)](github.com//pull/17158)
- Early rejection of duplicate nullifiers – Detects and rejects transactions with duplicate nullifiers before inclusion. [(#17157)](github.com//pull/17157)
- Watcher pruning fix – Watcher now re-executes only blocks from the relevant pruned epoch, avoiding cross-epoch slashing issues. [(#17145)](github.com//pull/17145)
- Improved proposal validation – Fully validates proposal headers (including archive root derivation) and blocks attempts to reuse existing block numbers. [(#17144)](github.com//pull/17144)
- L1 to L2 message sync reliability – Waits for rollup to reach the inbox block before marking L1→L2 messages as synced; adds helpers to track message readiness. [(#17132)](github.com//pull/17132)
- Slashing round recovery – Executes pending slashing rounds skipped during the first executable round; adds slashExecuteRoundsLookBack to control re-check depth. [(#17125)](github.com//pull/17125)
- Broker restart on rollup change – Ensures broker restarts when rollup chain changes to stay synchronized. [(#17120)](github.com//pull/17120)
- Remote signer readiness check – Verifies that a remote signer is available before use. [(#17119)](github.com//pull/17119)
- Orchestrator and agent retry improvements – Makes connections to the broker more robust under transient failures. [(#17117)](github.com//pull/17117)
- Telemetry cleanup – Fixes incorrect or spammy telemetry warnings. [(#17155)](github.com//pull/17155)

### Features

- Network configuration support – Introduces centralized configuration for network parameters. [(#17113)](github.com//pull/17113)

## Full Changelog

You can generate this yourself with `./scripts/commits v2.0.3..v2.1.0-rc.1 1000 -m -g`.

#### Fixes

- fix: use archiveAt(0) instead of getBlock to get genesis archive tree - backport v2 ([#17447](#17447)) — spypsy, 5 days ago
- fix: add keystoreDirectory option to sequencer ([#17265](#17265)) — spypsy, 13 days ago
- fix: testnet archival node - v2 ([#17142](#17142)) — Aztec Bot, 3 weeks ago

#### Chores

- chore: bump minor version — Mitch, 4 days ago — [dbc243f](dbc243f)
- chore: backport dependabot deps ([#17463](#17463)) — Aztec Bot, 5 days ago
- chore: Backport slack alerts ([#17460](#17460)) — PhilWindle, 5 days ago
- chore(backport-to-v2): chore: New salt for staging-ignition (#17453) ([#17453](#17453)) — Aztec Bot, 5 days ago
- chore(backport-to-v2): fix: improve libp2p connection limits for network discovery (#17425) ([#17425](#17425)) — Aztec Bot, 5 days ago
- chore(backport-to-v2): feat: add flushing rewarder (#17335) ([#17335](#17335)) — Aztec Bot, 6 days ago
- chore(backport-to-v2): feat: add date gated relayer (#17323) ([#17323](#17323)) — Aztec Bot, 6 days ago
- chore(backport-to-v2): feat: support using existing ERC20 token for fee and staking (#17413) ([#17413](#17413)) — Aztec Bot, 6 days ago
- chore: Delete contract addresses from chain l2 config ([#17430](#17430)) — PhilWindle, 6 days ago
- chore: More updated staging public config ([#17364](#17364)) — PhilWindle, 7 days ago
- chore(backport-to-V2): L1 backports ([#17365](#17365)) — Lasse Herskind, 7 days ago
- chore: Ensure DB map sizes are configured for networks ([#17383](#17383)) — PhilWindle, 7 days ago
- chore: Backport of fixes into v2 ([#17206](#17206)) — PhilWindle, 8 days ago
- chore: update zkpassport version ([#17339](#17339)) — saleel, 8 days ago
- chore: Backport of workflow fix ([#17333](#17333)) — PhilWindle, 11 days ago
- chore: Streamline staging deployments ([#17328](#17328)) — PhilWindle, 11 days ago
- chore(backport-to-v2): fix: avm gracefully handles shifts (shl) with huge bit sizes (#17171) ([#17171](#17171)) — Aztec Bot, 12 days ago
- chore(backport-to-v2): chore: remove unconstrained generics from trait impls (#17075) ([#17075](#17075)) — Aztec Bot, 12 days ago
- chore: Backport deployment refactor ([#17280](#17280)) — PhilWindle, 12 days ago
- chore(backport-to-v2): fix(docs): Update Counter contract tutorial imports and remove unnecessary sections (#17241) ([#17241](#17241)) — Aztec Bot, 13 days ago
- chore: remove ACCEPT_DISABLED_AVM_VK_TREE_ROOT ([#17238](#17238)) — Alex Gherghisan, 13 days ago
- chore: remove bad rollup-version default ([#17223](#17223)) — Alex Gherghisan, 2 weeks ago
- chore(docs): node docs to v2 ([#17205](#17205)) — esau, 2 weeks ago
- chore(backport-to-v2): chore(avm)!: Fix a misleading log in recursive verifier related to public input (#17184) ([#17184](#17184)) — Aztec Bot, 2 weeks ago
- chore: Backport of ignition fix attempt 2 ([#17201](#17201)) — PhilWindle, 2 weeks ago
- chore: turn on testnet compat test ([#17195](#17195)) — Alex Gherghisan, 2 weeks ago
- chore: Backport fix to staging-ignition to v2 ([#17159](#17159)) — PhilWindle, 3 weeks ago
- chore: kubectl ([#17140](#17140)) — Alex Gherghisan, 3 weeks ago

#### Other

- backport dependabots p2 ([#17488](#17488)) — mralj, 4 days ago

---------

Co-authored-by: AztecBot <tech@aztecprotocol.com>
This PR is a backport of the following into V2.