
fix(validator): process block proposals from own validator keys in HA setups #21603

Merged
PhilWindle merged 1 commit into merge-train/spartan from palla/ha-process-self-proposals on Mar 17, 2026



@spalladino (Contributor) commented Mar 16, 2026

Motivation

In an HA setup, two nodes (A and B) share the same validator keys. When node A proposes a block, node B receives it via gossipsub but ignores it because validateBlockProposal detects the proposer address matches its own validator keys and returns early. This means node B never re-executes the block, never pushes it to its archiver, and falls behind the proposed chain.

Additionally, both HA peers independently try to build and propose blocks for the same slot. If the losing peer commits its block to the archiver before signing fails, it ends up with a stale block that prevents it from accepting the winning peer's proposal.

Approach

Three changes work together to fix HA proposed chain sync:

  1. Remove self-filtering: Drop the early return in validateBlockProposal for self-proposals, letting them flow through the normal re-execution path so the HA peer pushes the winning block to its archiver.

  2. Sign before syncing to archiver: Reorder the checkpoint proposal job so that non-last blocks are signed via createBlockProposal before being synced to the archiver. If the shared slashing protection DB rejects signing (because the HA peer already signed), the block is never added to the archiver, keeping it clean to accept the winning peer's block via gossipsub.

  3. Shared slashing protection for testing: Add createSharedSlashingProtectionDb (backed by a shared LMDB store) and createSignerFromSharedDb factories, and thread an optional slashingProtectionDb through the validator creation chain. This allows e2e tests to simulate HA signing coordination without PostgreSQL.
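The sign-before-sync ordering from items 2 and 3 can be modeled in a few lines. This is a minimal in-memory sketch, assuming a single shared map stands in for the shared slashing protection DB; `SharedSigningGuard`, `HaNode`, `propose`, and `receive` are hypothetical names, not the actual `createSharedSlashingProtectionDb`, `createBlockProposal`, or archiver API.

```typescript
// Hypothetical sketch: SharedSigningGuard, HaNode, and Block are illustrative
// stand-ins, not the Aztec validator-client or validator-ha-signer API.

interface Block {
  slot: number;
  hash: string;
}

/** Stand-in for a slashing protection DB shared by both HA peers. */
class SharedSigningGuard {
  // "validator:slot" -> id of the node that signed first
  private readonly signed = new Map<string, string>();

  /** Returns true only for the first node that asks to sign for (validator, slot). */
  tryReserve(validator: string, slot: number, nodeId: string): boolean {
    const key = `${validator}:${slot}`;
    const holder = this.signed.get(key);
    if (holder !== undefined && holder !== nodeId) {
      return false; // the HA peer already signed for this slot
    }
    this.signed.set(key, nodeId);
    return true;
  }
}

class HaNode {
  readonly archiver: Block[] = [];
  readonly id: string;
  private readonly validator: string;
  private readonly guard: SharedSigningGuard;

  constructor(id: string, validator: string, guard: SharedSigningGuard) {
    this.id = id;
    this.validator = validator;
    this.guard = guard;
  }

  /** Sign first, sync second: a rejected signature leaves the archiver untouched. */
  propose(block: Block): Block | undefined {
    if (!this.guard.tryReserve(this.validator, block.slot, this.id)) {
      return undefined; // signing rejected, so the block never reaches the archiver
    }
    this.archiver.push(block);
    return block; // in a real node this would be broadcast via gossipsub
  }

  /** Accepts a peer's proposal (re-execution elided) unless the slot is already filled. */
  receive(block: Block): void {
    if (!this.archiver.some((b) => b.slot === block.slot)) {
      this.archiver.push(block);
    }
  }
}

// Both peers share validator key "0xval" and one signing guard.
const guard = new SharedSigningGuard();
const nodeA = new HaNode("A", "0xval", guard);
const nodeB = new HaNode("B", "0xval", guard);

const winning = nodeA.propose({ slot: 7, hash: "0xaaa" });
const losing = nodeB.propose({ slot: 7, hash: "0xbbb" }); // undefined: A signed first
if (winning !== undefined) nodeB.receive(winning);
```

Had `nodeB` pushed its block to the archiver before attempting to sign, its archiver would already hold `0xbbb` for slot 7 and `receive` would drop the winning block, which is the stale-block failure described under Motivation.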

Changes

  • validator-client: Remove self-proposal filtering in validateBlockProposal. Add optional slashingProtectionDb parameter to ValidatorClient.new and createValidatorClient factory for injecting a shared signing protection DB.
  • validator-client (tests): Add unit test verifying block proposals signed with the validator's own key are processed and forwarded to handleBlockProposal.
  • sequencer-client: Reorder checkpoint_proposal_job so non-last blocks call createBlockProposal before syncProposedBlockToArchiver. If signing fails (HA signer rejects), the block is never added to the archiver.
  • validator-ha-signer: Add createSharedSlashingProtectionDb and createSignerFromSharedDb factory functions for testing HA setups with a shared in-memory LMDB store.
  • aztec-node: Thread slashingProtectionDb through AztecNodeService.createAndSync deps.
  • end-to-end: Add epochs_ha_sync e2e test with 4 nodes in 2 HA pairs (each pair sharing validator keys and a slashing protection DB), different coinbase addresses per node, MBPS enabled, checkpoint publishing disabled. Asserts all 4 nodes converge on the same proposed block hash before any checkpoint is published.
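To see why the self-filter broke HA sync, here is a toy model of a single gossip round in the same layout as the `epochs_ha_sync` test (two HA pairs, each sharing one key). It assumes that processing a proposal always ends with the block in the archiver; `shouldProcess`, `allNodesConverge`, and the key strings are hypothetical, and the real validation checks are elided.

```typescript
// Hypothetical model: names are illustrative, not the validator-client API.

interface Proposal {
  proposer: string; // validator address that signed the proposal
  slot: number;
  blockHash: string;
}

interface SimNode {
  name: string;
  ownKeys: Set<string>;
  proposedTip?: string; // hash of the latest proposed block this node has synced
}

/** Other validity checks (signature, slot, etc.) are elided in this sketch. */
function shouldProcess(p: Proposal, node: SimNode, filterSelf: boolean): boolean {
  if (filterSelf && node.ownKeys.has(p.proposer)) {
    return false; // pre-fix behavior: early return for proposals signed with own keys
  }
  return true;
}

/** Runs one gossip round in a 4-node, 2-HA-pair layout and reports convergence. */
function allNodesConverge(filterSelf: boolean): boolean {
  const pair1 = new Set(["0xk1"]);
  const pair2 = new Set(["0xk2"]);
  const proposer: SimNode = { name: "A1", ownKeys: pair1 };
  const nodes: SimNode[] = [
    proposer,
    { name: "B1", ownKeys: pair1 },
    { name: "A2", ownKeys: pair2 },
    { name: "B2", ownKeys: pair2 },
  ];
  const proposal: Proposal = { proposer: "0xk1", slot: 1, blockHash: "0xabc" };
  proposer.proposedTip = proposal.blockHash; // the proposer already has its own block

  for (const node of nodes) {
    if (node !== proposer && shouldProcess(proposal, node, filterSelf)) {
      node.proposedTip = proposal.blockHash; // re-execute and push to archiver (elided)
    }
  }
  const tips = new Set(nodes.map((n) => n.proposedTip));
  return tips.size === 1 && !tips.has(undefined);
}
```

`allNodesConverge(true)` returns `false` because B1 drops A1's proposal, while `allNodesConverge(false)` returns `true`, mirroring the e2e assertion that all four nodes agree on the proposed block hash.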

Fixes A-675

@spalladino added the ci-no-fail-fast (Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure) and backport-to-v4-next labels on Mar 16, 2026
@spalladino force-pushed the palla/ha-process-self-proposals branch 4 times, most recently from 4cd76ef to 8df7c64, on March 16, 2026 16:53
… setups

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@spalladino force-pushed the palla/ha-process-self-proposals branch from 8df7c64 to 5eedb7d on March 16, 2026 17:09
@PhilWindle merged commit bbcefc8 into merge-train/spartan on Mar 17, 2026
11 checks passed
@PhilWindle deleted the palla/ha-process-self-proposals branch on March 17, 2026 09:50
@AztecBot (Collaborator)

❌ Failed to cherry-pick to v4-next due to conflicts. (🤖) View backport run.

AztecBot added a commit that referenced this pull request Mar 17, 2026
…e proposal test

When PR #21603 changed the validator to process (not ignore) block proposals
from HA peers (same validator key), the duplicate_proposal_slash test broke.
The second malicious node now processes the first node's proposal, adds the
block to its archiver, and the sequencer sees the slot as taken, which prevents
it from ever building its own conflicting proposal.

Fix: set skipPushProposedBlocksToArchiver=true on the malicious nodes so
that HA peer proposals are re-executed but not added to the archiver. This
allows both malicious nodes to independently build and broadcast proposals
for the same slot, which is what the test needs for equivocation detection.
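The effect of `skipPushProposedBlocksToArchiver` on the "slot taken" check can be sketched as follows. This is a simplified model, assuming the sequencer treats a slot as taken exactly when the archiver already holds a block for it; `SketchNode`, `handleProposal`, and `canBuild` are hypothetical, and only the flag name comes from the commit message.

```typescript
// Hypothetical model: only the flag name comes from the commit message above.

interface NodeConfig {
  skipPushProposedBlocksToArchiver: boolean;
}

class SketchNode {
  readonly archiverSlots = new Set<number>();
  reexecutions = 0;
  readonly config: NodeConfig;

  constructor(config: NodeConfig) {
    this.config = config;
  }

  /** Proposals are always re-executed; archiving depends on the flag. */
  handleProposal(slot: number): void {
    this.reexecutions++; // re-execution itself is elided
    if (!this.config.skipPushProposedBlocksToArchiver) {
      this.archiverSlots.add(slot);
    }
  }

  /** The sequencer treats a slot that already has an archived block as taken. */
  canBuild(slot: number): boolean {
    return !this.archiverSlots.has(slot);
  }
}

const defaultNode = new SketchNode({ skipPushProposedBlocksToArchiver: false });
const maliciousNode = new SketchNode({ skipPushProposedBlocksToArchiver: true });
defaultNode.handleProposal(5);
maliciousNode.handleProposal(5);
```

After handling the peer's proposal for slot 5, `defaultNode.canBuild(5)` is `false` while `maliciousNode.canBuild(5)` is `true`, even though both re-executed it.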
spalladino pushed a commit that referenced this pull request Mar 17, 2026
…e proposal test (#21673)

## Summary
When PR #21603 changed the validator to process (not ignore) block
proposals from HA peers (same validator key), the
`duplicate_proposal_slash` test broke. The second malicious node now
processes the first node's proposal, adds the block to its archiver via
`blockSource.addBlock()`, and the sequencer sees "slot was taken" —
preventing it from ever building its own conflicting proposal.

**Root cause**: `validateBlockProposal` no longer returns `false` for
self-proposals (changed to process them for HA support). The
block_proposal_handler re-executes the proposal and pushes it to the
archiver. The sequencer then skips the slot.

**Fix**: Set `skipPushProposedBlocksToArchiver=true` on the malicious
nodes. This allows:
1. Node 1 builds and broadcasts its proposal
2. Node 2 receives it, re-executes (as HA peer), but does NOT add to
archiver
3. Node 2's sequencer doesn't see "slot taken" → builds its own block
with different coinbase
4. Node 2 broadcasts (allowed by `broadcastEquivocatedProposals=true`)
5. Honest nodes see both proposals → detect duplicate → offense recorded

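Step 5 depends on honest nodes flagging two different blocks signed by the same proposer for the same slot. A minimal sketch of that check, assuming proposals are keyed by proposer address and slot and compared by block hash; `DuplicateProposalDetector`, `observe`, and the `Offense` shape are hypothetical and not the actual slasher implementation.

```typescript
// Hypothetical sketch: names and offense shape are illustrative only.

interface ObservedProposal {
  proposer: string;
  slot: number;
  blockHash: string;
}

interface Offense {
  kind: "DUPLICATE_PROPOSAL";
  proposer: string;
  slot: number;
}

class DuplicateProposalDetector {
  private readonly firstSeen = new Map<string, string>(); // "proposer:slot" -> block hash
  private readonly reported = new Set<string>();
  readonly offenses: Offense[] = [];

  observe(p: ObservedProposal): void {
    const key = `${p.proposer}:${p.slot}`;
    const first = this.firstSeen.get(key);
    if (first === undefined) {
      this.firstSeen.set(key, p.blockHash);
      return;
    }
    // A second, different block for the same proposer and slot is equivocation.
    if (first !== p.blockHash && !this.reported.has(key)) {
      this.reported.add(key);
      this.offenses.push({ kind: "DUPLICATE_PROPOSAL", proposer: p.proposer, slot: p.slot });
    }
  }
}

const honest = new DuplicateProposalDetector();
// Node 1 and Node 2 share a key but use different coinbases, so the block hashes differ.
honest.observe({ proposer: "0xshared", slot: 9, blockHash: "0x111" });
honest.observe({ proposer: "0xshared", slot: 9, blockHash: "0x222" });
honest.observe({ proposer: "0xshared", slot: 9, blockHash: "0x222" });
```

The repeated third observation does not add a second offense, because each (proposer, slot) pair is reported at most once.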
## Test plan
- The `duplicate_proposal_slash` e2e test should now pass consistently
- Other slashing tests should be unaffected (only malicious nodes in
this test are changed)

ClaudeBox log: https://claudebox.work/s/ced449aa0eabbcb4?run=1