Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
622048e
grandpa: Add debug logs for rounds and set IDs
lexnv Jun 27, 2025
caff0bc
grandpa: Add debug for previous set decoding
lexnv Jun 27, 2025
a86313f
grandpa: Check signatures of justifications against prev sets
lexnv Jun 27, 2025
a820d02
consensus: Adjust code to the new SignatureResult
lexnv Jun 27, 2025
3ae2eed
consensus: Propagate outdated justification errors
lexnv Jun 27, 2025
46934a3
sync: Only ban invalid justifications, not outdated ones
lexnv Jun 27, 2025
c2951d1
consensus: Introduce a JustificationImportResult
lexnv Jun 27, 2025
56afc58
grandpa: Check if the signature is valid via wrapper fn
lexnv Jun 27, 2025
57d0234
Update from github-actions[bot] running command 'prdoc --audience nod…
github-actions[bot] Jun 30, 2025
1484009
prdoc: Add node operator
lexnv Jun 30, 2025
199a2be
Merge branch 'master' into lexnv/debug-grandpa
lexnv Jun 30, 2025
0622c43
prdoc: Fix format
lexnv Jun 30, 2025
4b20186
Merge branch 'master' into lexnv/debug-grandpa
lexnv Jun 30, 2025
2028753
prdoc: Change bumps
lexnv Jun 30, 2025
f6600e6
prdoc: sp-consensus adjust to minor
lexnv Jun 30, 2025
dc04fc8
Update substrate/primitives/consensus/grandpa/src/lib.rs
lexnv Jul 1, 2025
cf0081c
Update substrate/primitives/consensus/grandpa/src/lib.rs
lexnv Jul 1, 2025
bf9ac86
Update substrate/client/network/sync/src/engine.rs
lexnv Jul 1, 2025
62eec60
Update substrate/client/network/sync/src/engine.rs
lexnv Jul 1, 2025
368df03
Update substrate/client/network/sync/src/engine.rs
lexnv Jul 1, 2025
5773ae6
grandpa: Check directly for zero
lexnv Jul 21, 2025
6099519
Merge branch 'master' into lexnv/debug-grandpa
lexnv Jul 21, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -306,7 +306,9 @@ trait JustificationVerifier<Header: HeaderT> {
justification.round,
context.authority_set_id,
&mut signature_buffer,
) {
)
.is_valid()
{
self.process_invalid_signature_vote(precommit_idx).map_err(Error::Precommit)?;
continue
}
Expand Down
45 changes: 45 additions & 0 deletions prdoc/pr_9015.prdoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
title: 'consensus/grandpa: Fix high number of peer disconnects with invalid justification'
doc:
- audience: [Node Dev, Node Operator]
description: |-
A grandpa race-casse has been identified in the versi-net stack around authority set changes, which leads to the following:

- T0 / Node A: Completes round (15)
- T1 / Node A: Applies new authority set change and increments the SetID (from 0 to 1)
- T2 / Node B: Sends Precommit for round (15) with SetID (0) -- previous set ID
- T3 / Node B: Applies new authority set change and increments the SetID (1)

In this scenario, Node B is not aware at the moment of sending justifications that the Set ID has changed.
The downstream effect is that Node A will not be able to verify the signature of justifications, since a different SetID is taken into account. This will cascade through the sync engine, where the Node B is wrongfully banned and disconnected.

This PR aims to fix the edge-case by making the grandpa resilient to verifying prior setIDs for signatures.
When the signature of the grandpa justification fails to decode, the prior SetID is also verified. If the prior SetID produces a valid signature, then the outdated justification error is propagated through the code (ie `SignatureResult::OutdatedSet`).

The sync engine will handle the outdated justifications as invalid, but without banning the peer. This leads to increased stability of the network during authority changes, which caused frequent disconnects to versi-net in the past.

### Review Notes
- Main changes that verify prior SetId on failures are placed in [check_message_signature_with_buffer](https://github.com/paritytech/polkadot-sdk/pull/9015/files#diff-359d7a46ea285177e5d86979f62f0f04baabf65d595c61bfe44b6fc01af70d89R458-R501)
- Sync engine no longer disconnects outdated justifications in [process_service_command](https://github.com/paritytech/polkadot-sdk/pull/9015/files#diff-9ab3391aa82ee2b2868ece610100f84502edcf40638dba9ed6953b6e572dfba5R678-R703)

### Testing Done
- Deployed the PR to versi-net with 40 validators
- Prior we have noticed 10/40 validators disconnecting every 15-20 minutes, leading to instability
- Over past 24h the issue has been mitigated: https://grafana.teleport.parity.io/goto/FPNWlmsHR?orgId=1
- Note: bootnodes 0 and 1 are currently running outdated versions that do not incorporate this SetID verification improvement

Part of: https://github.com/paritytech/polkadot-sdk/issues/8872
crates:
- name: sp-consensus-grandpa
bump: minor
- name: bp-header-chain
bump: patch
- name: sc-consensus-grandpa
bump: patch
- name: sp-blockchain
bump: minor
- name: sp-consensus
bump: minor
- name: sc-consensus
bump: minor
- name: sc-network-sync
bump: patch
15 changes: 14 additions & 1 deletion substrate/client/consensus/common/src/import_queue.rs
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,19 @@ pub trait ImportQueue<B: BlockT>: Send {
async fn run(self, link: &dyn Link<B>);
}

/// The result of importing a justification.
#[derive(Debug, PartialEq)]
pub enum JustificationImportResult {
/// Justification was imported successfully.
Success,

/// Justification was not imported successfully.
Failure,

/// Justification was not imported successfully, because it is outdated.
OutdatedJustification,
}

/// Hooks that the verification queue can use to influence the synchronization
/// algorithm.
pub trait Link<B: BlockT>: Send + Sync {
Expand All @@ -159,7 +172,7 @@ pub trait Link<B: BlockT>: Send + Sync {
_who: RuntimeOrigin,
_hash: &B::Hash,
_number: NumberFor<B>,
_success: bool,
_import_result: JustificationImportResult,
) {
}

Expand Down
24 changes: 16 additions & 8 deletions substrate/client/consensus/common/src/import_queue/basic_queue.rs
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,8 @@ use crate::{
buffered_link::{self, BufferedLinkReceiver, BufferedLinkSender},
import_single_block_metered, verify_single_block_metered, BlockImportError,
BlockImportStatus, BoxBlockImport, BoxJustificationImport, ImportQueue, ImportQueueService,
IncomingBlock, Link, RuntimeOrigin, SingleBlockVerificationOutcome, Verifier, LOG_TARGET,
IncomingBlock, JustificationImportResult, Link, RuntimeOrigin,
SingleBlockVerificationOutcome, Verifier, LOG_TARGET,
},
metrics::Metrics,
};
Expand Down Expand Up @@ -342,8 +343,9 @@ impl<B: BlockT> BlockImportWorker<B> {
) {
let started = std::time::Instant::now();

let success = match self.justification_import.as_mut() {
Some(justification_import) => justification_import
let import_result = match self.justification_import.as_mut() {
Some(justification_import) => {
let result = justification_import
.import_justification(hash, number, justification)
.await
.map_err(|e| {
Expand All @@ -356,16 +358,22 @@ impl<B: BlockT> BlockImportWorker<B> {
e,
);
e
})
.is_ok(),
None => false,
});
match result {
Ok(()) => JustificationImportResult::Success,
Err(sp_consensus::Error::OutdatedJustification) =>
JustificationImportResult::OutdatedJustification,
Err(_) => JustificationImportResult::Failure,
}
},
None => JustificationImportResult::Failure,
};

if let Some(metrics) = self.metrics.as_ref() {
metrics.justification_import_time.observe(started.elapsed().as_secs_f64());
}

self.result_sender.justification_imported(who, &hash, number, success);
self.result_sender.justification_imported(who, &hash, number, import_result);
}
}

Expand Down Expand Up @@ -579,7 +587,7 @@ mod tests {
_who: RuntimeOrigin,
hash: &Hash,
_number: BlockNumber,
_success: bool,
_import_result: JustificationImportResult,
) {
self.events.lock().push(Event::JustificationImported(*hash))
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@
//! });
//! ```

use crate::import_queue::{Link, RuntimeOrigin};
use crate::import_queue::{JustificationImportResult, Link, RuntimeOrigin};
use futures::prelude::*;
use sc_utils::mpsc::{tracing_unbounded, TracingUnboundedReceiver, TracingUnboundedSender};
use sp_runtime::traits::{Block as BlockT, NumberFor};
Expand Down Expand Up @@ -84,7 +84,7 @@ impl<B: BlockT> Clone for BufferedLinkSender<B> {
/// Internal buffered message.
pub enum BlockImportWorkerMsg<B: BlockT> {
BlocksProcessed(usize, usize, Vec<(BlockImportResult<B>, B::Hash)>),
JustificationImported(RuntimeOrigin, B::Hash, NumberFor<B>, bool),
JustificationImported(RuntimeOrigin, B::Hash, NumberFor<B>, JustificationImportResult),
RequestJustification(B::Hash, NumberFor<B>),
}

Expand All @@ -105,9 +105,9 @@ impl<B: BlockT> Link<B> for BufferedLinkSender<B> {
who: RuntimeOrigin,
hash: &B::Hash,
number: NumberFor<B>,
success: bool,
import_result: JustificationImportResult,
) {
let msg = BlockImportWorkerMsg::JustificationImported(who, *hash, number, success);
let msg = BlockImportWorkerMsg::JustificationImported(who, *hash, number, import_result);
let _ = self.tx.unbounded_send(msg);
}

Expand All @@ -129,8 +129,8 @@ impl<B: BlockT> BufferedLinkReceiver<B> {
match msg {
BlockImportWorkerMsg::BlocksProcessed(imported, count, results) =>
link.blocks_processed(imported, count, results),
BlockImportWorkerMsg::JustificationImported(who, hash, number, success) =>
link.justification_imported(who, &hash, number, success),
BlockImportWorkerMsg::JustificationImported(who, hash, number, import_result) =>
link.justification_imported(who, &hash, number, import_result),
BlockImportWorkerMsg::RequestJustification(hash, number) =>
link.request_justification(&hash, number),
}
Expand Down
3 changes: 2 additions & 1 deletion substrate/client/consensus/common/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,8 @@ pub use block_import::{
};
pub use import_queue::{
import_single_block, BasicQueue, BlockImportError, BlockImportStatus, BoxBlockImport,
BoxJustificationImport, DefaultImportQueue, ImportQueue, IncomingBlock, Link, Verifier,
BoxJustificationImport, DefaultImportQueue, ImportQueue, IncomingBlock,
JustificationImportResult, Link, Verifier,
};

mod longest_chain;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -927,7 +927,9 @@ impl<Block: BlockT> Inner<Block> {
&full.message.signature,
full.round.0,
full.set_id.0,
) {
)
.is_valid()
{
debug!(target: LOG_TARGET, "Bad message signature {}", full.message.id);
telemetry!(
self.config.telemetry;
Expand Down
8 changes: 6 additions & 2 deletions substrate/client/consensus/grandpa/src/communication/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -865,7 +865,9 @@ fn check_compact_commit<Block: BlockT>(
round.0,
set_id.0,
&mut buf,
) {
)
.is_valid()
{
debug!(target: LOG_TARGET, "Bad commit message signature {}", id);
telemetry!(
telemetry;
Expand Down Expand Up @@ -952,7 +954,9 @@ fn check_catch_up<Block: BlockT>(

if !sp_consensus_grandpa::check_message_signature_with_buffer(
&msg, id, sig, round, set_id, buf,
) {
)
.is_valid()
{
debug!(target: LOG_TARGET, "Bad catch up message signature {}", id);
telemetry!(
telemetry;
Expand Down
8 changes: 7 additions & 1 deletion substrate/client/consensus/grandpa/src/import.rs
Original file line number Diff line number Diff line change
Expand Up @@ -795,7 +795,13 @@ where
);

let justification = match justification {
Err(e) => return Err(ConsensusError::ClientImport(e.to_string())),
Err(e) => {
return match e {
sp_blockchain::Error::OutdatedJustification =>
Err(ConsensusError::OutdatedJustification),
_ => Err(ConsensusError::ClientImport(e.to_string())),
};
},
Ok(justification) => justification,
};

Expand Down
15 changes: 10 additions & 5 deletions substrate/client/consensus/grandpa/src/justification.rs
Original file line number Diff line number Diff line change
Expand Up @@ -205,17 +205,22 @@ impl<Block: BlockT> GrandpaJustification<Block> {
let mut buf = Vec::new();
let mut visited_hashes = HashSet::new();
for signed in self.justification.commit.precommits.iter() {
if !sp_consensus_grandpa::check_message_signature_with_buffer(
let signature_result = sp_consensus_grandpa::check_message_signature_with_buffer(
&finality_grandpa::Message::Precommit(signed.precommit.clone()),
&signed.id,
&signed.signature,
self.justification.round,
set_id,
&mut buf,
) {
return Err(ClientError::BadJustification(
"invalid signature for precommit in grandpa justification".to_string(),
))
);
match signature_result {
sp_consensus_grandpa::SignatureResult::Invalid =>
return Err(ClientError::BadJustification(
"invalid signature for precommit in grandpa justification".to_string(),
)),
sp_consensus_grandpa::SignatureResult::OutdatedSet =>
return Err(ClientError::OutdatedJustification),
sp_consensus_grandpa::SignatureResult::Valid => {},
}

if base_hash == signed.precommit.target_hash {
Expand Down
39 changes: 29 additions & 10 deletions substrate/client/network/sync/src/engine.rs
Original file line number Diff line number Diff line change
Expand Up @@ -670,17 +670,36 @@ where
ToServiceCommand::BlocksProcessed(imported, count, results) => {
self.strategy.on_blocks_processed(imported, count, results);
},
ToServiceCommand::JustificationImported(peer_id, hash, number, success) => {
ToServiceCommand::JustificationImported(peer_id, hash, number, import_result) => {
let success =
matches!(import_result, sc_consensus::JustificationImportResult::Success);
self.strategy.on_justification_import(hash, number, success);
if !success {
log::info!(
target: LOG_TARGET,
"💔 Invalid justification provided by {peer_id} for #{hash}",
);
self.network_service
.disconnect_peer(peer_id, self.block_announce_protocol_name.clone());
self.network_service
.report_peer(peer_id, ReputationChange::new_fatal("Invalid justification"));

match import_result {
sc_consensus::JustificationImportResult::OutdatedJustification => {
log::info!(
target: LOG_TARGET,
"💔 Outdated justification provided by {peer_id} for #{hash}",
);
},
sc_consensus::JustificationImportResult::Failure => {
log::info!(
target: LOG_TARGET,
"💔 Invalid justification provided by {peer_id} for #{hash}",
);
self.network_service
.disconnect_peer(peer_id, self.block_announce_protocol_name.clone());
self.network_service.report_peer(
peer_id,
ReputationChange::new_fatal("Invalid justification"),
);
},
sc_consensus::JustificationImportResult::Success => {
log::debug!(
target: LOG_TARGET,
"Justification for block #{hash} ({number}) imported from {peer_id} successfully",
);
},
}
},
ToServiceCommand::AnnounceBlock(hash, data) => self.announce_block(hash, data),
Expand Down
2 changes: 1 addition & 1 deletion substrate/client/network/sync/src/service/mock.rs
Original file line number Diff line number Diff line change
Expand Up @@ -55,7 +55,7 @@ mockall::mock! {
who: PeerId,
hash: &B::Hash,
number: NumberFor<B>,
success: bool,
import_result: sc_consensus::JustificationImportResult,
);
fn request_justification(&self, hash: &B::Hash, number: NumberFor<B>);
}
Expand Down
17 changes: 11 additions & 6 deletions substrate/client/network/sync/src/service/syncing_service.rs
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,9 @@ use crate::types::{ExtendedPeerInfo, SyncEvent, SyncEventStream, SyncStatus, Syn
use futures::{channel::oneshot, Stream};
use sc_network_types::PeerId;

use sc_consensus::{BlockImportError, BlockImportStatus, JustificationSyncLink, Link};
use sc_consensus::{
BlockImportError, BlockImportStatus, JustificationImportResult, JustificationSyncLink, Link,
};
use sc_network::{NetworkBlock, NetworkSyncForkRequest};
use sc_utils::mpsc::{tracing_unbounded, TracingUnboundedSender};
use sp_runtime::traits::{Block as BlockT, NumberFor};
Expand All @@ -44,7 +46,7 @@ pub enum ToServiceCommand<B: BlockT> {
usize,
Vec<(Result<BlockImportStatus<NumberFor<B>>, BlockImportError>, B::Hash)>,
),
JustificationImported(PeerId, B::Hash, NumberFor<B>, bool),
JustificationImported(PeerId, B::Hash, NumberFor<B>, JustificationImportResult),
AnnounceBlock(B::Hash, Option<Vec<u8>>),
NewBestBlockImported(B::Hash, NumberFor<B>),
EventStream(TracingUnboundedSender<SyncEvent>),
Expand Down Expand Up @@ -192,11 +194,14 @@ impl<B: BlockT> Link<B> for SyncingService<B> {
who: PeerId,
hash: &B::Hash,
number: NumberFor<B>,
success: bool,
import_result: JustificationImportResult,
) {
let _ = self
.tx
.unbounded_send(ToServiceCommand::JustificationImported(who, *hash, number, success));
let _ = self.tx.unbounded_send(ToServiceCommand::JustificationImported(
who,
*hash,
number,
import_result,
));
}

fn request_justification(&self, hash: &B::Hash, number: NumberFor<B>) {
Expand Down
3 changes: 3 additions & 0 deletions substrate/primitives/blockchain/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,9 @@ pub enum Error {
#[error("bad justification for header: {0}")]
BadJustification(String),

#[error("outdated justification")]
OutdatedJustification,

#[error("This method is not currently available when running in light client mode")]
NotAvailableOnLightClient,

Expand Down
3 changes: 3 additions & 0 deletions substrate/primitives/consensus/common/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,9 @@ pub enum Error {
/// Justification requirements not met.
#[error("Invalid justification")]
InvalidJustification,
/// The justification provided is outdated and corresponds to a previous set.
#[error("Import failed with outdated justification")]
OutdatedJustification,
/// Error from the client while importing.
#[error("Import failed: {0}")]
ClientImport(String),
Expand Down
Loading
Loading