Sync: Gracefully handle blocks from an unknown fork #11085
Changes from all commits
1d7802f
e5a484a
07f76b6
7bd4b55
bae13b4
ff8cc86
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| title: 'Sync: Gracefully handle blocks from an unknown fork' | ||
| doc: | ||
| - audience: Node Dev | ||
| description: |- | ||
| There is the possibility that node A connects to node B. Both are at the same best block (20). Shortly after this, node B announces a block 21 that is from a completely different fork (started at e.g. block 15). Right now this leads to node A downloading this block 21 and then failing to import it because it doesn't have the parent block. | ||
|
|
||
| This pull request solves this situation by putting the peer into ancestry search when it detects a fork that is "unknown". | ||
| crates: | ||
| - name: sc-network-sync | ||
| bump: patch |
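
The scenario from the description can be sketched with a toy header store. This is only an illustration, not code from `sc-network-sync`: `Chain`, `common_ancestor`, `scenario`, and the `(number, fork label)` hash are hypothetical, and block 15 is assumed to be the last block both nodes share. The real ancestor search requests blocks by number rather than walking parent links, so the walk below is a simplification.

```rust
use std::collections::HashMap;

/// Toy block identifier: (block number, fork label). Hypothetical, for illustration only.
type Hash = (u64, char);

/// Minimal header store mapping a block hash to its parent hash.
#[derive(Default)]
struct Chain {
    parents: HashMap<Hash, Hash>,
}

impl Chain {
    /// Insert blocks `from..=to` on `fork`, attaching the first one to `parent`.
    fn extend(&mut self, fork: char, from: u64, to: u64, mut parent: Hash) {
        for n in from..=to {
            self.parents.insert((n, fork), parent);
            parent = (n, fork);
        }
    }

    fn knows(&self, hash: &Hash) -> bool {
        self.parents.contains_key(hash)
    }

    /// A block can only be imported if its parent is already known.
    fn can_import(&self, parent: &Hash) -> bool {
        self.knows(parent)
    }
}

/// Walk the peer's chain backwards from `start` until we reach a block we know.
/// Returns `None` if the peer's chain runs out before a shared block is found.
fn common_ancestor(ours: &Chain, theirs: &Chain, start: Hash) -> Option<Hash> {
    let mut cursor = start;
    while !ours.knows(&cursor) {
        cursor = *theirs.parents.get(&cursor)?;
    }
    Some(cursor)
}

/// Node A follows fork 'a' up to block 20. Node B shares blocks 1..=15 and then
/// follows fork 'b' up to block 21 (assumption: block 15 is the last shared block).
fn scenario() -> (Chain, Chain) {
    let genesis: Hash = (0, 'a');
    let mut node_a = Chain::default();
    node_a.extend('a', 1, 20, genesis);
    let mut node_b = Chain::default();
    node_b.extend('a', 1, 15, genesis);
    node_b.extend('b', 16, 21, (15, 'a'));
    (node_a, node_b)
}
```

With `let (a, b) = scenario();`, the call `a.can_import(&(20, 'b'))` is `false`, which mirrors the failed import of block 21, while `common_ancestor(&a, &b, (21, 'b'))` finds the shared block 15.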
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -348,7 +348,14 @@ pub(crate) enum PeerSyncState<B: BlockT> { | |
| /// Available for sync requests. | ||
| Available, | ||
| /// Searching for ancestors the Peer has in common with us. | ||
| AncestorSearch { start: NumberFor<B>, current: NumberFor<B>, state: AncestorSearchState<B> }, | ||
| AncestorSearch { | ||
| /// The best queued number when starting the ancestor search. | ||
| start: NumberFor<B>, | ||
| /// The current block that is being downloaded. | ||
| current: NumberFor<B>, | ||
| /// The state of the search. | ||
| state: AncestorSearchState<B>, | ||
| }, | ||
| /// Actively downloading new blocks, starting from the given Number. | ||
| DownloadingNew(NumberFor<B>), | ||
| /// Downloading a stale block with given Hash. Stale means that it is a | ||
|
|
@@ -497,6 +504,7 @@ where | |
| let ancient_parent = parent_status == BlockStatus::InChainPruned; | ||
|
|
||
| let known = self.is_known(&hash); | ||
| let is_major_syncing = self.is_major_syncing(); | ||
| let peer = if let Some(peer) = self.peers.get_mut(&peer_id) { | ||
| peer | ||
| } else { | ||
|
|
@@ -509,6 +517,11 @@ where | |
| return None; | ||
| } | ||
|
|
||
| // The node is continuing a known fork if either the block itself is known, the | ||
| // parent is known or the block references the previously announced `best_hash`. | ||
| let continues_known_fork = | ||
| known || known_parent || announce.header.parent_hash() == &peer.best_hash; | ||
|
Contributor
Can we run into a race condition here? For example, a previous block triggered the ancestor search and `peer.best_hash` is already set, but then the next block gets announced and this condition would be true.
Member
Author
When the peer is in ancestry search mode, this method aborts early (see the check above).
||
|
|
||
| let peer_info = is_best.then(|| { | ||
| // update their best block | ||
| peer.best_number = number; | ||
|
|
@@ -520,12 +533,33 @@ where | |
| // If the announced block is the best they have and is not ahead of us, our common number | ||
| // is either one further ahead or it's the one they just announced, if we know about it. | ||
| if is_best { | ||
| if known && self.best_queued_number >= number { | ||
| self.update_peer_common_number(&peer_id, number); | ||
| let best_queued_number = self.best_queued_number; | ||
|
|
||
| if known && best_queued_number >= number { | ||
| peer.update_common_number(number); | ||
| } else if announce.header.parent_hash() == &self.best_queued_hash || | ||
| known_parent && self.best_queued_number >= number | ||
| known_parent && best_queued_number >= number | ||
| { | ||
| self.update_peer_common_number(&peer_id, number.saturating_sub(One::one())); | ||
| peer.update_common_number(number.saturating_sub(One::one())); | ||
| } | ||
|
|
||
| // If this announced block isn't following any known fork, we have to start an | ||
| // ancestor search to find out our real common block. However, we skip this during | ||
| // major sync to avoid pulling peers out of the download pool. | ||
| if !continues_known_fork && !is_major_syncing { | ||
| let current = number.min(best_queued_number); | ||
| peer.common_number = peer.common_number.min(self.client.info().finalized_number); | ||
| peer.state = PeerSyncState::AncestorSearch { | ||
|
Contributor
Could we get this peer stuck in an
Member
Author
The node cannot go back to block 21, especially if this block is below the last finalized block. Ancestry search is always a per-peer state, not a state shared by all peers. So, while we are doing an ancestry search with B, we can still import blocks from other peers.
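
As a rough sketch of the reply above: only the announcing peer changes state, and the search starts from the lower of the announced number and our best queued number. The types here (`Peer`, `PeerState`, `start_ancestor_search`, `two_peers`) are simplified stand-ins invented for this illustration, with plain `u64` block numbers and no search-state field. They are not the actual `sc-network-sync` types.

```rust
use std::collections::HashMap;

/// Simplified per-peer sync state (hypothetical stand-in for `PeerSyncState`).
#[derive(Debug, PartialEq)]
enum PeerState {
    Available,
    AncestorSearch { start: u64, current: u64 },
}

#[derive(Debug)]
struct Peer {
    common_number: u64,
    state: PeerState,
}

/// Put a single peer into ancestor search and leave every other peer untouched.
/// Returns the block number of the first ancestry request, or `None` for an unknown peer.
fn start_ancestor_search(
    peers: &mut HashMap<&'static str, Peer>,
    who: &str,
    announced: u64,
    best_queued: u64,
    finalized: u64,
) -> Option<u64> {
    let peer = peers.get_mut(who)?;
    // Start at a block number that both sides are expected to have.
    let current = announced.min(best_queued);
    // The common number is no longer trusted above the finalized block.
    peer.common_number = peer.common_number.min(finalized);
    peer.state = PeerState::AncestorSearch { start: best_queued, current };
    Some(current)
}

/// Two peers, both initially available with common number 20.
fn two_peers() -> HashMap<&'static str, Peer> {
    let mut peers = HashMap::new();
    peers.insert("B", Peer { common_number: 20, state: PeerState::Available });
    peers.insert("C", Peer { common_number: 20, state: PeerState::Available });
    peers
}
```

Calling `start_ancestor_search(&mut peers, "B", 21, 20, 18)` on the map from `two_peers()` moves only `B` into `AncestorSearch`. Peer `C` stays `Available`, so blocks can still be requested from it.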
||
| current, | ||
| start: best_queued_number, | ||
| state: AncestorSearchState::ExponentialBackoff(One::one()), | ||
| }; | ||
|
|
||
| let request = ancestry_request::<B>(current); | ||
| let action = self.create_block_request_action(peer_id, request); | ||
| self.actions.push(action); | ||
|
|
||
| return peer_info; | ||
| } | ||
| } | ||
| self.allowed_requests.add(&peer_id); | ||
|
|
||
How about moving this inside
where it is only used?

This doesn't work, because by then `peer.best_hash` may already be updated.
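
The ordering concern in this reply can be shown with a small model. It assumes that the peer's `best_hash` is overwritten with the announced hash before the later point in the method. The diff only shows `best_number` being updated, so that is an assumption. `Hash`, `Peer`, `Announce`, and the two wrapper functions are hypothetical and exist only for this sketch.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Hash(u64);

struct Peer {
    best_hash: Hash,
}

struct Announce {
    hash: Hash,
    parent_hash: Hash,
}

/// Same shape as the condition in the diff: the block is known, its parent is
/// known, or it builds on the best hash the peer announced previously.
fn continues_known_fork(known: bool, known_parent: bool, announce: &Announce, peer: &Peer) -> bool {
    known || known_parent || announce.parent_hash == peer.best_hash
}

/// Order used in the pull request: evaluate the condition, then update the peer.
fn check_then_update(peer: &mut Peer, announce: &Announce) -> bool {
    let result = continues_known_fork(false, false, announce, peer);
    peer.best_hash = announce.hash;
    result
}

/// Suggested order: the peer is updated first, so the comparison no longer
/// sees the previously announced best hash.
fn update_then_check(peer: &mut Peer, announce: &Announce) -> bool {
    peer.best_hash = announce.hash;
    continues_known_fork(false, false, announce, peer)
}
```

For a peer whose best hash is `Hash(20)` and an announcement `{ hash: Hash(21), parent_hash: Hash(20) }`, `check_then_update` returns `true`. `update_then_check` returns `false`, which would wrongly treat the block as belonging to an unknown fork.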