sync/fix: Clear gap sync on known imported blocks #8445
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
All GitHub workflows were cancelled due to the failure of one of the required jobs.
Unrelated test failing.
skunert left a comment:
Changes look good. Just one more sanity check:
- When we warp sync to block x, the target of the gap sync will be block x - 1.
- There is no reasonable way that we import a known block x - 1 without completing the gap sync.
I am just wondering why it was originally done like this.
… lexnv/investigate-warpsync
Coming back to some older PRs after handling litep2p, sorry for the delay 🙏
@skunert Yep, that makes sense to me. Maybe this was always a missed case in the sync implementation? @dmitry-markin, do you know if there were edge cases in the past with known blocks that pointed us towards not closing the gap when importing known blocks? 🤔
// Note: Ideally we can deduce this information with #[derive(derive_more::Debug)].
// However, we'd need a bump to the latest version 2 of the crate.
Will come with a follow-up, IIRC they have changed some feature-flags and now we have to explicitly select them :D
* master:
  omni-node: fix `benchmark pallet` to work with `--runtime` (#8594)
  Handle and suppress "New unknown `FromSwarm` libp2p event" warning (#8731)
  Implement detailed logging for XCM failures (#8724)
  [pallet-revive] contract's nonce starts at 1 (#8734)
  sync/fix: Clear gap sync on known imported blocks (#8445)
  [PoP] Add personhood tracking pallets (#8164)
  client/net: Use litep2p as the default network backend (#8461)
  Unflake `returns_status_for_pruned_blocks` (#8709)
  [AHM] Report the weights of epmb pallet to expose kusama and polkadot weights (#8704)
  Remove all XCM dependencies from `pallet-revive` (#8584)
  Docker master image tag fix (#8711)
  Record ed as part of the storage deposit (#8718)
  [pallet-revive] update dry-run logic (#8662)
  feat: add collator peer ID to ParachainInherentData (#8708)
  Nest errors in pallet-xcm (#7730)
  pallet-assets ERC20 precompile (#8554)
  Broker: Introduce min price + adjust renewals to lower market. (#8630)
  [AHM] Staking async fixes for XCM and election planning (#8422)
  Staking (EPMB): Add defensive error handling to voter snapshot creation and solution verification (#8687)
This PR ensures that warp sync gaps are properly cleared when known blocks are imported. Previously, gaps were only removed in response to `ImportedUnknown` events.

This limitation caused issues for asset-hub and bridge-hub collators, which remained stuck in the "Block history" state without progressing.

The root cause lies in `client.info()` reporting a gap during node startup or restart (i.e. block verification fails). In some cases, a peer may respond with the missing blocks after we’ve already imported them locally, leaving the gap open.

Grafana link: https://grafana.teleport.parity.io/goto/jCcsBLxNg?orgId=1

Traces from production:

```
2025-05-06 12:55:34.251 DEBUG main sync: [Parachain] Starting gap sync #4935955 - #4935955
2025-05-06 12:55:34.558 TRACE tokio-runtime-worker sync: [Parachain] New gap block request for 12D3KooWAVQMhkXmc5ueSYasdsRWQbKus2YGZ6HDZUB4ViJMCxXy, (best:5103253, common:5103253) BlockRequest { id: 0, fields: HEADER | BODY | JUSTIFICATION, from: Number(4935955), direction: Descending, max: Some(1) }
2025-05-06 12:55:34.558 TRACE tokio-runtime-worker sync: [Parachain] Processed `SyncingAction::StartRequest` to 12D3KooWAVQMhkXmc5ueSYasdsRWQbKus2YGZ6HDZUB4ViJMCxXy with strategy key StrategyKey("ChainSync").
2025-05-06 12:55:34.608 TRACE tokio-runtime-worker sync: [Parachain] BlockResponse 0 from 12D3KooWAVQMhkXmc5ueSYasdsRWQbKus2YGZ6HDZUB4ViJMCxXy with 1 blocks (4935955)
2025-05-06 12:55:34.608 DEBUG tokio-runtime-worker sync: [Parachain] Drained 1 gap blocks from 4935954
2025-05-06 12:55:35.511 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Starting import of 1 blocks (4935955)
2025-05-06 12:55:35.517 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block already in chain 4935955: 0x63db2b40cccac020fbc922e5e98bb3955f4cdaa823a2be85ecf22776745ccacc
2025-05-06 12:55:35.517 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block imported successfully Some(4935955) (0x63db…cacc)
2025-05-06 12:55:35.517 TRACE tokio-runtime-worker sync: [Parachain] Cleared blocks from 4935955 to 4935956
```

### Testing Done

Added two tests to verify that warp sync gaps are correctly cleared under both block import scenarios. The first test closely follows the operations performed by the node, while the second one emulates the imports.

### Next Steps

Added extra debug logs to monitor whether the issue persists (which would point towards a corrupt database, i.e. `client.info()` always has the gap present).

Closes: #8416

cc @paritytech/networking

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Nikolai Kozlov <1431590+nkpar@users.noreply.github.com>
This PR ensures that warp sync gaps are properly cleared when known blocks are imported. Previously, gaps were only removed in response to `ImportedUnknown` events.
This limitation caused issues for asset-hub and bridge-hub collators, which remained stuck in the "Block history" state without progressing.
The root cause lies in `client.info()` reporting a gap during node startup or restart (i.e. block verification fails). In some cases, a peer may respond with the missing blocks after we’ve already imported them locally, leaving the gap open.
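To make the behavior change concrete, here is a minimal sketch of the idea, not the actual `ChainSync` code. The types `ImportOutcome`, `GapSync` and `Sync`, the method `on_block_imported`, and the rule that the gap closes once the imported block number equals the gap target are all simplified assumptions made for illustration. Only the names `ImportedUnknown` and "known" imports come from this PR.

```rust
/// Hypothetical, simplified stand-in for the result of importing one block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImportOutcome {
    /// The block was new to the client.
    ImportedUnknown,
    /// The block was already in the chain (the "Block already in chain" trace).
    ImportedKnown,
}

/// Hypothetical gap-sync state: only the last block of the gap is tracked.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GapSync {
    /// Block number at which the gap is considered filled (assumption).
    target: u64,
}

/// Hypothetical syncing state holding an optional open gap.
#[derive(Debug, Default)]
struct Sync {
    gap_sync: Option<GapSync>,
}

impl Sync {
    /// Handle a processed block. Before the fix, only `ImportedUnknown`
    /// would reach the gap check; in this sketch both outcomes do.
    fn on_block_imported(&mut self, number: u64, outcome: ImportOutcome) {
        match outcome {
            ImportOutcome::ImportedUnknown | ImportOutcome::ImportedKnown => {
                // Close the gap once its target block has been processed,
                // regardless of whether the block was already known.
                if matches!(self.gap_sync, Some(gap) if gap.target == number) {
                    self.gap_sync = None;
                }
            }
        }
    }
}
```

With a gap targeting block 4935955, as in the production traces, calling `on_block_imported(4935955, ImportOutcome::ImportedKnown)` clears `gap_sync` in this sketch, while an import of any other block number leaves it open.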
Grafana link: https://grafana.teleport.parity.io/goto/jCcsBLxNg?orgId=1
Traces from production:
### Testing Done
Added two tests to verify that warp sync gaps are correctly cleared under both block import scenarios. The first test closely follows the operations performed by the node, while the second one emulates the imports.
### Next Steps
Added extra debug logs to monitor whether the issue persists (which would point towards a corrupt database, i.e. `client.info()` always has the gap present).
Closes: #8416
cc @paritytech/networking