fix: the race condition in compact #3015
Merged
In #2903, I thought fixing the race condition was not urgent, based on my silly assumption that it was unlikely to happen.

But it actually is urgent: today I found a bug that produces `"block_height": null` in the result of the `/v1/txhashset/outputs` node API query.

Analysis of the following log:
20190831 22:50:12.752 DEBUG grin_chain::chain - compact: head: 228987, tail: 216353, diff: 12634, horizon: 10080
20190831 22:50:12.752 DEBUG grin_chain::txhashset::txhashset - txhashset: starting compaction...
...
20190831 22:50:19.443 DEBUG grin_servers::common::adapters - Received compact_block 057049f212c5 at 228988 from 35.247.29.132:3414 [out/kern/kern_ids: 1/1/0] going to process.
...
20190831 22:51:25.050 DEBUG grin_chain::txhashset::txhashset - txhashset: ... compaction finished
20190831 22:51:25.388 DEBUG grin_chain::txhashset::txhashset - txhashset: rebuild_index: 117974 UTXOs, took 0s
...
20190831 22:51:26.216 DEBUG grin_chain::pipe - pipe: process_block 057049f212c5 at 228988 [in/out/kern: 0/1/1]
20190831 22:51:26.218 DEBUG grin_chain::pipe - pipe: header_head updated to 057049f212c5 at 228988
...
20190831 22:51:26.261 DEBUG grin_chain::chain - rebuild_height_for_pos: rebuilding 117974 output_pos's height...
It shows that the gap between the start of `compact` and the call to `rebuild_height_for_pos` can be very large (1 minute 14 seconds in the log above). During this gap, new blocks can be queued for processing. If any of those blocks is processed before `rebuild_height_for_pos` is called, the problem occurs: `rebuild_height_for_pos` loses the indexes of the newly processed blocks, because of a simple `batch.clear_output_pos_height()` in `rebuild_height_for_pos`.

The fix is to replace `rebuild_index` with the new `rebuild_height_for_pos`, to avoid the 2-stage rebuilding.

Note:
`txhashset_write` still uses the 2-stage rebuilding. That is not a problem there as it is in `compact`, because `txhashset_write` can only be called during the `state sync` stage, which forbids block processing.

Refactoring `txhashset_write` (to avoid multiple rebuilds split across multiple locks) can be considered after the related PR "Split header MMR (and sync MMR) out from txhashset" #3004 is merged, to remove this 2-stage rebuilding as well. (But that is a pure improvement, not a bug fix like this PR.)
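
For readers who want to see the shape of the race, here is a minimal toy model in Rust. It is not Grin's code: `State`, `process_block`, `stage_one_snapshot`, `stage_two_rebuild` and `rebuild_single_stage` are hypothetical names, and the assumption that the second stage refills the index from data captured before the new block arrived is a simplification made only for illustration. It shows how a block processed between two separately locked rebuild stages can lose its height index entry, and how doing the rebuild under a single lock avoids that.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Toy chain state (hypothetical, not Grin's types).
#[derive(Default)]
struct State {
    /// (output position, block height) for every known output.
    outputs: Vec<(u64, u64)>,
    /// Index: output position -> block height.
    height_index: BTreeMap<u64, u64>,
}

/// Processing a block stores its output and indexes it immediately.
fn process_block(state: &Mutex<State>, pos: u64, height: u64) {
    let mut s = state.lock().unwrap();
    s.outputs.push((pos, height));
    s.height_index.insert(pos, height);
}

/// Stage 1 (assumed): capture the outputs under the lock, then release it.
fn stage_one_snapshot(state: &Mutex<State>) -> Vec<(u64, u64)> {
    state.lock().unwrap().outputs.clone()
}

/// Stage 2 (assumed): take the lock again, clear the whole index and
/// refill it from the snapshot captured in stage 1.
fn stage_two_rebuild(state: &Mutex<State>, snapshot: &[(u64, u64)]) {
    let mut s = state.lock().unwrap();
    s.height_index.clear();
    for &(pos, height) in snapshot {
        s.height_index.insert(pos, height);
    }
}

/// Single-stage variant: capture, clear and refill under one lock.
fn rebuild_single_stage(state: &Mutex<State>) {
    let mut s = state.lock().unwrap();
    let outputs = s.outputs.clone();
    s.height_index.clear();
    for (pos, height) in outputs {
        s.height_index.insert(pos, height);
    }
}

/// Chain with a single output at height 228987.
fn new_state() -> Mutex<State> {
    let state = Mutex::new(State::default());
    process_block(&state, 1, 228987);
    state
}

/// A block is processed in the gap between the two stages.
/// Returns the indexed height of that block's output.
fn two_stage_scenario() -> Option<u64> {
    let state = new_state();
    let snapshot = stage_one_snapshot(&state);
    process_block(&state, 2, 228988); // arrives during the gap
    stage_two_rebuild(&state, &snapshot);
    let found = state.lock().unwrap().height_index.get(&2).copied();
    found
}

/// With one lock there is no gap; the block is processed after the rebuild.
fn single_stage_scenario() -> Option<u64> {
    let state = new_state();
    rebuild_single_stage(&state);
    process_block(&state, 2, 228988);
    let found = state.lock().unwrap().height_index.get(&2).copied();
    found
}
```

In this model, `two_stage_scenario()` returns `None` for the new block's output, which corresponds to the `"block_height": null` symptom, while `single_stage_scenario()` returns `Some(228988)`.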