Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: the race condition in compact #3015

Merged
merged 1 commit into from
Sep 4, 2019
Merged

fix: the race condition in compact #3015

merged 1 commit into from
Sep 4, 2019

Conversation

garyyu
Copy link
Contributor

@garyyu garyyu commented Sep 2, 2019

In #2903, I thought the race condition fixing is not urgent based on my silly assumption of its low possibility:

There could be a race condition here if a new block coming in and seize the locking among these rebuild, even in a very low possibility.

But actually I find it's NOT, based on a bug I find today which has a "block_height": null for /v1/txhashset/outputs node API query result.

$ curl -0 -XGET  http://127.0.0.1:13413/v1/txhashset/outputs?start_index=320114\&max=1

{
  "highest_index": 321346,
  "last_retrieved_index": 320114,
  "outputs": [
    {
      "output_type": "Coinbase",
      "commit": "08511d10773136727b13f6bcd21dc62d066425439bdcb708d62456fc8b2f307691",
      "spent": true,
      "proof": "6180cee3 ... ...",
      "proof_hash": "9551a07966bc438f106c982f7eb734b928632662189ebe1d271616f592f697da",
      "block_height": null,
      "merkle_proof": null,
      "mmr_index": 0
    }
  ]
}

Analysis of the following log:

20190831 22:50:12.752 DEBUG grin_chain::chain - compact: head: 228987, tail: 216353, diff: 12634, horizon: 10080
20190831 22:50:12.752 DEBUG grin_chain::txhashset::txhashset - txhashset: starting compaction...
...
20190831 22:50:19.443 DEBUG grin_servers::common::adapters - Received compact_block 057049f212c5 at 228988 from 35.247.29.132:3414 [out/kern/kern_ids: 1/1/0] going to process.
...
20190831 22:51:25.050 DEBUG grin_chain::txhashset::txhashset - txhashset: ... compaction finished
20190831 22:51:25.388 DEBUG grin_chain::txhashset::txhashset - txhashset: rebuild_index: 117974 UTXOs, took 0s
...
20190831 22:51:26.216 DEBUG grin_chain::pipe - pipe: process_block 057049f212c5 at 228988 [in/out/kern: 0/1/1]
20190831 22:51:26.218 DEBUG grin_chain::pipe - pipe: header_head updated to 057049f212c5 at 228988
...
20190831 22:51:26.261 DEBUG grin_chain::chain - rebuild_height_for_pos: rebuilding 117974 output_pos's height...

It shows the gap time between compact starting and rebuild_height_for_pos could be very big (
1 minutes 14 seconds at above log). During this big gap, there could be some new blocks queued there to be processed. And once the new blocks processed before calling this rebuild_height_for_pos, the problem will happen: rebuild_height_for_pos will lose the indexes of the new processed blocks, because a simple batch.clear_output_pos_height() there in rebuild_height_for_pos.

The fix solution is to replace the rebuild_index with the new rebuild_height_for_pos, to avoid the 2-stages rebuilding.

Note:

  • txhashset_write is still using the 2-stages rebuilding. But it's not a problem like in the compact, because the txhashset_write only can be called on the state sync stage, which forbid the block processing.
  • txhashset_write refactoring (to avoid multiple rebuilding split into multiple locking) can be considered after the related PR Split header MMR (and sync MMR) out from txhashset #3004 merged, to also remove this 2-stages rebuilding. (but that's a pure improvement, not a bug fix like this PR).

@garyyu garyyu added the bug label Sep 2, 2019
@garyyu garyyu requested a review from antiochp September 3, 2019 13:46
Copy link
Member

@antiochp antiochp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@garyyu garyyu added the fixed label Sep 3, 2019
@antiochp antiochp merged commit d55ebe8 into mimblewimble:master Sep 4, 2019
@garyyu garyyu deleted the fix-race branch December 11, 2019 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants