Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Mining][SHA256D] Reduce lock contention with SHA256D #915

Merged
merged 1 commit into from
May 2, 2021

Conversation

Zannick
Copy link
Collaborator

@Zannick Zannick commented Apr 1, 2021

Problem

While mining SHA256D, the wallet periodically gets stuck, failing to stay caught up with the chain. (At the same time, the mining significantly slows down.)

Root Cause

Lock contention; SHA256D is very fast, completing its loop in subsecond time on a modern cpu, meaning every thread goes through lock-heavy sections of the block template creation code multiple times per second. In comparison, RandomX on the same machine takes minutes to complete its loop, usually being interrupted by the creation of a new block instead of reaching the maximal number of attempts.

Solution

Increase the number of loop iterations for SHA256D. To avoid doing useless work, also periodically check for a new block to end the outer iteration early.

  • Adds an additional loop inside BitcoinMiner for SHA256D only, which reduces (ideally, minimizes) the number of times per chain tip that the mining thread needs to grab cs_main.
  • Periodically checks chainActive.Height() to see if a block has already been found, in order to exit early, similar to RandomX.

Bounty PR

#862

Bounty Payment Address

sv1qqpswvjy7s9yrpcmrt3fu0kd8rutrdlq675ntyjxjzn09f965z9dutqpqgg85esvg8mhmyka5kq5vae0qnuw4428vs9d2gu4nz643jv5a72wkqqq73mnxr

Unit Testing Results

Manually ran all tests in src/test/ via a script and compared to master. No differences.
Ran on testnet.

@CaveSpectre11 CaveSpectre11 added Component: Core App Related to the application itself. Component: Miner Both PoW and PoS block creation Tag: Waiting For Code Review Waiting for code review from a core developer labels Apr 4, 2021
Copy link
Collaborator

@CaveSpectre11 CaveSpectre11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff, Thank you Zannick. One thing to keep in mind is that atomic changes are best for PRs. Your 3 commits are distinctly different and therefore wouldn't be appropriate to be squashed; however if there's a problem with one of them, then it holds up the other ones.

Unfortunately this must be NACKed; I will explain inline. 4f4528c

src/miner.cpp Outdated
Comment on lines 1008 to 1004
while (nMidTries < nMidLoopCount) {
while (nTries < nInnerLoopCount &&
!(success = CheckProofOfWork(pblock->GetSha256D(midStateHash), pblock->nBits,
Params().GetConsensus(), CBlockHeader::SHA256D_BLOCK))) {
boost::this_thread::interruption_point();
++nTries;
++pblock->nNonce64;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this essentially runs it much longer, I do have some concerns over this change; but I'm not 100% in disagreement with it. I need to give it another look on the next pass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, though, running the loop longer is the main way I've dealt with the lock contention from setting up the block template and nonce.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original response was much different, until I read the change description again. I just want to double check everything that passes through when it fully exits and returns.

@Zannick
Copy link
Collaborator Author

Zannick commented Apr 4, 2021

Thanks, I'll get on the changes and split the required commits into separate PRs. Wasn't really sure how many PRs to make for my first PR. =)

@Zannick Zannick changed the title [Mining][SHA256D] Reduce lock contention with SHA256D [WIP][Mining][SHA256D] Reduce lock contention with SHA256D Apr 4, 2021
@CaveSpectre11
Copy link
Collaborator

@Zannick It looks like there's still some stuff left in here that's not in the other two PRs. Do you want to migrate this one, or clean up the commit and submit as a 3rd PR when it's ready? (I'm leaning towards the latter).

@CaveSpectre11 CaveSpectre11 added Tag: Waiting For Developer and removed Tag: Waiting For Code Review Waiting for code review from a core developer labels Apr 5, 2021
@Zannick
Copy link
Collaborator Author

Zannick commented Apr 5, 2021

My intent was to leave this WIP until the logging PR is in, and then clean up, but if you prefer I can close now and resubmit later.

@CaveSpectre11 CaveSpectre11 added the Dev Status: In Progress Someone is actively working on this issue. label Apr 5, 2021
@CaveSpectre11
Copy link
Collaborator

My intent was to leave this WIP until the logging PR is in, and then clean up, but if you prefer I can close now and resubmit later.

No worries; just going through them all and making sure.

@codeofalltrades
Copy link
Collaborator

This PR can be closed correct?

@Zannick
Copy link
Collaborator Author

Zannick commented Apr 16, 2021

I'll revive this after #921 is merged.

Increases the number of hashes a SHA256D miner thread attempts
per created block template, and adds a separate counter to allow
an early continuation when the chain tip changes.

The SHA256D algorithm is fast enough that the main miner loop runs
multiple times per second with the previous loop count. In each loop, a
thread generates a new block template, which requires grabbing cs_main
a couple times. I found this to be a significant source of contention,
often causing both SHA mining and wallet syncing to stop.

In contrast, a RandomX miner thread runs its main loop on the order of
minutes, with a lower loop count, and checks on every
hash attempt whether the tip has changed, resulting in approximately one
block template created per block.

I chose nMidLoopCount to approximately match the RandomX timing, while
nInnerLoopCount is still fast enough to be used as an intermittent
sub-second check of the chain height.
@Zannick Zannick changed the title [WIP][Mining][SHA256D] Reduce lock contention with SHA256D [Mining][SHA256D] Reduce lock contention with SHA256D Apr 18, 2021
@Zannick
Copy link
Collaborator Author

Zannick commented Apr 18, 2021

Ready for re-review.

This fixes #862 in preventing regular, frequent occurrences of the wallet getting stuck. Unfortunately I did see a very rare stuckness in a similar branch (with slightly different changes), which had an interesting symptom of taking minutes for some mining threads to be interrupted once I asked it to stop mining. If I see it again, I'll turn all debugging on before I hit exit, see if I can see what's going on/which threads are blocked, or try other methods of finding lock contention.

Copy link
Collaborator

@codeofalltrades codeofalltrades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 2270c94
Tested SHA mining on Win10 and Linux 18.04

Copy link
Collaborator

@CaveSpectre11 CaveSpectre11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 2270c94

Zannick added a commit to Zannick/veil that referenced this pull request Apr 30, 2021
Validating zerocoin spends requires determining the height of an
accumulator hash, which is done by searching through all the blocks
in the chain starting from genesis, for every txin.

Since the height of a particular hash-denomination pair is not going
to change, we can cache it. This is most useful for in-wallet block
template generation, since each mining thread performs these
validations separately (while holding cs_main).

Adds a SimpleLRUCache template based on 7c9e97a and uses it to
store checksum heights. I chose a particular size for it that should
be large enough for significant spends.

This results in a roughly 30+x speedup of zerocoin spend validation
(26-30ms/txin -> 0.4-0.9ms/txin). Vastly improves Veil-Project#862 alongside Veil-Project#915.
Zannick added a commit to Zannick/veil that referenced this pull request Apr 30, 2021
Validating zerocoin spends requires determining the height of an
accumulator hash, which is done by searching through all the blocks
in the chain starting from genesis, for every txin.

Since the height of a particular hash-denomination pair is not going
to change, we can cache it. This is most useful for in-wallet block
template generation, since each mining thread performs these
validations separately (while holding cs_main).

Adds a SimpleLRUCache template based on 7c9e97a and uses it to
store checksum heights. I chose a particular size for it that should
be large enough for significant spends.

This results in a roughly 30+x speedup of zerocoin spend validation
(26-30ms/txin -> 0.4-0.9ms/txin). Vastly improves Veil-Project#862 alongside Veil-Project#915.
Zannick added a commit to Zannick/veil that referenced this pull request Apr 30, 2021
Validating zerocoin spends requires determining the height of an
accumulator hash, which is done by searching through all the blocks
in the chain starting from genesis, for every txin.

Since the height of a particular hash-denomination pair is not going
to change, we can cache it. This is most useful for in-wallet block
template generation, since each mining thread performs these
validations separately (while holding cs_main).

Adds a SimpleLRUCache template based on 7c9e97a and uses it to
store checksum heights. I chose a particular size for it that should
be large enough for significant spends.

This results in a roughly 30+x speedup of zerocoin spend validation
(26-30ms/txin -> 0.4-0.9ms/txin). Vastly improves Veil-Project#862 alongside Veil-Project#915.
@codeofalltrades codeofalltrades merged commit 4fe97db into Veil-Project:master May 2, 2021
Zannick added a commit to Zannick/veil that referenced this pull request May 2, 2021
Validating zerocoin spends requires determining the height of an
accumulator hash, which is done by searching through all the blocks
in the chain starting from genesis, for every txin.

Since the height of a particular hash-denomination pair is not going
to change, we can cache it. This is most useful for in-wallet block
template generation, since each mining thread performs these
validations separately (while holding cs_main).

Adds a SimpleLRUCache template based on 7c9e97a and uses it to
store checksum heights. I chose a particular size for it that should
be large enough for significant spends.

This results in a roughly 30+x speedup of zerocoin spend validation
(26-30ms/txin -> 0.4-0.9ms/txin). Vastly improves Veil-Project#862 alongside Veil-Project#915.
codeofalltrades added a commit that referenced this pull request May 3, 2021
1235e1c [Zerocoin][Validation] Cache checksum heights (Zannick)

Pull request description:

  ### Problem ###
  While mining SHA256D, the wallet periodically gets stuck, failing to stay caught up with the chain, **even with #915**. (At the same time, the mining significantly slows down.)

  ### Root Cause ###
  Lock contention; the cs_main lock is held throughout critical parts of block template creation, done independently per mining thread. The intermittent stuckness appears correlated with large numbers of txins during zerocoin spend validation. This requires determining the height of an accumulator hash, which is done by searching through all the blocks in the chain starting from genesis, for every txin. My debug logs (with -debug=bench) show very consistent examples like `- Verify 646 txins: 17392.00ms (26.923ms/txin)`.

  ### Solution ###
  Since the height of a particular hash-denomination pair is not going to change, we can cache it. One mining thread will still have to find the height to begin with, but all the other threads will not need to. (Also, hashes may persist to be reused in later blocks; there are likely multiple coins of the same denom with the same accumulator hash.)

  Adds a SimpleLRUCache template based on 7c9e97a and uses it to store checksum heights. I chose a somewhat arbitrary size for it that should be large enough for significant spends.

  This results in a **significant** speedup of zerocoin spend validation (alternately got `- Verify 101 txins: 1.00ms (0.010ms/txin)` , `- Verify 101 txins: 0.00ms (0.000ms/txin)` on one block). Vastly improves #862 alongside #915--I can confirm it still gets stuck without #915.

  ### Bounty PR ###
  #862

  ### Bounty Payment Address ##
  `sv1qqpswvjy7s9yrpcmrt3fu0kd8rutrdlq675ntyjxjzn09f965z9dutqpqgg85esvg8mhmyka5kq5vae0qnuw4428vs9d2gu4nz643jv5a72wkqqq73mnxr`

  ### Testing ###
  testnet: built with `-with-sanitizers=thread -O0 -g` and run with 3/4 threads mining sha256d, and spending tons of 10s at once to stress-test this section.
  mainnet: running 12 threads sha256d, built with #915 and `-O3` and

  ```patch
  diff --git a/src/validation.cpp b/src/validation.cpp
  index f92427109..a2a2c9ef5 100644
  --- a/src/validation.cpp
  +++ b/src/validation.cpp
  @@ -2863,0 +2895,2 @@ bool CChainState::ConnectBlock(const CBlock& block, CValidationState& state, CBl
           return state.DoS(100, error("%s: CheckQueue failed", __func__), REJECT_INVALID, "block-validation-failed");

       int64_t nTime4 = GetTimeMicros(); nTimeVerify += nTime4 - nTime2;
  +    if (nInputs > 20)
  +        LogPrintf("    - Verify %u txins: %.2fms (%.3fms/txin) [%.2fs (%.2fms/blk)]\n", nInputs - 1, MILLI * (nTime4 - nTime2), nInputs <= 1 ? 0 : MILLI * (nTime4 - nTime2) / (nInputs-1), nTimeVerify * MICRO, nTimeVerify * MILLI / nBlocksTotal);
       LogPrint(BCLog::BENCH, "    - Verify %u txins: %.2fms (%.3fms/txin) [%.2fs (%.2fms/blk)]\n", nInputs - 1, MILLI * (nTime4 - nTime2), nInputs <= 1 ? 0 : MILLI * (nTime4 - nTime2) / (nInputs-1), nTimeVerify * MICRO, nTimeVerify * MILLI / nBlocksTotal);
  ```
  for simple performance monitoring.

Tree-SHA512: 922f79e50e592a69d8ad14092218da515b53223aa77664b60c7ae0fa8444d1587e64ec37cdca17234bb3742cd54f52a8d7c1de9f3333d0521612fec651536395
@Zannick Zannick deleted the sha branch May 18, 2021 03:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Core App Related to the application itself. Component: Miner Both PoW and PoS block creation Dev Status: In Progress Someone is actively working on this issue. Tag: Waiting For Developer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants