-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Mining][SHA256D] Reduce lock contention with SHA256D #915
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, Thank you Zannick. One thing to keep in mind is that atomic changes are best for PRs. Your 3 commits are distinctly different and therefore wouldn't be appropriate to be squashed; however if there's a problem with one of them, then it holds up the other ones.
Unfortunately this must be NACKed; I will explain inline. 4f4528c
src/miner.cpp
Outdated
while (nMidTries < nMidLoopCount) { | ||
while (nTries < nInnerLoopCount && | ||
!(success = CheckProofOfWork(pblock->GetSha256D(midStateHash), pblock->nBits, | ||
Params().GetConsensus(), CBlockHeader::SHA256D_BLOCK))) { | ||
boost::this_thread::interruption_point(); | ||
++nTries; | ||
++pblock->nNonce64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this essentially runs it much longer, I do have some concerns over this change; but I'm not 100% in disagreement with it. I need to give it another look on the next pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, though, running the loop longer is the main way I've dealt with the lock contention from setting up the block template and nonce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original response was much different, until I read the change description again. I just want to double check everything that passes through when it fully exits and returns.
Thanks, I'll get on the changes and split the required commits into separate PRs. Wasn't really sure how many PRs to make for my first PR. =) |
@Zannick It looks like there's still some stuff left in here that's not in the other two PRs. Do you want to migrate this one, or clean up the commit and submit as a 3rd PR when it's ready? (I'm leaning towards the latter). |
My intent was to leave this WIP until the logging PR is in, and then clean up, but if you prefer I can close now and resubmit later. |
No worries; just going through them all and making sure. |
This PR can be closed correct? |
I'll revive this after #921 is merged. |
Increases the number of hashes a SHA256D miner thread attempts per created block template, and adds a separate counter to allow an early continuation when the chain tip changes. The SHA256D algorithm is fast enough that the main miner loop runs multiple times per second with the previous loop count. In each loop, a thread generates a new block template, which requires grabbing cs_main a couple times. I found this to be a significant source of contention, often causing both SHA mining and wallet syncing to stop. In contrast, a RandomX miner thread runs its main loop on the order of minutes, with a lower loop count, and checks on every hash attempt whether the tip has changed, resulting in approximately one block template created per block. I chose nMidLoopCount to approximately match the RandomX timing, while nInnerLoopCount is still fast enough to be used as an intermittent sub-second check of the chain height.
Ready for re-review. This fixes #862 in preventing regular, frequent occurrences of the wallet getting stuck. Unfortunately I did see a very rare stuckness in a similar branch (with slightly different changes), which had an interesting symptom of taking minutes for some mining threads to be interrupted once I asked it to stop mining. If I see it again, I'll turn all debugging on before I hit exit, see if I can see what's going on/which threads are blocked, or try other methods of finding lock contention. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 2270c94
Tested SHA mining on Win10 and Linux 18.04
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 2270c94
Validating zerocoin spends requires determining the height of an accumulator hash, which is done by searching through all the blocks in the chain starting from genesis, for every txin. Since the height of a particular hash-denomination pair is not going to change, we can cache it. This is most useful for in-wallet block template generation, since each mining thread performs these validations separately (while holding cs_main). Adds a SimpleLRUCache template based on 7c9e97a and uses it to store checksum heights. I chose a particular size for it that should be large enough for significant spends. This results in a roughly 30+x speedup of zerocoin spend validation (26-30ms/txin -> 0.4-0.9ms/txin). Vastly improves Veil-Project#862 alongside Veil-Project#915.
Validating zerocoin spends requires determining the height of an accumulator hash, which is done by searching through all the blocks in the chain starting from genesis, for every txin. Since the height of a particular hash-denomination pair is not going to change, we can cache it. This is most useful for in-wallet block template generation, since each mining thread performs these validations separately (while holding cs_main). Adds a SimpleLRUCache template based on 7c9e97a and uses it to store checksum heights. I chose a particular size for it that should be large enough for significant spends. This results in a roughly 30+x speedup of zerocoin spend validation (26-30ms/txin -> 0.4-0.9ms/txin). Vastly improves Veil-Project#862 alongside Veil-Project#915.
Validating zerocoin spends requires determining the height of an accumulator hash, which is done by searching through all the blocks in the chain starting from genesis, for every txin. Since the height of a particular hash-denomination pair is not going to change, we can cache it. This is most useful for in-wallet block template generation, since each mining thread performs these validations separately (while holding cs_main). Adds a SimpleLRUCache template based on 7c9e97a and uses it to store checksum heights. I chose a particular size for it that should be large enough for significant spends. This results in a roughly 30+x speedup of zerocoin spend validation (26-30ms/txin -> 0.4-0.9ms/txin). Vastly improves Veil-Project#862 alongside Veil-Project#915.
Validating zerocoin spends requires determining the height of an accumulator hash, which is done by searching through all the blocks in the chain starting from genesis, for every txin. Since the height of a particular hash-denomination pair is not going to change, we can cache it. This is most useful for in-wallet block template generation, since each mining thread performs these validations separately (while holding cs_main). Adds a SimpleLRUCache template based on 7c9e97a and uses it to store checksum heights. I chose a particular size for it that should be large enough for significant spends. This results in a roughly 30+x speedup of zerocoin spend validation (26-30ms/txin -> 0.4-0.9ms/txin). Vastly improves Veil-Project#862 alongside Veil-Project#915.
1235e1c [Zerocoin][Validation] Cache checksum heights (Zannick) Pull request description: ### Problem ### While mining SHA256D, the wallet periodically gets stuck, failing to stay caught up with the chain, **even with #915**. (At the same time, the mining significantly slows down.) ### Root Cause ### Lock contention; the cs_main lock is held throughout critical parts of block template creation, done independently per mining thread. The intermittent stuckness appears correlated with large numbers of txins during zerocoin spend validation. This requires determining the height of an accumulator hash, which is done by searching through all the blocks in the chain starting from genesis, for every txin. My debug logs (with -debug=bench) show very consistent examples like `- Verify 646 txins: 17392.00ms (26.923ms/txin)`. ### Solution ### Since the height of a particular hash-denomination pair is not going to change, we can cache it. One mining thread will still have to find the height to begin with, but all the other threads will not need to. (Also, hashes may persist to be reused in later blocks; there are likely multiple coins of the same denom with the same accumulator hash.) Adds a SimpleLRUCache template based on 7c9e97a and uses it to store checksum heights. I chose a somewhat arbitrary size for it that should be large enough for significant spends. This results in a **significant** speedup of zerocoin spend validation (alternately got `- Verify 101 txins: 1.00ms (0.010ms/txin)` , `- Verify 101 txins: 0.00ms (0.000ms/txin)` on one block). Vastly improves #862 alongside #915--I can confirm it still gets stuck without #915. ### Bounty PR ### #862 ### Bounty Payment Address ## `sv1qqpswvjy7s9yrpcmrt3fu0kd8rutrdlq675ntyjxjzn09f965z9dutqpqgg85esvg8mhmyka5kq5vae0qnuw4428vs9d2gu4nz643jv5a72wkqqq73mnxr` ### Testing ### testnet: built with `-with-sanitizers=thread -O0 -g` and run with 3/4 threads mining sha256d, and spending tons of 10s at once to stress-test this section. mainnet: running 12 threads sha256d, built with #915 and `-O3` and ```patch diff --git a/src/validation.cpp b/src/validation.cpp index f92427109..a2a2c9ef5 100644 --- a/src/validation.cpp +++ b/src/validation.cpp @@ -2863,0 +2895,2 @@ bool CChainState::ConnectBlock(const CBlock& block, CValidationState& state, CBl return state.DoS(100, error("%s: CheckQueue failed", __func__), REJECT_INVALID, "block-validation-failed"); int64_t nTime4 = GetTimeMicros(); nTimeVerify += nTime4 - nTime2; + if (nInputs > 20) + LogPrintf(" - Verify %u txins: %.2fms (%.3fms/txin) [%.2fs (%.2fms/blk)]\n", nInputs - 1, MILLI * (nTime4 - nTime2), nInputs <= 1 ? 0 : MILLI * (nTime4 - nTime2) / (nInputs-1), nTimeVerify * MICRO, nTimeVerify * MILLI / nBlocksTotal); LogPrint(BCLog::BENCH, " - Verify %u txins: %.2fms (%.3fms/txin) [%.2fs (%.2fms/blk)]\n", nInputs - 1, MILLI * (nTime4 - nTime2), nInputs <= 1 ? 0 : MILLI * (nTime4 - nTime2) / (nInputs-1), nTimeVerify * MICRO, nTimeVerify * MILLI / nBlocksTotal); ``` for simple performance monitoring. Tree-SHA512: 922f79e50e592a69d8ad14092218da515b53223aa77664b60c7ae0fa8444d1587e64ec37cdca17234bb3742cd54f52a8d7c1de9f3333d0521612fec651536395
Problem
While mining SHA256D, the wallet periodically gets stuck, failing to stay caught up with the chain. (At the same time, the mining significantly slows down.)
Root Cause
Lock contention; SHA256D is very fast, completing its loop in subsecond time on a modern cpu, meaning every thread goes through lock-heavy sections of the block template creation code multiple times per second. In comparison, RandomX on the same machine takes minutes to complete its loop, usually being interrupted by the creation of a new block instead of reaching the maximal number of attempts.
Solution
Increase the number of loop iterations for SHA256D. To avoid doing useless work, also periodically check for a new block to end the outer iteration early.
BitcoinMiner
for SHA256D only, which reduces (ideally, minimizes) the number of times per chain tip that the mining thread needs to grabcs_main
.chainActive.Height()
to see if a block has already been found, in order to exit early, similar to RandomX.Bounty PR
#862
Bounty Payment Address
sv1qqpswvjy7s9yrpcmrt3fu0kd8rutrdlq675ntyjxjzn09f965z9dutqpqgg85esvg8mhmyka5kq5vae0qnuw4428vs9d2gu4nz643jv5a72wkqqq73mnxr
Unit Testing Results
Manually ran all tests in
src/test/
via a script and compared to master. No differences.Ran on testnet.