Adjust the default transaction replay thread pool size #25
steviez wants to merge 1 commit into anza-xyz:master
Conversation
I ran some nodes against testnet and mainnet to gather some information about the batches getting passed to the thread pool. The collection method was very simple. Granted, this is only one day of runtime on one node for each cluster, but I think it is telling. The data on testnet is noisier, but on mainnet:
Aside from gut feeling, this initial datapoint also suggests that the thread pool was over-provisioned.
The numbers show a pretty steep drop-off. I probably lost some precision, but the first 28 threads are doing 99.94% of the work, the first 24 99.58%, and the first 16 95.23%. These points also suggest that the extra threads are rarely getting utilized, adding extra overhead for little to no gain (and potentially doing more harm than good when considering the general overhead).
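The percentages above are just cumulative shares of per-thread work counts, sorted busiest-first. A small sketch of that computation; the counts below are made up for illustration and are not the collected data:

```rust
// Given per-thread work counts sorted in descending order, return the
// cumulative percentage of total work covered by the first N threads.
fn cumulative_share(counts: &[u64]) -> Vec<f64> {
    let total: u64 = counts.iter().sum();
    let mut running = 0u64;
    counts
        .iter()
        .map(|&c| {
            running += c;
            100.0 * running as f64 / total as f64
        })
        .collect()
}

fn main() {
    // Hypothetical counts showing a steep drop-off across the pool.
    let counts = vec![500, 400, 50, 30, 15, 4, 1];
    for (i, share) in cumulative_share(&counts).iter().enumerate() {
        println!("first {} threads: {:.2}% of work", i + 1, share);
    }
}
```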
Force-pushed from 91aae97 to 3f9a7a5
Force-pushed from 8eeb91e to 8dac066
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##            master     #25   +/-  ##
=======================================
  Coverage     81.8%   81.8%
=======================================
  Files          841     841
  Lines       228307  228307
=======================================
+ Hits        186941  186973   +32
+ Misses       41366   41334   -32
Force-pushed from 0f88a6c to 50dcfa8
For the sake of experimenting, I single-threaded tx replay with the included patch. The node was unable to catch up; this is somewhat expected from looking at the following two metrics:
Mainnet metrics for my test nodes show an average of ~450-500ms for
Force-pushed from 50dcfa8 to cd5eefc
Enough time has passed - going to close + re-open this PR



Problem
The thread pool that is used to perform entry (transaction) verification and transaction execution is currently sized to match the number of virtual cores on the machine. For example, a 24-core / 48-thread machine will put 48 threads into this pool.
This thread pool is over-provisioned, and the extra threads actually cause more harm than good. When work is sent to the pool, all threads are woken up, even if there is only work for one or two of them. This "thundering herd" effect causes lots of general system disruption, and can easily be mitigated by bounding the thread pool size to more accurately fit the workload we throw at it.
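A minimal sketch of the kind of bound described above. The `MAX_REPLAY_THREADS` value and the `replay_pool_size` helper are hypothetical, picked from the utilization data in the comments rather than taken from the actual change:

```rust
use std::thread::available_parallelism;

// Assumed cap for illustration; the data above suggests the first ~16-24
// threads do >95% of the work.
const MAX_REPLAY_THREADS: usize = 16;

// Bound the replay pool instead of using one thread per virtual core
// (the default), so idle threads are not woken for small batches.
fn replay_pool_size(num_cpus: usize) -> usize {
    num_cpus.min(MAX_REPLAY_THREADS).max(1)
}

fn main() {
    let num_cpus = available_parallelism().map(|n| n.get()).unwrap_or(1);
    // The bounded size would then be handed to the pool builder, e.g.
    // rayon::ThreadPoolBuilder::new().num_threads(replay_pool_size(num_cpus)).build()
    println!("replay pool size: {}", replay_pool_size(num_cpus));
}
```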
Part of work for #35
Summary of Changes