Improve bench-tps keypair generation #7723
Codecov Report
@@           Coverage Diff           @@
##           master   #7723    +/-   ##
========================================
- Coverage    81.7%   81.7%   -0.1%
========================================
  Files         241     241
  Lines       50731   50731
========================================
- Hits        41491   41486      -5
- Misses       9240    9245      +5
@sakridge has a better eye than I do for this part of the code, so I defer review to him.
let keypair_chunks = source_keypair_chunks.len() as u64;
let mut reclaim_lamports_back_to_source_account = false;
let mut i = keypair0_balance;
let mut i = 0;
So that mechanism was in place to quickly restart bench-tps with the same parameters as a previous run. We would detect the balance of the last keypair to check how many lamports had been transferred from one half of the keypairs to the other half. But I found that the assumptions were not valid for running with ramp-tps.
In this PR, I removed the quick-start mechanism, but I think that was a mistake. I'll think of another approach that works well for repeated bench-tps runs as well as incremental ramp-tps runs.
if timer.elapsed() >= Duration::from_secs(5) {
if failed_verify > 0 {
debug!("total txs failed verify: {}", failed_verify);
let failed_verify = Arc::new(AtomicUsize::new(0));
What do you think about breaking this up into another function? The complexity was bad before; now it's even worse.
Totally in favour, I'll clean this up
Overall question, why are the funding transactions failing in the original?
jstarry left a comment
Overall question, why are the funding transactions failing in the original?
To be honest, I'm not exactly sure, but it's likely one of:
1. The node chosen for funding txs goes down or is unresponsive.
2. Another bench-tps client is already running and the funding txs get dropped by the network.
This PR doesn't aim to fix the root cause of the failed funding txs, but it does aim to make funding faster, and it will bail on verifying the funding txs if it comes across too many failures. In that case, the funding txs would be sent again (helping with 2.) and the multi client might pick a new RPC node (helping with 1.).
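To make the "bail on too many failures" idea concrete, here is a minimal, std-only sketch. It is not the code from this PR: `ConfirmFn`, `verify_funding_batch`, `fund_with_retries`, the chunk size of 4, and the `u64` transaction ids are all hypothetical stand-ins chosen for illustration. The only detail borrowed from the diff is the shared `Arc<AtomicUsize>` failure counter.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for an RPC "was this funding tx confirmed?" check.
pub type ConfirmFn = Arc<dyn Fn(u64) -> bool + Send + Sync>;

/// Verify a batch of funding txs on worker threads.
/// Returns Ok(failed) if failures stayed within `max_failures`,
/// or Err(failed) if verification bailed because the limit was exceeded.
pub fn verify_funding_batch(
    tx_ids: &[u64],
    max_failures: usize,
    is_confirmed: &ConfirmFn,
) -> Result<usize, usize> {
    // Shared counter, mirroring the Arc<AtomicUsize> seen in the diff.
    let failed_verify = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = tx_ids
        .chunks(4) // arbitrary chunk size for the sketch
        .map(|chunk| {
            let chunk = chunk.to_vec();
            let failed_verify = Arc::clone(&failed_verify);
            let is_confirmed = Arc::clone(is_confirmed);
            thread::spawn(move || {
                for id in chunk {
                    // Stop early once the failure limit has been exceeded.
                    if failed_verify.load(Ordering::Relaxed) > max_failures {
                        return;
                    }
                    if !(*is_confirmed)(id) {
                        failed_verify.fetch_add(1, Ordering::Relaxed);
                    }
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("verification worker panicked");
    }
    let failed = failed_verify.load(Ordering::Relaxed);
    if failed > max_failures {
        Err(failed)
    } else {
        Ok(failed)
    }
}

/// Retry the whole batch up to `max_attempts` times.
pub fn fund_with_retries(
    tx_ids: &[u64],
    max_failures: usize,
    max_attempts: usize,
    is_confirmed: &ConfirmFn,
) -> bool {
    for _ in 0..max_attempts {
        // A real client would resend the funding txs here, possibly via a new RPC node.
        if verify_funding_batch(tx_ids, max_failures, is_confirmed).is_ok() {
            return true;
        }
    }
    false
}
```

In this sketch a caller would invoke `fund_with_retries` once per funding round; each failed attempt is the point where resending (and possibly switching RPC node) would happen.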
@sakridge I updated the PR and it's ready for another pass. I updated the logic around transfer direction to be a lot simpler. I tested the changes briefly on a cpu testnet and behaviour looks good. I'll run with gpus later today to make sure tps isn't affected.
ok cool, I'll take a look
// 100 lamports should give enough wiggle room to handle source / dest keypair sets getting unbalanced
let minimum_balance = 100;

if first_keypair_balance < minimum_balance || last_keypair_balance < minimum_balance {
I'm not sure I get this minimum balance check fixed at 100. The user could have specified 1,000 or 10,000 lamports per account; if the accounts then have only 100, they would not be funded. Is that correct? The cluster fees can also be in the 1,000-lamport range, so I'm not sure a fixed low value here will work in all cases.
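One possible way to address this concern, sketched under assumptions and not taken from the PR, is to derive the threshold from the configured per-account amount and the fee instead of hard-coding 100. The function names, the "half of the target" rule, and the `max_fee_per_tx` parameter below are all hypothetical.

```rust
/// Hypothetical threshold: at least half of the requested per-account
/// balance, and never less than one fee plus a single lamport to move.
pub fn minimum_balance(target_lamports_per_account: u64, max_fee_per_tx: u64) -> u64 {
    let half_target = target_lamports_per_account / 2;
    half_target.max(max_fee_per_tx.saturating_add(1))
}

/// Same shape as the check in the diff: fund again if either end of the
/// keypair list has dropped below the threshold.
pub fn needs_funding(first_keypair_balance: u64, last_keypair_balance: u64, minimum_balance: u64) -> bool {
    first_keypair_balance < minimum_balance || last_keypair_balance < minimum_balance
}
```

With these assumed rules, accounts holding 100 lamports when 10,000 were requested would be treated as needing funding, and a 1,000-lamport fee would push the threshold above the fee.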
I wasn't able to get a gpu testnet working but ran a gce cpu testnet with 25k avg tps and tx errors were pretty quiet so I feel pretty confident in this new approach of switching transfer directions more frequently.
@sakridge can you take another look? Thanks!
* Improve bench-tps keypair generation
* Fix tests
* Fix move test
* cargo fmt
* Split up funding function into smaller functions
* Support restarting bench-tps without re-funding
* Change quick start logic and remove noisy log
(cherry picked from commit b78b1bb)
Problem
Account generation at the beginning of bench-tps can sometimes be really slow. This could be due to a number of reasons:
Summary of Changes
Fixes: #7597