
prioritization fee cache: remove lru crate #30

Merged
CriesofCarrots merged 1 commit into anza-xyz:master from fanatid:getRPF-opt
Mar 27, 2024

Conversation


@fanatid fanatid commented Mar 3, 2024

Moved from solana-labs#35228

Problem

lru::LruCache requires write lock for any action.
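For illustration only (not code from this PR): lru::LruCache::get takes &mut self because it updates the recency order, so even a read-only lookup through an RwLock must acquire the write lock. A minimal sketch, assuming the external lru crate's API:

```rust
use std::{num::NonZeroUsize, sync::RwLock};
use lru::LruCache; // external `lru` crate

// Illustrative only: LruCache::get takes &mut self (it updates recency
// order), so even a read-only lookup must acquire the write lock.
fn lookup(cache: &RwLock<LruCache<u64, u64>>, slot: u64) -> Option<u64> {
    cache.write().unwrap().get(&slot).copied()
}

fn main() {
    let cache = RwLock::new(LruCache::new(NonZeroUsize::new(4).unwrap()));
    cache.write().unwrap().put(1u64, 42u64);
    assert_eq!(lookup(&cache, 1), Some(42));
}
```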

Summary of Changes

Use a BTreeMap instead of LruCache. The write lock is then required only when finalizing a slot, and the BTreeMap's key ordering makes it easy to remove the oldest slot.
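As a rough sketch of the idea (hypothetical types and names, not the PR's actual implementation): a slot-keyed BTreeMap whose smallest key is always the oldest slot, so capacity control amounts to popping the first entry.

```rust
use std::collections::BTreeMap;

type Slot = u64;

/// Hypothetical sketch: a slot-keyed cache with manual capacity control.
/// Keys are ordered, so the oldest slot is always the first entry.
struct SlotCache<T> {
    entries: BTreeMap<Slot, T>,
    capacity: usize,
}

impl<T> SlotCache<T> {
    fn new(capacity: usize) -> Self {
        Self { entries: BTreeMap::new(), capacity }
    }

    /// Insert a finalized slot; evict the oldest slot(s) if over capacity.
    fn insert(&mut self, slot: Slot, value: T) {
        self.entries.insert(slot, value);
        while self.entries.len() > self.capacity {
            // pop_first removes the smallest (oldest) slot.
            let _evicted = self.entries.pop_first();
        }
    }

    /// Reads need no LRU bookkeeping, so they can sit behind a read lock.
    fn get(&self, slot: Slot) -> Option<&T> {
        self.entries.get(&slot)
    }
}

fn main() {
    let mut cache = SlotCache::new(2);
    cache.insert(10, "fees for slot 10");
    cache.insert(11, "fees for slot 11");
    cache.insert(12, "fees for slot 12"); // evicts slot 10
    assert!(cache.get(10).is_none());
    assert!(cache.get(12).is_some());
}
```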

@mergify mergify Bot requested a review from a team March 3, 2024 17:46
@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Mar 13, 2024
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Mar 13, 2024
@CriesofCarrots

Looks like this needs a ./cargo-for-all-lock-files.sh tree run to get further in CI

Author

fanatid commented Mar 13, 2024

Updated

@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Mar 13, 2024
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Mar 13, 2024
@CriesofCarrots CriesofCarrots added the CI Pull Request is ready to enter CI label Mar 13, 2024
@anza-team anza-team removed the CI Pull Request is ready to enter CI label Mar 13, 2024
@tao-stones

Can you benchmark before/after the change for comparison?

This would run the existing bench; if you find these benches are outdated, please feel free to update them 😄
./cargo nightly bench --manifest-path runtime/Cargo.toml -- -Z unstable-options bench_process_transactions --ignored

Author

fanatid commented Mar 14, 2024

master 151675b

running 2 tests
test bench_process_transactions_multiple_slots ... bench:  13,967,987 ns/iter (+/- 1,213,641)
test bench_process_transactions_single_slot    ... bench:   2,059,505 ns/iter (+/- 411,210)

PR 3030611

running 2 tests
test bench_process_transactions_multiple_slots ... bench:  12,907,941 ns/iter (+/- 1,044,988)
test bench_process_transactions_single_slot    ... bench:   1,339,063 ns/iter (+/- 239,231)

But I do not think this shows a real improvement, because we only parse transactions and send the results to the background update thread.
I am not sure how to benchmark the background thread, so I added std output (where cache_lock_time is a few times lower).

master

running 1 test
slot_finalize_time 128950us, cache_lock_time 7069us
test bench_process_transactions_single_slot    ... bench:   1,704,378 ns/iter (+/- 186,777)

PR

running 1 test
slot_finalize_time 119721us, cache_lock_time 1765us
test bench_process_transactions_single_slot    ... bench:   1,592,675 ns/iter (+/- 249,831)

diff:

diff --git a/runtime/benches/prioritization_fee_cache.rs b/runtime/benches/prioritization_fee_cache.rs
index 8c6bf1fe0a..f97691625d 100644
--- a/runtime/benches/prioritization_fee_cache.rs
+++ b/runtime/benches/prioritization_fee_cache.rs
@@ -44,22 +44,29 @@ fn build_sanitized_transaction(
 fn bench_process_transactions_single_slot(bencher: &mut Bencher) {
     let prioritization_fee_cache = PrioritizationFeeCache::default();
 
-    let bank = Arc::new(Bank::default_for_tests());
+    let GenesisConfigInfo { genesis_config, .. } = create_genesis_config(10_000);
+    let bank0 = Bank::new_for_benches(&genesis_config);
+    let bank_forks = BankForks::new_rw_arc(bank0);
+    let bank = bank_forks.read().unwrap().working_bank();
+    let collector = solana_sdk::pubkey::new_rand();
+    let mut n = 0;
+    bencher.iter(move || {
+        n += 1;
+        let bank = Bank::new_from_parent(bank.clone(), &collector, n);
 
-    // build test transactions
-    let transactions: Vec<_> = (0..5000)
-        .map(|n| {
-            let compute_unit_price = n % 7;
-            build_sanitized_transaction(
-                compute_unit_price,
-                &Pubkey::new_unique(),
-                &Pubkey::new_unique(),
-            )
-        })
-        .collect();
+        let transactions = (0..500)
+            .map(|n| {
+                let compute_unit_price = n % 7;
+                build_sanitized_transaction(
+                    compute_unit_price,
+                    &Pubkey::new_unique(),
+                    &Pubkey::new_unique(),
+                )
+            })
+            .collect::<Vec<_>>();
 
-    bencher.iter(|| {
         prioritization_fee_cache.update(&bank, transactions.iter());
+        prioritization_fee_cache.finalize_priority_fee(bank.slot(), bank.bank_id());
     });
 }
 
diff --git a/runtime/src/prioritization_fee_cache.rs b/runtime/src/prioritization_fee_cache.rs
index 0490f59445..176a7ba405 100644
--- a/runtime/src/prioritization_fee_cache.rs
+++ b/runtime/src/prioritization_fee_cache.rs
@@ -44,6 +44,9 @@ struct PrioritizationFeeCacheMetrics {
 
     // Accumulated time spent on finalizing block prioritization fees.
     total_block_finalize_elapsed_us: AtomicU64,
+
+    finalized: AtomicU64,
+    cache: AtomicU64,
 }
 
 impl PrioritizationFeeCacheMetrics {
@@ -65,6 +68,7 @@ impl PrioritizationFeeCacheMetrics {
     fn accumulate_total_cache_lock_elapsed_us(&self, val: u64) {
         self.total_cache_lock_elapsed_us
             .fetch_add(val, Ordering::Relaxed);
+        self.cache.fetch_add(val, Ordering::Relaxed);
     }
 
     fn accumulate_total_entry_update_elapsed_us(&self, val: u64) {
@@ -75,6 +79,7 @@ impl PrioritizationFeeCacheMetrics {
     fn accumulate_total_block_finalize_elapsed_us(&self, val: u64) {
         self.total_block_finalize_elapsed_us
             .fetch_add(val, Ordering::Relaxed);
+        self.finalized.fetch_add(val, Ordering::Relaxed);
     }
 
     fn report(&self, slot: Slot) {
@@ -161,6 +166,7 @@ impl Drop for PrioritizationFeeCache {
             .unwrap()
             .join()
             .expect("Prioritization fee cache servicing thread failed to join");
+        println!("slot_finalize_time {}us, cache_lock_time {}us", self.metrics.finalized.load(Ordering::Relaxed), self.metrics.cache.load(Ordering::Relaxed));
     }
 }


@tao-stones tao-stones left a comment


Overall a good change!

Comment thread runtime/src/prioritization_fee_cache.rs Outdated
Comment thread runtime/src/prioritization_fee.rs
Comment thread runtime/src/prioritization_fee_cache.rs Outdated
Comment thread runtime/src/prioritization_fee_cache.rs Outdated
Comment thread runtime/src/prioritization_fee_cache.rs
Comment thread runtime/src/prioritization_fee_cache.rs Outdated
@tao-stones

so I added std output (where cache_lock_time is a few times lower)

Great to see a significant decrease in cache_lock_time!

Do you mind adding entry_update_time to the std output along with cache_lock_time and slot_finalize_time, to get a better picture? Thank you.

Author

fanatid commented Mar 14, 2024

master:

running 1 test
cache 7620us, update: 158405us, finalize 128463us
test bench_process_transactions_single_slot    ... bench:   1,747,303 ns/iter (+/- 323,183)

PR

running 1 test
cache 1527us, update: 392736us, finalize 116258us
test bench_process_transactions_single_slot    ... bench:   1,533,384 ns/iter (+/- 213,986)

PR after bf3cae6 (update per tx instead of per bank)

running 1 test
cache 1785us, update: 149256us, finalize 107472us
test bench_process_transactions_single_slot    ... bench:   1,674,978 ns/iter (+/- 414,933)

but honestly entry_update_time is very unstable here; I have seen the numbers change too much for both master and the updated PR, sometimes by a factor of 2:

running 1 test
cache 1785us, update: 149256us, finalize 107472us
test bench_process_transactions_single_slot    ... bench:   1,674,978 ns/iter (+/- 414,933)
running 1 test
cache 3929us, update: 318381us, finalize 243182us
test bench_process_transactions_single_slot    ... bench:   1,690,419 ns/iter (+/- 191,294)

I'm not sure whether it would be better to send an update per tx or to collect and send per bank.


@t-nelson t-nelson left a comment


this seems to be making several independent changes simultaneously. we generally reject these in favor of one pr per change. i'd highly recommend breaking it up

Comment thread runtime/src/prioritization_fee_cache.rs Outdated
@tao-stones

this seems to be making several independent changes simultaneously. we generally reject these in favor of one pr per change. i'd highly recommend breaking it up

Yeah, I agree it'd be better (at least easier to review) if this can be broken into separate PRs, perhaps:

  • replace LruCache with BTreeMap with manual capacity control
  • introduce an unfinalized cache in the receiver thread
  • batch updates

Author

fanatid commented Mar 16, 2024

LruCache and the unfinalized cache are related to each other; the code would need to be adjusted again after each PR. While it's possible, and I moved unfinalized to #272, it doesn't make much sense; I already split the getRPF changes into 2 PRs (this and #217).

The batch update was removed after the benchmark.

BTreeMap::retain doesn't make sense; it would be better to switch to a HashMap in that case. BTreeMap was used because we iterate in order and can break early; otherwise a HashMap would be faster. I made that change in #272.

@CriesofCarrots

BTreeMap::retain doesn't make sense; it would be better to switch to a HashMap in that case. BTreeMap was used because we iterate in order and can break early; otherwise a HashMap would be faster. I made that change in #272.

Did you read the entirety of my comment? split_off would take advantage of the ordering and not require visiting every element.
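For reference, a minimal sketch of what split_off-based pruning could look like (hypothetical function, not code from this PR): because the keys are ordered, everything at or above a cutoff slot can be detached in one call, dropping the older entries without visiting every element.

```rust
use std::collections::BTreeMap;

type Slot = u64;

// Hypothetical sketch: prune all slots older than `cutoff` by exploiting
// BTreeMap's key ordering instead of scanning every entry with retain.
fn prune_old_slots<T>(cache: &mut BTreeMap<Slot, T>, cutoff: Slot) {
    // split_off returns the entries with keys >= cutoff; replacing the
    // map drops the older entries that were left behind.
    *cache = cache.split_off(&cutoff);
}

fn main() {
    let mut cache: BTreeMap<Slot, &str> = (0..10).map(|s| (s, "fees")).collect();
    prune_old_slots(&mut cache, 7);
    assert_eq!(cache.keys().copied().collect::<Vec<_>>(), vec![7, 8, 9]);
}
```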

@fanatid fanatid changed the title from "Optimize prioritization fee cache" to "prioritization fee cache: remove lru crate" Mar 27, 2024
Author

fanatid commented Mar 27, 2024

@tao-stones updated

@CriesofCarrots

Can you please rebase on master and clean up the commit history? We are generally opposed to merge commits. Also, it looks like the description needs updating.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 95.34884% with 2 lines in your changes missing coverage. Please review.

Project coverage is 81.9%. Comparing base (80d3200) to head (f815a5c).

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #30   +/-   ##
=======================================
  Coverage    81.8%    81.9%           
=======================================
  Files         841      841           
  Lines      228242   228246    +4     
=======================================
+ Hits       186923   186940   +17     
+ Misses      41319    41306   -13     

@CriesofCarrots CriesofCarrots merged commit ba9c25c into anza-xyz:master Mar 27, 2024
@fanatid fanatid deleted the getRPF-opt branch March 27, 2024 19:07
OliverNChalk pushed a commit to OliverNChalk/agave that referenced this pull request Nov 11, 2025
