Cache block time in Blockstore #11955
Conversation
let slot_duration = slot_duration_from_slots_per_year(bank.slots_per_year());
let epoch = bank.epoch_schedule().get_epoch(bank.slot());
let stakes = HashMap::new();
let stakes = bank.epoch_vote_accounts(epoch).unwrap_or(&stakes);

if let Err(e) = blockstore.cache_block_time(bank.slot(), slot_duration, stakes) {
    error!("cache_block_time failed: slot {:?} {:?}", bank.slot(), e);
}
How expensive is this? Maybe wrap a measure around it?
The pieces of this are measured in get_timestamp_slots() and cache_block_time(). Are you looking for a sum?
I was just more wondering if we run into issues of being able to keep up with the new slots as they come in. But if this operation takes 1-10ms or so then no worries
Empirical data suggests this operation will be in the 2-10ms range for current mainnet-beta throughput. However, it does depend on deserializing blocks to find vote transactions, and that deserialization definitely takes longer as TPS increases. (With about 10k TPS, I was seeing this take about 10x as long on my under-powered GCE instance.) One solution would be to index vote transactions/timestamps in blockstore to avoid the deserialization altogether; possibly as part of the transaction-status-service. I think that could be a follow-up optimization. Wdyt? @mvines
Ok cool, that seems fine for now. But how about this:
- Move the recv_timeout() out of cache_block_time()
- In the main thread loop, wrap a measure around cache_block_time(). Then if cache_block_time() takes longer than, IDK, 100ms or so, emit a warn! or error! log.
Since this is an unbounded channel, if cache_block_time() ever does get backed up and roots start coming in faster than it can process then we have a memory leak and will probably eventually OOM. It'd be nice to get yelled at from the log if this ever starts happening
Sounds like a plan
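The suggested shape can be sketched with standard-library channels. This is a hedged illustration, not the actual service code: `cache_block_time` here is a hypothetical stand-in for the real Blockstore work, `run_loop` is an invented name for the main-thread loop, and the 100ms threshold mirrors the number floated above. The key points are that `recv_timeout()` lives in the loop rather than inside `cache_block_time()`, and a measure around the call yields a warning if processing falls behind the unbounded channel.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

// Hypothetical stand-in for Blockstore::cache_block_time's work.
fn cache_block_time(slot: u64) -> u64 {
    slot
}

// Main-thread loop sketch: recv_timeout lives out here, and a measure
// around cache_block_time lets us warn when the cache write can't keep
// up with roots arriving on the unbounded channel.
fn run_loop(receiver: Receiver<u64>) -> Vec<u64> {
    let mut processed = Vec::new();
    loop {
        match receiver.recv_timeout(Duration::from_secs(1)) {
            Ok(slot) => {
                let start = Instant::now();
                processed.push(cache_block_time(slot));
                let elapsed = start.elapsed();
                if elapsed > Duration::from_millis(100) {
                    eprintln!("warn: cache_block_time for slot {} took {:?}", slot, elapsed);
                }
            }
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    processed
}

fn main() {
    let (sender, receiver) = channel();
    sender.send(42u64).unwrap();
    drop(sender); // closing the channel ends the loop
    assert_eq!(run_loop(receiver), vec![42]);
}
```

Because the channel is unbounded, the loop itself can never block producers; the warning log is the only back-pressure signal, which is exactly what the review asks for.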
Codecov Report
@@            Coverage Diff            @@
##           master   #11955    +/-   ##
========================================
  Coverage    82.0%    82.0%
========================================
  Files         337      338       +1
  Lines       79225    79332     +107
========================================
+ Hits        65011    65124     +113
+ Misses      14214    14208       -6
I rolled in adding block_time to ConfirmedBlock because it was a one-liner, which brought along block_time -> bigtable for free :)

@mvines Anything more you'd like to see here?
Post-merge comments welcome
* Add blockstore column to cache block times
* Add method to cache block time
* Add service to cache block time
* Update rpc getBlockTime to use new method, and refactor blockstore slightly
* Return block_time with confirmed block, if available
* Add measure and warning to cache-block-time
* Submit a vote timestamp every vote (#10630)
  * Submit a timestamp for every vote
  * Submit at most one vote timestamp per second
  * Submit a timestamp for every new vote

  Co-authored-by: Tyera Eulberg <tyera@solana.com>
* Timestamp first vote (#11856)
* Cache block time in Blockstore (#11955)
  * Add blockstore column to cache block times
  * Add method to cache block time
  * Add service to cache block time
  * Update rpc getBlockTime to use new method, and refactor blockstore slightly
  * Return block_time with confirmed block, if available
  * Add measure and warning to cache-block-time

Co-authored-by: Michael Vines <mvines@gmail.com>
Problem
The getBlockTime rpc endpoint can return null for a block that hasn't been pruned from Blockstore. This is because we only keep the last 5 epochs of stake info, and stake info is needed for calculating a block timestamp on demand.

To address this problem, and generally offer better block-time support, we've made two changes to the original design (https://docs.solana.com/implemented-proposals/validator-timestamp-oracle):
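For context on why stake info is needed at all, the timestamp-oracle design estimates a block's time from recent vote timestamps weighted by validator stake. The sketch below is a simplified illustration of that idea, not the actual Blockstore implementation; the function name, the `HashMap` shape, and the projection-by-slot-duration detail are assumptions for the example.

```rust
use std::collections::HashMap;
use std::time::Duration;

// Hedged sketch of a stake-weighted block-time estimate: each validator's
// most recent vote timestamp is projected forward from its vote slot to the
// target slot, then averaged with the validator's stake as the weight.
fn stake_weighted_timestamp(
    // pubkey -> (vote slot, unix timestamp of that vote, stake)
    votes: &HashMap<String, (u64, i64, u64)>,
    target_slot: u64,
    slot_duration: Duration,
) -> Option<i64> {
    let mut total_stake: u128 = 0;
    let mut weighted_sum: i128 = 0;
    for (_pubkey, &(vote_slot, timestamp, stake)) in votes {
        // Project the vote's timestamp forward to the target slot.
        let offset_ms = slot_duration.as_millis() as i128
            * target_slot.saturating_sub(vote_slot) as i128;
        let projected = timestamp as i128 + offset_ms / 1000;
        weighted_sum += projected * stake as i128;
        total_stake += stake as u128;
    }
    if total_stake == 0 {
        None
    } else {
        Some((weighted_sum / total_stake as i128) as i64)
    }
}

fn main() {
    let mut votes = HashMap::new();
    votes.insert("validator_a".to_string(), (10u64, 100i64, 1u64));
    votes.insert("validator_b".to_string(), (10, 102, 1));
    // Equal stake, same slot: the estimate is the plain mean of 100 and 102.
    assert_eq!(
        stake_weighted_timestamp(&votes, 10, Duration::from_millis(400)),
        Some(101)
    );
}
```

Because this calculation needs each validator's stake, a node that has discarded the stake info for an old epoch can no longer compute the timestamp on demand, which is exactly the null-result problem this PR fixes by caching the computed time at root.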
Summary of Changes
- Add a Blockstore column and service to cache block times
- Update the getBlockTime rpc to use the cached value
- Return block_time with ConfirmedBlock when populated

Fixes #10089
Note: blocks from before this PR is released may still return null