Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Cache block time in Blockstore#11955

Merged
CriesofCarrots merged 6 commits into
solana-labs:masterfrom
CriesofCarrots:cache-block-time
Sep 9, 2020
Merged

Cache block time in Blockstore#11955
CriesofCarrots merged 6 commits into
solana-labs:masterfrom
CriesofCarrots:cache-block-time

Conversation

@CriesofCarrots
Copy link
Copy Markdown
Contributor

@CriesofCarrots CriesofCarrots commented Aug 31, 2020

Problem

The getBlockTime rpc endpoint can return null for a block that hasn't been pruned from Blockstore. This is because we only keep the last 5 epochs of stake info, and stake info is needed for calculating a block timestamp on demand.

To address this problem, and generally offer better block-time support, we've made two changes to the original design (https://docs.solana.com/implemented-proposals/validator-timestamp-oracle):

  1. The initial design called for block times to be calculated on demand in order to demonstrate that they are deterministic from the ledger. Caching block times saves this unnecessary every-call work; the block times are still deterministic from the ledger.
  2. The initial design called for validators to vote every ~30min (4500 slots) in order to save space in Vote transactions. As of Submit a vote timestamp every vote #10630 , validators are now timestamp every Vote. As a result, there are plenty of timestamps available to calculate a stake-weighted timestamp as soon as a block is rooted in Blockstre.

Summary of Changes

  • Add Blockstore column family to cache blocktimes
  • Add CacheBlockTimeService, triggered from ReplayStage::handle_votable_bank, to calculate and cache a block's timestamp as soon as a block is rooted
  • Use cache in getBlockTime rpc
  • Add block_time to ConfirmedBlock when populated

Fixes #10089
Note: blocks before this PR is released may still return null

@CriesofCarrots CriesofCarrots requested a review from mvines August 31, 2020 23:15
Comment thread core/src/replay_stage.rs Outdated
Comment on lines +51 to +58
let slot_duration = slot_duration_from_slots_per_year(bank.slots_per_year());
let epoch = bank.epoch_schedule().get_epoch(bank.slot());
let stakes = HashMap::new();
let stakes = bank.epoch_vote_accounts(epoch).unwrap_or(&stakes);

if let Err(e) = blockstore.cache_block_time(bank.slot(), slot_duration, stakes) {
error!("cache_block_time failed: slot {:?} {:?}", bank.slot(), e);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How expensive is this? Maybe wrap a measure around it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pieces of this are measured in get_timestamp_slots() and cache_block_time(). You looking for a sum?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just more wondering if we run into issues of being able to keep up with the new slots as they come in. But if this operation takes 1-10ms or so then no worries

Copy link
Copy Markdown
Contributor Author

@CriesofCarrots CriesofCarrots Sep 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empirical data suggests this operation will be in the 2-10ms range for current mainnet-beta throughput. However, it does depend on deserializing blocks to find vote transactions, and that deserialization definitely takes longer as TPS increases. (With about 10k TPS, I was seeing this take about 10x as long on my under-powered GCE instance.) One solution would be to index vote transactions/timestamps in blockstore to avoid the deserialization altogether; possibly as part of the transaction-status-service. I think that could be a follow-up optimization. Wdyt? @mvines

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool, that seems fine for now. But how about this:

  1. Move the recv_timeout() out of cache_block_time()
  2. In the main thread loop, wrap a measure around cache_block_time(). Then if cache_block_time() takes longer than IDK, 100ms or so, emit a warn! or error! log.

Since this is an unbounded channel, if cache_block_time() ever does get backed up and roots start coming in faster than it can process then we have a memory leak and will probably eventually OOM. It'd be nice to get yelled at from the log if this ever starts happening

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a plan

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this look? d0a0f50

@CriesofCarrots CriesofCarrots force-pushed the cache-block-time branch 2 times, most recently from ec4453f to a346ced Compare August 31, 2020 23:35
Comment thread ledger/src/blockstore.rs
@CriesofCarrots CriesofCarrots added CI Pull Request is ready to enter CI and removed CI Pull Request is ready to enter CI labels Sep 1, 2020
@solana-grimes solana-grimes removed the CI Pull Request is ready to enter CI label Sep 1, 2020
@codecov
Copy link
Copy Markdown

codecov Bot commented Sep 1, 2020

Codecov Report

Merging #11955 into master will increase coverage by 0.0%.
The diff coverage is 93.1%.

@@           Coverage Diff            @@
##           master   #11955    +/-   ##
========================================
  Coverage    82.0%    82.0%            
========================================
  Files         337      338     +1     
  Lines       79225    79332   +107     
========================================
+ Hits        65011    65124   +113     
+ Misses      14214    14208     -6     

@CriesofCarrots
Copy link
Copy Markdown
Contributor Author

I rolled in adding block_time to ConfirmedBlock because it was a one-liner, which brought along block_time -> bigtable for free :)
Holding off on deprecating getBlockTime for now: #10089 (comment)

@CriesofCarrots
Copy link
Copy Markdown
Contributor Author

@mvines Anything more you'd like to see here?

Comment thread core/src/cache_block_time_service.rs Outdated
Comment on lines +51 to +58
let slot_duration = slot_duration_from_slots_per_year(bank.slots_per_year());
let epoch = bank.epoch_schedule().get_epoch(bank.slot());
let stakes = HashMap::new();
let stakes = bank.epoch_vote_accounts(epoch).unwrap_or(&stakes);

if let Err(e) = blockstore.cache_block_time(bank.slot(), slot_duration, stakes) {
error!("cache_block_time failed: slot {:?} {:?}", bank.slot(), e);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just more wondering if we run into issues of being able to keep up with the new slots as they come in. But if this operation takes 1-10ms or so then no worries

@CriesofCarrots
Copy link
Copy Markdown
Contributor Author

Post-merge comments welcome

@CriesofCarrots CriesofCarrots merged commit 05db41f into solana-labs:master Sep 9, 2020
CriesofCarrots added a commit to CriesofCarrots/solana that referenced this pull request Sep 23, 2020
* Add blockstore column to cache block times

* Add method to cache block time

* Add service to cache block time

* Update rpc getBlockTime to use new method, and refactor blockstore slightly

* Return block_time with confirmed block, if available

* Add measure and warning to cache-block-time
CriesofCarrots added a commit that referenced this pull request Sep 23, 2020
* Submit a vote timestamp every vote (#10630)

* Submit a timestamp for every vote

* Submit at most one vote timestamp per second

* Submit a timestamp for every new vote

Co-authored-by: Tyera Eulberg <tyera@solana.com>

* Timestamp first vote (#11856)

* Cache block time in Blockstore (#11955)

* Add blockstore column to cache block times

* Add method to cache block time

* Add service to cache block time

* Update rpc getBlockTime to use new method, and refactor blockstore slightly

* Return block_time with confirmed block, if available

* Add measure and warning to cache-block-time

Co-authored-by: Michael Vines <mvines@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RPC: getBlockTime returns null for slot that hasn't been purged

3 participants