
validator: Add --enable-bigtable-ledger-upload flag #12040

Merged: mvines merged 9 commits into solana-labs:master from mvines:bt on Sep 4, 2020
Conversation

@mvines (Contributor) commented Sep 4, 2020

When enabled, the --enable-bigtable-ledger-upload flag will cause the validator to continually upload new confirmed blocks to BigTable in the background.
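For orientation, here is a minimal sketch of the kind of background loop this flag enables: poll the highest confirmed root and push any newly confirmed blocks to BigTable until an exit flag is set. The `highest_confirmed_root` and `upload_confirmed_block` stand-ins below are placeholders for illustration only; the real code reads from the validator's Blockstore and writes through solana_storage_bigtable::LedgerStorage.

```rust
use std::{
    sync::{
        atomic::{AtomicBool, Ordering},
        Arc,
    },
    thread,
    time::Duration,
};

type Slot = u64;

// Placeholder stand-ins for illustration; the real validator reads from its
// Blockstore and writes via solana_storage_bigtable::LedgerStorage.
fn highest_confirmed_root() -> Slot {
    0
}
fn upload_confirmed_block(_slot: Slot) {
    // write the confirmed block for `_slot` to BigTable
}

/// Continually upload newly confirmed blocks until `exit` is set.
fn bigtable_upload_loop(exit: Arc<AtomicBool>) {
    let mut last_uploaded_slot: Slot = 0;
    while !exit.load(Ordering::Relaxed) {
        let confirmed_root = highest_confirmed_root();
        for slot in (last_uploaded_slot + 1)..=confirmed_root {
            upload_confirmed_block(slot);
        }
        last_uploaded_slot = last_uploaded_slot.max(confirmed_root);
        thread::sleep(Duration::from_secs(1)); // back off between polls
    }
}

fn main() {
    let exit = Arc::new(AtomicBool::new(false));
    let handle = {
        let exit = exit.clone();
        thread::spawn(move || bigtable_upload_loop(exit))
    };
    // ... validator runs ...
    exit.store(true, Ordering::Relaxed);
    handle.join().unwrap();
}
```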

codecov bot commented Sep 4, 2020

Codecov Report

Merging #12040 into master will decrease coverage by 0.0%.
The diff coverage is 8.5%.

@@            Coverage Diff            @@
##           master   #12040     +/-   ##
=========================================
- Coverage    82.1%    82.1%   -0.1%     
=========================================
  Files         334      336      +2     
  Lines       78798    78856     +58     
=========================================
+ Hits        64768    64777      +9     
- Misses      14030    14079     +49     

@mvines marked this pull request as ready for review September 4, 2020 17:02
Comment thread: core/src/rpc_service.rs
};
let exit_bigtable_ledger_upload_service = Arc::new(AtomicBool::new(false));

let (bigtable_ledger_storage, _bigtable_ledger_upload_service) =
@mvines (Contributor, Author)

It's slightly awkward that BigTableUploadService is plumbed into RpcService like this. Moving it up a level into Validator would require moving the tokio Runtime and solana_storage_bigtable::LedgerStorage::new as well, and I didn't like that code churn in this PR. That's my excuse, at least.
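A rough sketch of the plumbing being described, with hypothetical stand-in types (the real ones are tokio's Runtime, solana_storage_bigtable::LedgerStorage, and this PR's BigTableUploadService); it only shows why the uploader ends up being constructed next to the RPC runtime rather than in Validator.

```rust
use std::sync::{atomic::AtomicBool, Arc};

// Hypothetical stand-ins used only to show the shape of the plumbing; the real
// types are tokio::runtime::Runtime, solana_storage_bigtable::LedgerStorage,
// and the BigTableUploadService added in this PR.
struct Runtime;
struct LedgerStorage;
struct BigTableUploadService;

impl LedgerStorage {
    // In the real code this constructor is async, so it needs the tokio
    // Runtime that already lives in rpc_service; that is why the uploader is
    // created here rather than one level up in Validator.
    fn new(_runtime: &Runtime) -> Self {
        LedgerStorage
    }
}

impl BigTableUploadService {
    fn new(_storage: &LedgerStorage, _exit: Arc<AtomicBool>) -> Self {
        BigTableUploadService
    }
}

fn start_rpc_service(runtime: &Runtime, enable_bigtable_ledger_upload: bool) {
    let exit_bigtable_ledger_upload_service = Arc::new(AtomicBool::new(false));

    let bigtable_ledger_storage = LedgerStorage::new(runtime);
    let _bigtable_ledger_upload_service = if enable_bigtable_ledger_upload {
        Some(BigTableUploadService::new(
            &bigtable_ledger_storage,
            exit_bigtable_ledger_upload_service.clone(),
        ))
    } else {
        None
    };
}

fn main() {
    start_rpc_service(&Runtime, true);
}
```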

sync::{atomic::AtomicBool, Arc},
};

async fn upload(
@mvines (Contributor, Author)

Changes in this file are really just moving code into ledger/

@mvines (Contributor, Author) commented Sep 4, 2020

For mainnet-beta, I think we keep the same upload-at-the-end-of-an-epoch process going as the last resort. Then additionally add this flag on one or more of the warehouse nodes to get confirmed block data into BigTable within seconds. At that point, we can start making API nodes ephemeral!

@CriesofCarrots (Contributor) left a comment

Looks good overall. I'll buy your rationalization for the new service in rpc_service, though I agree it's awkward.

I'm wondering about 2 potential races:

  1. Is this the right code for the end-of-epoch upload? https://github.com/solana-labs/cluster/blob/3614bbf6d36a275f4a5c0cba10d517367825b508/bin/warehouse-upload-to-storage-bucket.sh#L74 It seems like that is checking the entire ledger for gaps in BigTable storage all at once, and won't catch new confirmed blocks that are uploaded by the BigTableUploadService in the meantime. ... On looking at the lower-level APIs more, maybe this isn't an issue beyond efficiency.

  2. Although this isn't merged yet, it will bring the complication of blocktimes and might be worth planning for now: #11955. Because the CacheBlockTimeService is prompted by handle_votable_bank and the BigTableUploadService by a change in highest_confirmed_root, if both PRs get merged as-is, the blocktime "should" get cached before the upload happens. But if the blocktime calculation backs up for some reason, there's no guarantee.

Comment thread: ledger/src/bigtable_upload.rs (Outdated)
@mvines (Contributor, Author) commented Sep 4, 2020

> 1. Is this the right code for the end-of-epoch upload? https://github.com/solana-labs/cluster/blob/3614bbf6d36a275f4a5c0cba10d517367825b508/bin/warehouse-upload-to-storage-bucket.sh#L74 It seems like that is checking the entire ledger for gaps in BigTable storage all at once, and won't catch new confirmed blocks that are uploaded by the BigTableUploadService in the meantime. ... On looking at the lower-level APIs more, maybe this isn't an issue beyond efficiency.

Yes, that's the right code. But this always runs on a ledger from the previous epoch, using the old ledger from the warehouse node. In the meantime, the warehouse node has been restarted from a snapshot and is building a new ledger.

> 2. Although this isn't merged yet, it will bring the complication of blocktimes and might be worth planning for now: Cache block time in Blockstore #11955. Because the CacheBlockTimeService is prompted by handle_votable_bank and the BigTableUploadService by a change in highest_confirmed_root, if both PRs get merged as-is, the blocktime "should" get cached before the upload happens. But if the blocktime calculation backs up for some reason, there's no guarantee.

I don't think we need to be this aggressive with uploading ledger into BigTable. Would it help to trail the highest confirmed root by 100 slots or so?

@CriesofCarrots (Contributor)

> Yes, that's the right code. But this always runs on a ledger from the previous epoch, using the old ledger from the warehouse node. In the meantime, the warehouse node has been restarted from a snapshot and is building a new ledger.

Aha! That explains it. Thanks for that detail.

> I don't think we need to be this aggressive with uploading ledger into BigTable. Would it help to trail the highest confirmed root by 100 slots or so?

Yep, that would definitely help. 100 slots should be more than enough

@mvines (Contributor, Author) commented Sep 4, 2020

> Yep, that would definitely help. 100 slots should be more than enough

Cool, I added 3af90dd for this for now. But perhaps CacheBlockTimeService could signal the last slot it has added a block time to somehow, via another field in block_commitment_cache or a blockstore method? That way we remove the possible race entirely.
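A minimal sketch of what trailing by a fixed delay could look like; the constant name follows the later LARGEST_CONFIRMED_ROOT_UPLOAD_DELAY commit, but the surrounding code is illustrative rather than the uploader's exact implementation.

```rust
// Sketch of the "trail the largest confirmed root" idea from 3af90dd. The
// constant name matches the later commit; the rest is illustrative only.
type Slot = u64;

const LARGEST_CONFIRMED_ROOT_UPLOAD_DELAY: Slot = 100;

/// Highest slot the uploader will touch: stay behind the tip so that
/// CacheBlockTimeService has time to cache block times first.
fn upload_ceiling(largest_confirmed_root: Slot) -> Slot {
    largest_confirmed_root.saturating_sub(LARGEST_CONFIRMED_ROOT_UPLOAD_DELAY)
}

fn main() {
    assert_eq!(upload_ceiling(1_000), 900);
    assert_eq!(upload_ceiling(50), 0); // early in the ledger: nothing eligible yet
}
```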

@mvines (Contributor, Author) commented Sep 4, 2020

Maybe once #11955 lands, the loop in here can check if blockstore.get_block_time() returns Some(_) for the highest slot it wants to upload. If it returns None, then wait a little while longer.
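A rough sketch of that check, assuming a Blockstore-like handle whose get_block_time(slot) yields Option<UnixTimestamp>; the real Blockstore API returns a Result, and the trait here is purely illustrative.

```rust
use std::{thread, time::Duration};

type Slot = u64;
type UnixTimestamp = i64;

// A Blockstore-like handle, reduced to the one method this check needs; the
// real Blockstore::get_block_time returns a Result, so the wiring would differ.
trait BlockTimeSource {
    fn get_block_time(&self, slot: Slot) -> Option<UnixTimestamp>;
}

/// Hold off on uploading `slot` until its block time has been cached, removing
/// the race with CacheBlockTimeService.
fn wait_for_block_time<B: BlockTimeSource>(blockstore: &B, slot: Slot) {
    while blockstore.get_block_time(slot).is_none() {
        thread::sleep(Duration::from_millis(200));
    }
}

struct StubBlockstore;
impl BlockTimeSource for StubBlockstore {
    fn get_block_time(&self, _slot: Slot) -> Option<UnixTimestamp> {
        Some(1_599_000_000) // pretend the block time is already cached
    }
}

fn main() {
    wait_for_block_time(&StubBlockstore, 42);
}
```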

@CriesofCarrots (Contributor)

> Maybe once #11955 lands, the loop in here can check if blockstore.get_block_time() returns Some(_) for the highest slot it wants to upload. If it returns None, then wait a little while longer.

Sounds good. I’ll play with some options when I rebase #11955 on this.

@mvines removed the v1.2 label Sep 4, 2020
@mvines merged commit b64fb29 into solana-labs:master Sep 4, 2020
@mvines deleted the bt branch September 4, 2020 23:02
mergify Bot added a commit that referenced this pull request Sep 5, 2020
…2057)

* Add --enable-bigtable-ledger-upload flag

(cherry picked from commit d8e2038)

* Relocate BigTable uploader to ledger/ crate

(cherry picked from commit 91a56ca)

# Conflicts:
#	ledger/Cargo.toml

* Cargo.lock

(cherry picked from commit 2b8a521)

* Add BigTableUploadService

(cherry picked from commit bc7731b)

* Add BigTableUploadService

(cherry picked from commit bafdcf2)

* Add exit flag for bigtable upload operations

(cherry picked from commit d3611f7)

* Remove dead code

(cherry picked from commit cd3c134)

* Request correct access

(cherry picked from commit 4ba43c2)

* Add LARGEST_CONFIRMED_ROOT_UPLOAD_DELAY

(cherry picked from commit b64fb29)

* Resolve merge conflicts

Co-authored-by: Michael Vines <mvines@gmail.com>
