Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion ledger-tool/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1338,7 +1338,7 @@ fn minimize_bank_for_snapshot(
ending_slot: Slot,
) {
let (transaction_account_set, transaction_accounts_measure) = measure!(
blockstore.get_accounts_used_in_range(snapshot_slot, ending_slot),
blockstore.get_accounts_used_in_range(bank, snapshot_slot, ending_slot),
"get transaction accounts"
);
let total_accounts_len = transaction_account_set.len();
Expand Down Expand Up @@ -1918,6 +1918,7 @@ fn main() {
.arg(&no_snapshot_arg)
.arg(&account_paths_arg)
.arg(&accounts_db_skip_initial_hash_calc_arg)
.arg(&accountsdb_skip_shrink)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this speeds up a bit? Feature parity with other solana-ledger-tool subcommands.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah probably a good option to allow - nice.

.arg(&ancient_append_vecs)
.arg(&hard_forks_arg)
.arg(&max_genesis_archive_unpacked_size_arg)
Expand Down Expand Up @@ -3015,6 +3016,7 @@ fn main() {
halt_at_slot: Some(snapshot_slot),
poh_verify: false,
accounts_db_config: Some(get_accounts_db_config(&ledger_path, arg_matches)),
accounts_db_skip_shrink: arg_matches.is_present("accounts_db_skip_shrink"),
..ProcessOptions::default()
},
snapshot_archive_path,
Expand Down
32 changes: 26 additions & 6 deletions ledger/src/blockstore.rs
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,10 @@ use {
poh_timing_point::{send_poh_timing_point, PohTimingSender, SlotPohTimingInfo},
},
solana_rayon_threadlimit::get_max_thread_count,
solana_runtime::hardened_unpack::{unpack_genesis_archive, MAX_GENESIS_ARCHIVE_UNPACKED_SIZE},
solana_runtime::{
bank::Bank,
hardened_unpack::{unpack_genesis_archive, MAX_GENESIS_ARCHIVE_UNPACKED_SIZE},
},
solana_sdk::{
clock::{Slot, UnixTimestamp, DEFAULT_TICKS_PER_SECOND, MS_PER_TICK},
genesis_config::{GenesisConfig, DEFAULT_GENESIS_ARCHIVE, DEFAULT_GENESIS_FILE},
Expand Down Expand Up @@ -2835,6 +2838,7 @@ impl Blockstore {
/// Used by ledger-tool to create a minimized snapshot
pub fn get_accounts_used_in_range(
&self,
bank: &Bank,
starting_slot: Slot,
ending_slot: Slot,
) -> DashSet<Pubkey> {
Expand All @@ -2844,11 +2848,27 @@ impl Blockstore {
.into_par_iter()
.for_each(|slot| {
if let Ok(entries) = self.get_slot_entries(slot, 0) {
entries.par_iter().for_each(|entry| {
entry.transactions.iter().for_each(|tx| {
tx.message.static_account_keys().iter().for_each(|pubkey| {
result.insert(*pubkey);
});
entries.into_par_iter().for_each(|entry| {
entry.transactions.into_iter().for_each(|tx| {
if let Some(lookups) = tx.message.address_table_lookups() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-nelson here

lookups.iter().for_each(|lookup| {
result.insert(lookup.account_key);
});
}
// howdy, anybody who reached here from the panic messsage!
// the .unwrap() below could indicate there was an odd error or there
// could simply be a tx with a new ALT, which is just created/updated
// in this range. too bad... this edge case isn't currently supported.
// see: https://github.com/solana-labs/solana/issues/30165
// for casual use, please choose different slot range.
let sanitized_tx = bank.fully_verify_transaction(tx).unwrap();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely better than it was, as when I wrote this originally I didn't even know lookup tables were a thing - but I'm not 100% convinced this is correct either. If a lookup table is extended in the slot range, then a tx after that could fail to verify this check (bank is looking at beginning of range) and we wouldn't add the associated keys for the transaction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apfitzge hehe, i know there's buried compromise here: how about this helpful, cheerful, encouraging comment?: db26aed92b7a101a3fb010f86b4f669860d9b8ba

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patches are welcome but it'll take some good deal of efforts to support this fully....

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ agreed. I'm not sure how many people actually use this subcommand - but not sure its worth distracting us over scheduler.

I'd update the comment to include this issue I just made: #30165 😄

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patches are welcome but it'll take some good deal of efforts to support this fully....

Yeah, it might be that we just have to replay the slot range and add accounts as we see them that way which is a bit of effort.
I hadn't taken that approach initially because we need to keep account state at the beginning of the snapshot range in order to create it, and process of (unpacking + replay + delete + unpacking + minimize + snapshot) seemed too long.
And also the fact ALTs weren't in (to my knowledge) at the time I started on the project as a total Solana noob haha.

With @xiangzhu70's change for keeping and loading from snapshot files will allow us to do this in a timely manner!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd update the comment to include this issue I just made: #30165 😄

thanks for creating issue; just done: a2b8565

With @xiangzhu70's change for keeping and loading from snapshot files will allow us to do this in a timely manner!

Oh sounds promising. :)

sanitized_tx
.message()
.account_keys()
.iter()
.for_each(|&pubkey| {
result.insert(pubkey);
});
});
});
}
Expand Down
8 changes: 8 additions & 0 deletions runtime/src/bank.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1912,6 +1912,7 @@ impl Bank {
additional_builtins,
debug_do_not_add_builtins,
);
bank.fill_missing_sysvar_cache_entries();
Copy link
Copy Markdown
Contributor Author

@ryoqun ryoqun Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this single new line will be most objectionable change in this pr without in-source comments... Arguably, this is touching an important code path.

after some research, in short, i think this is a simple overlook at both https://github.com/solana-labs/solana/pull/21108/files#r1098224198 #21108 and https://github.com/solana-labs/solana/pull/22455/files#r1098222965 #22455

So, how this unintentional omission relates to this pr?:

bank fails to create sanitized tx from valid VersionedTransaction if sysvar isn't properly populated due to the internal use of slothashes. And this is a edge case (took a while to figure out because sometimes it worked by random args/ledgers). That's because normally we replay a bunch of slots after snapshot load. so, these new child banks are filled with sysvar caches via (new_from_parent). This bug only manifests when we immediately consume the snapshot bank without any replay.

As for justification of touching this code path, I considered to relax pub(crate) on fill_missing_sysvar_cache_entries() and call it inside get_accounts_used_in_range or minimize_bank_for_snapshot, under this pr's strong backport desire.

However, I concluded the risk is small overall (use of sysvar cache on snapshot bank is VERY rare as this isn't noticed for months) then, that this correct (yet broad) fix should instead be preferred to prevent potential future subtle bugs and another dev suffering from this unhappy rabbit hole.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc: @Lichtso @jstarry as respective referenced pr authors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I concluded the risk is small overall (use of sysvar cache on snapshot bank is VERY rare as this isn't noticed for months)

I think normal validator operations would never hit this case since we replay and create child banks.

If there were a network outage (plz no) and we needed to restart from a snapshot, would we hit this case and be unable to restart w/o this patch?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there were a network outage (plz no) and we needed to restart from a snapshot, would we hit this case and be unable to restart w/o this patch?

fortunately, it's not. the pesky hard network outage (hard-fork) case is rigorously tested by local-cluster tests.

as far as i checked, sysvar_cache is only used for tx processing. And the snapshot bank itself is frozen/rooted by definition. So, there should be no tx processing. I bet some frozen assert will be triggered, if ever done.

However, we need populated sysvar_cache for minimized snapshot as a kind of unusual use case here.

So, the other component which is affected would be rpc's tx preflight, with alt tx and with air-gapped node. (this is little known trick, but you can inspect account state at single point of time with rich set of rpc functionality if you make machine stand-alone with empty ledger with single snapshot).

On the other hand, starting to populate sysvar_cache shouldn't cause any bad effects as well. After all, it's rather harmless operation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it; it does seem like it was an oversight from the previous PRs on just not including this code on the initialization from fields.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you are right. Can't think of any reason why it shouldn't be called here as well. Seems like I simply missed it and it did not break any tests.


// Sanity assertions between bank snapshot and genesis config
// Consider removing from serializable bank state
Expand Down Expand Up @@ -6920,6 +6921,13 @@ impl Bank {
Ok(sanitized_tx)
}

pub fn fully_verify_transaction(
&self,
tx: VersionedTransaction,
) -> Result<SanitizedTransaction> {
self.verify_transaction(tx, TransactionVerificationMode::FullVerification)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i love type safety. but it's too verbose to impose this fn signature at call-site, imo.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, these small wrapper fns avoid new use TransactionVerificationMode-ing only for the sake of this fn invocation at call-site too

}

/// only called from ledger-tool or tests
fn calculate_capitalization(&self, debug_verify: bool) -> u64 {
let is_startup = true;
Expand Down