32 changes: 30 additions & 2 deletions ledger/src/leader_schedule_cache.rs
@@ -1,6 +1,6 @@
 use crate::{blocktree::Blocktree, leader_schedule::LeaderSchedule, leader_schedule_utils};
 use log::*;
-use solana_runtime::bank::Bank;
+use solana_runtime::bank::{Bank, MAX_LEADER_SCHEDULE_STAKES};
 use solana_sdk::{
     clock::{Epoch, Slot},
     epoch_schedule::EpochSchedule,
@@ -48,7 +48,9 @@ impl LeaderScheduleCache {

         // Calculate the schedule for all epochs between 0 and leader_schedule_epoch(root)
         let leader_schedule_epoch = epoch_schedule.get_leader_schedule_epoch(root_bank.slot());
-        for epoch in 0..leader_schedule_epoch {
+        for epoch in (leader_schedule_epoch.max(MAX_LEADER_SCHEDULE_STAKES)
+            - MAX_LEADER_SCHEDULE_STAKES)..leader_schedule_epoch
+        {
             let first_slot_in_epoch = epoch_schedule.get_first_slot_in_epoch(epoch);
             cache.slot_leader_at(first_slot_in_epoch, Some(root_bank));
         }

Contributor

this might come out prettier if the bank gave you the epochs for which it had leader_schedule_stakes...
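A self-contained sketch of the suggestion above, assuming the bank exposed the epochs for which it still holds leader-schedule stakes. `BankStub` and `leader_schedule_stake_epochs` are hypothetical stand-ins, not part of Bank's API in this PR:

```rust
use std::collections::HashMap;

// Stand-ins, not the real types: Epoch mirrors solana_sdk::clock::Epoch, and the
// unit value stands in for the per-epoch Stakes snapshot Bank keeps internally.
type Epoch = u64;

struct BankStub {
    epoch_stakes: HashMap<Epoch, ()>,
}

impl BankStub {
    // Hypothetical helper: report the epochs whose stakes are still retained,
    // so callers never re-derive the retention window themselves.
    fn leader_schedule_stake_epochs(&self) -> Vec<Epoch> {
        let mut epochs: Vec<Epoch> = self.epoch_stakes.keys().copied().collect();
        epochs.sort_unstable();
        epochs
    }
}

fn main() {
    let bank = BankStub {
        epoch_stakes: HashMap::from([(3, ()), (4, ()), (5, ())]),
    };
    // The cache-warming loop above could then iterate exactly these epochs
    // instead of computing `leader_schedule_epoch - MAX_LEADER_SCHEDULE_STAKES`.
    for epoch in bank.leader_schedule_stake_epochs() {
        println!("warm leader schedule for epoch {}", epoch);
    }
}
```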
@@ -84,6 +86,22 @@ impl LeaderScheduleCache {
}

pub fn slot_leader_at(&self, slot: Slot, bank: Option<&Bank>) -> Option<Pubkey> {
        let slot_epoch = self.epoch_schedule.get_epoch_and_slot_index(slot).0;
        if slot_epoch
            < self
                .max_epoch
                .read()
                .unwrap()
                .max(MAX_LEADER_SCHEDULE_STAKES)
                - MAX_LEADER_SCHEDULE_STAKES
        {
            panic!(
                "Requested too old epoch!: {} < {}",
                slot_epoch,
                *self.max_epoch.read().unwrap()
            );
        }

        if let Some(bank) = bank {
            self.slot_leader_at_else_compute(slot, bank)
        } else if self.epoch_schedule.slots_per_epoch == 0 {

Contributor
@sagar-solana Nov 26, 2019

I'm not sure it's best to crash here.

What if someone sends our validator a Shred that's really old? It'll cause the validator to panic. Actually, that's a bad example (but hopefully it gets the point across); I think the window_service should restructure its checks so that it doesn't risk querying a leader schedule that far back.

Since the function returns an Option, why not just return None and log an error?

Contributor Author

> I think the window_service should restructure its checks such that it doesn't risk querying a leader schedule that far back. ... Since the function returns an Option, why not just return None and log an error?

Yes, that's what I'd write as the final form of this PR!

I'm doing panic! here just to see how much other code is affected by this PR's change. So I should comment as such. Sorry for being less explicit about the intention. And thanks for checking my PR!

Contributor

Gotcha! Thanks for adding the [wip] in the title :) Maybe a Draft PR for future experiments?
Also, the panic won't catch all possibilities, like the RPC one I described, but it's perfectly fine for what you're trying to do right now. Also, imo, snapshot stuff > this.
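A minimal, self-contained sketch of the non-panicking form both commenters converge on — log and return None for epochs older than the retained window. The function and names below are illustrative stand-ins, not code from this diff:

```rust
type Epoch = u64;

const MAX_LEADER_SCHEDULE_STAKES: Epoch = 3;

// Mirrors the guard above, but degrades gracefully: callers of the Option-returning
// slot_leader_at would simply get None for epochs whose stakes were pruned.
fn guard_epoch(slot_epoch: Epoch, max_epoch: Epoch) -> Option<Epoch> {
    let oldest_retained = max_epoch.max(MAX_LEADER_SCHEDULE_STAKES) - MAX_LEADER_SCHEDULE_STAKES;
    if slot_epoch < oldest_retained {
        // In the validator this would be an `error!` log rather than a panic.
        eprintln!(
            "requested epoch {} is older than the retained window starting at {}",
            slot_epoch, oldest_retained
        );
        return None;
    }
    Some(slot_epoch)
}

fn main() {
    assert_eq!(guard_epoch(1, 10), None); // stakes for epoch 1 were pruned
    assert_eq!(guard_epoch(8, 10), Some(8)); // still inside the window
}
```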
@@ -113,6 +131,16 @@
);
return None;
}
if self.get_epoch_schedule_else_compute(epoch, bank).is_none()
/* && epoch < bank.epoch()*/
{
panic!(
"getting failed! epoch: {}, slot: {}, bank epoch: {}",
epoch,
current_slot,
bank.epoch()
);
}
while let Some(leader_schedule) = self.get_epoch_schedule_else_compute(epoch, bank) {
// clippy thinks I should do this:
// for (i, <item>) in leader_schedule
1 change: 1 addition & 0 deletions local-cluster/src/local_cluster.rs
@@ -716,6 +716,7 @@ mod test {
cluster_lamports: 100,
ticks_per_slot: 8,
slots_per_epoch: MINIMUM_SLOTS_PER_EPOCH as u64,
stakers_slot_offset: MINIMUM_SLOTS_PER_EPOCH as u64,
..ClusterConfig::default()
};
let cluster = LocalCluster::new(&config);
1 change: 1 addition & 0 deletions local-cluster/tests/local_cluster.rs
@@ -307,6 +307,7 @@ fn test_two_unbalanced_stakes() {
validator_configs: vec![validator_config.clone(); 2],
ticks_per_slot: num_ticks_per_slot,
slots_per_epoch: num_slots_per_epoch,
stakers_slot_offset: num_slots_per_epoch,
poh_config: PohConfig::new_sleep(Duration::from_millis(1000 / num_ticks_per_second)),
..ClusterConfig::default()
});
71 changes: 63 additions & 8 deletions runtime/src/bank.rs
@@ -54,6 +54,8 @@ use std::{

pub const SECONDS_PER_YEAR: f64 = (365.25 * 24.0 * 60.0 * 60.0);

pub const MAX_LEADER_SCHEDULE_STAKES: Epoch = 3;

type BankStatusCache = StatusCache<Result<()>>;

#[derive(Default)]

Contributor

this is small for testing? I thought we were gonna use 16 or 32

Contributor Author

Yes. Oh, sorry for the lack of a comment. I'm intentionally reducing this to make CI fragile. :D

@rob-solana From my current understanding after a day of code reading/testing, the assumption that leader_schedule_stakes for past epochs are always present doesn't cause hard errors for validators (no unwrap on the data derived from bank.epoch_stakes), but might make them silently stop operating or skip (old) legitimate tasks (unwrap_or/unwrap_or_default). So I'm chasing the possible problems due to the new retention behavior.
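A self-contained toy illustrating the failure mode being hunted here, assuming stakes for old epochs have been pruned: an `unwrap` on pruned data fails loudly, while `unwrap_or_default` quietly reports zero stake and lets downstream logic skip legitimate old-epoch work. The map below is a stand-in, not Bank's actual epoch_stakes field:

```rust
use std::collections::HashMap;

type Epoch = u64;

fn main() {
    // Pretend stakes for epochs 0 and 1 were pruned; only 2..=4 are retained.
    let epoch_stakes: HashMap<Epoch, u64> = HashMap::from([(2, 100), (3, 100), (4, 100)]);

    // Hard failure: unwrapping a pruned epoch panics immediately and is easy to spot.
    // let stake = epoch_stakes.get(&1).unwrap(); // would panic here

    // Silent failure: unwrap_or_default yields 0 stake for the pruned epoch, so
    // callers just treat epoch 1 as unstaked and skip the work without any error.
    let stake = epoch_stakes.get(&1).copied().unwrap_or_default();
    assert_eq!(stake, 0);
}
```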
@@ -386,14 +388,7 @@ impl Bank {
}
}

-        // update epoch_stakes cache
-        // if my parent didn't populate for this staker's epoch, we've
-        // crossed a boundary
-        if new.epoch_stakes.get(&leader_schedule_epoch).is_none() {
-            new.epoch_stakes
-                .insert(leader_schedule_epoch, new.stakes.read().unwrap().clone());
-        }
-
+        new.update_epoch_stakes(leader_schedule_epoch);
new.ancestors.insert(new.slot(), 0);
new.parents().iter().enumerate().for_each(|(i, p)| {
new.ancestors.insert(p.slot(), i + 1);
@@ -486,6 +481,25 @@ impl Bank {
self.store_account(&sysvar::slot_hashes::id(), &account);
}

    fn update_epoch_stakes(&mut self, leader_schedule_epoch: Epoch) {
        // update epoch_stakes cache
        // if my parent didn't populate for this staker's epoch, we've
        // crossed a boundary
        if self.epoch_stakes.get(&leader_schedule_epoch).is_none() {
            self.epoch_stakes
                .insert(leader_schedule_epoch, self.stakes.read().unwrap().clone());

            if leader_schedule_epoch >= MAX_LEADER_SCHEDULE_STAKES {
                self.epoch_stakes
                    .remove(&(leader_schedule_epoch - MAX_LEADER_SCHEDULE_STAKES));
            }
            error!(
                "new epoch stakes: {:#?}",
                self.epoch_stakes.keys().collect::<Vec<_>>()
            );
        }
    }

    fn update_fees(&self) {
        self.store_account(
            &sysvar::fees::id(),

Contributor

Why error?

Contributor Author

Yes, the log level should be lower than error.

Just to make it stand out for testing purposes, as said above. Sorry for misleading you!
@@ -1743,6 +1757,47 @@ mod tests {
assert_eq!(bank1.block_height(), 1);
}

    impl Bank {
        fn epoch_stake_keys(&self) -> Vec<Epoch> {
            let mut keys: Vec<Epoch> = self.epoch_stakes.keys().map(|k| *k).collect();
            keys.sort();
            keys
        }

        fn epoch_stake_key_info(&self) -> (Epoch, Epoch, usize) {
            let mut keys: Vec<Epoch> = self.epoch_stakes.keys().map(|k| *k).collect();
            keys.sort();
            (*keys.first().unwrap(), *keys.last().unwrap(), keys.len())
        }
    }

    #[test]
    fn test_bank_update_epoch_stakes() {
        let (genesis_config, _mint_keypair) = create_genesis_config(100_000);
        let mut bank = Bank::new(&genesis_config);

        let initial_epochs = bank.epoch_stake_keys();
        assert_eq!(initial_epochs, vec![0, 1]);

        for existing_epoch in &initial_epochs {
            bank.update_epoch_stakes(*existing_epoch);
            assert_eq!(bank.epoch_stake_keys(), initial_epochs);
        }

        for epoch in (initial_epochs.len() as Epoch)..MAX_LEADER_SCHEDULE_STAKES {
            bank.update_epoch_stakes(epoch);
            assert_eq!(bank.epoch_stakes.len() as Epoch, epoch + 1);
        }

        assert_eq!(bank.epoch_stake_key_info(), (0, 2, 3));

        bank.update_epoch_stakes(MAX_LEADER_SCHEDULE_STAKES);
        assert_eq!(bank.epoch_stake_key_info(), (1, 3, 3));

        bank.update_epoch_stakes(MAX_LEADER_SCHEDULE_STAKES + 1);
        assert_eq!(bank.epoch_stake_key_info(), (2, 4, 3));
    }

Contributor

nice

Contributor

A few more comments here would be awesome :)

Contributor Author

Thanks for the feedback! Piece of cake once I fix the snapshot issue. :)

#[test]
fn test_bank_capitalization() {
let bank = Arc::new(Bank::new(&GenesisConfig {