ledger/src/blockstore.rs (59 changes: 34 additions & 25 deletions)
@@ -1416,32 +1416,30 @@ impl Blockstore {

if !erasure_meta.check_coding_shred(&shred) {
metrics.num_coding_shreds_invalid_erasure_config += 1;
let conflicting_shred = self.find_conflicting_coding_shred(
&shred,
slot,
erasure_meta,
just_received_shreds,
);
if let Some(conflicting_shred) = conflicting_shred {
if !self.has_duplicate_shreds_in_slot(slot) {
if self
.store_duplicate_slot(
slot,
conflicting_shred.clone(),
shred.payload().clone(),
)
.is_err()
{
warn!("bad duplicate store..");
if !self.has_duplicate_shreds_in_slot(slot) {
if let Some(conflicting_shred) = self
.find_conflicting_coding_shred(&shred, slot, erasure_meta, just_received_shreds)
.map(Cow::into_owned)
{
if let Err(e) = self.store_duplicate_slot(
slot,
conflicting_shred.clone(),
shred.payload().clone(),
Comment on lines +1424 to +1427

Semi-related to the PR (but certainly not blockers):

  • Wondering why we didn't use write_batch for this function to store the updates, not that the updates to this column need to occur atomically with any other column
  • Wondering if we could store the proof without cloning the two Vecs; we'd want to serialize with borrowed data but deserialize with owned data. A quick glance suggests bincode supports Cow (see the sketch below).
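
A minimal sketch of that Cow idea, assuming the proof is a serde-derived struct written with bincode; the struct shown here is illustrative, not the actual blockstore column type:

```rust
use serde::{Deserialize, Serialize};
use std::borrow::Cow;

// Hypothetical proof layout: Cow fields let us serialize borrowed payloads
// without cloning, while deserialization still hands back owned data.
#[derive(Serialize, Deserialize)]
struct DuplicateSlotProof<'a> {
    shred1: Cow<'a, [u8]>,
    shred2: Cow<'a, [u8]>,
}

fn main() -> bincode::Result<()> {
    let existing_payload = vec![1u8, 2, 3];
    let new_payload = vec![4u8, 5, 6];

    // Serialize with borrowed data: neither Vec is cloned to build the proof.
    let proof = DuplicateSlotProof {
        shred1: Cow::Borrowed(&existing_payload),
        shred2: Cow::Borrowed(&new_payload),
    };
    let bytes = bincode::serialize(&proof)?;

    // serde's default Deserialize impl for Cow always produces Cow::Owned,
    // so reading the proof back from the column yields owned data.
    let proof_back: DuplicateSlotProof = bincode::deserialize(&bytes)?;
    assert!(matches!(proof_back.shred1, Cow::Owned(_)));
    Ok(())
}
```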

Author:

Yeah, we could definitely be smarter about this:

  • write batch to store duplicate proof updates
  • cache a map so we don't have to read blockstore every time we check if there's already a duplicate proof (see the sketch below)
  • avoid the clones

I guess duplicates happen so infrequently that this isn't a major problem, but I can definitely look into cleaning this up in a follow-up.
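
A rough sketch of the "cache a map" idea from the list above, using a plain in-memory set keyed by slot; the names and the read_db closure are stand-ins for the real blockstore calls:

```rust
use std::collections::HashSet;

// Hypothetical cache: remembers which slots already have a duplicate proof so
// repeated checks within one insert batch can skip the blockstore read.
struct DuplicateProofCache {
    slots_with_proof: HashSet<u64>, // Slot is just a u64 in this sketch
}

impl DuplicateProofCache {
    fn new() -> Self {
        Self { slots_with_proof: HashSet::new() }
    }

    // `read_db` stands in for has_duplicate_shreds_in_slot(); it is only
    // consulted when the cache has no answer yet.
    fn has_proof(&mut self, slot: u64, read_db: impl Fn(u64) -> bool) -> bool {
        if self.slots_with_proof.contains(&slot) {
            return true;
        }
        if read_db(slot) {
            self.slots_with_proof.insert(slot);
            return true;
        }
        false
    }

    // Record a proof we just stored so later checks in the same batch see it.
    fn record_proof(&mut self, slot: u64) {
        self.slots_with_proof.insert(slot);
    }
}

fn main() {
    let mut cache = DuplicateProofCache::new();
    // First check falls through to the (pretend) blockstore read.
    assert!(!cache.has_proof(7, |_| false));
    // After we store a proof, the cache answers without touching the DB.
    cache.record_proof(7);
    assert!(cache.has_proof(7, |_| false));
}
```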

steviez (May 9, 2024):

> Wondering why we didn't use write_batch for this function to store the updates

I figured this out and thought I edited my comment, but I guess I did not:

  • store_duplicate_slot() and has_duplicate_shreds_in_slot() both read from the DB
  • Items written to the write_batch are not visible to the DB until the batch is committed

We call has_duplicate_shreds_in_slot() in the same function, potentially after calling store_duplicate_slot(); if we used the write_batch, the read wouldn't see the write immediately before it. And if insert_shreds_handle_duplicate() were called with 100 shreds that we're iterating through, a duplicate proof written for the 5th shred would not be visible when we read for the next 95 shreds.

This same gotcha is why we have the get_shred_from_just_inserted_or_db() function in combination with the HashMap for getting shreds that are in the same "insert batch".
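
A minimal, self-contained illustration of that visibility rule, going through the rust-rocksdb crate directly; the path, key, and value here are made up, and the real blockstore accesses go through its column abstractions:

```rust
use rocksdb::{WriteBatch, DB};

fn main() -> Result<(), rocksdb::Error> {
    // Assumes a fresh directory for the demo database.
    let db = DB::open_default("/tmp/write-batch-visibility-demo")?;

    // Stage a (hypothetical) duplicate proof in a write batch.
    let mut batch = WriteBatch::default();
    batch.put(b"duplicate_proof/slot_5", b"proof bytes");

    // Reads go to the DB, so a read-after-write through the DB handle (what
    // has_duplicate_shreds_in_slot() effectively does) does not see the
    // staged entry yet.
    assert!(db.get(b"duplicate_proof/slot_5")?.is_none());

    // Only once the batch is committed does the read observe the write.
    db.write(batch)?;
    assert!(db.get(b"duplicate_proof/slot_5")?.is_some());
    Ok(())
}
```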

) {
warn!("Unable to store conflicting erasure meta duplicate proof for {slot} {erasure_set:?} {e}");
}

duplicate_shreds.push(PossibleDuplicateShred::ErasureConflict(
shred.clone(),
conflicting_shred,
));
} else {
error!(
"Unable to find the conflicting coding shred that set {erasure_meta:?}.
This should only happen in extreme cases where blockstore cleanup has caught up to the root.
Skipping the erasure meta duplicate shred check"
);
}
} else {
datapoint_info!("bad-conflict-shred", ("slot", slot, i64));

Comment:

Maybe keep this line (or an error log or something similar) for when find_conflicting_coding_shred returns None.

Reply:

We have a couple of warn!()'s under the // ToDo: ... line; do you think those are sufficient? Namely, I think self.has_duplicate_shreds_in_slot(slot) still being false would indicate that we did not find a conflicting coding shred.

Reply:

Hmm! The warn! logs will be there regardless of this datapoint, but this bad-conflict-shred datapoint would specifically indicate that we have a mismatching erasure meta but can't find the shred which initialized that erasure meta.

Anyway, I am inclined to keep this datapoint just in case we observe these anomalies, but I don't feel strongly about that, so either way is fine with me.

Author:

I added it back as an error log.

}

// ToDo: This is a potential slashing condition
@@ -1484,28 +1482,39 @@ impl Blockstore {
result
}

fn find_conflicting_coding_shred(
&self,
fn find_conflicting_coding_shred<'a>(
&'a self,
shred: &Shred,
slot: Slot,
erasure_meta: &ErasureMeta,
just_received_shreds: &HashMap<ShredId, Shred>,
) -> Option<Vec<u8>> {
just_received_shreds: &'a HashMap<ShredId, Shred>,
) -> Option<Cow<Vec<u8>>> {
// Search for the shred which set the initial erasure config, either inserted,
// or in the current batch in just_received_shreds.
let index = erasure_meta.first_received_coding_shred_index()?;
let shred_id = ShredId::new(slot, index, ShredType::Code);
let maybe_shred = self.get_shred_from_just_inserted_or_db(just_received_shreds, shred_id);

if index != 0 || maybe_shred.is_some() {
return maybe_shred;
}

// If we are using a blockstore created from an earlier version than 1.18.12,
// `index` will be 0 as it was not yet populated, revert to a scan until we no longer support
// those blockstore versions.
for coding_index in erasure_meta.coding_shreds_indices() {
let maybe_shred = self.get_coding_shred(slot, coding_index);
if let Ok(Some(shred_data)) = maybe_shred {
let potential_shred = Shred::new_from_serialized_shred(shred_data).unwrap();
if shred.erasure_mismatch(&potential_shred).unwrap() {
return Some(potential_shred.into_payload());
return Some(Cow::Owned(potential_shred.into_payload()));
}
} else if let Some(potential_shred) = {
let key = ShredId::new(slot, u32::try_from(coding_index).unwrap(), ShredType::Code);
just_received_shreds.get(&key)
} {
if shred.erasure_mismatch(potential_shred).unwrap() {
return Some(potential_shred.payload().clone());
return Some(Cow::Borrowed(potential_shred.payload()));
}
}
}