blockstore: use erasure meta index field to find conflicting shreds #1151
Conversation
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff            @@
##           master    #1151     +/-   ##
=========================================
  Coverage    82.1%    82.1%
=========================================
  Files         886      886
  Lines      236439   236490      +51
=========================================
+ Hits       194252   194306      +54
+ Misses      42187    42184       -3
if !self.has_duplicate_shreds_in_slot(slot) {
    let conflicting_shred = self
        .find_conflicting_coding_shred(&shred, slot, erasure_meta, just_received_shreds)
        .expect("Shred indicated by erasure meta must exist")
This expect here made me wonder: what happens in the scenario where a shred is purged/pruned/dumped from the blockstore between the time erasure_meta is obtained and the time the shred is fetched from the blockstore here?
when replay decides to dump a block it uses clear_unconfirmed_slot which shares a lock with shred insertion precisely to avoid the situation you mentioned:
agave/ledger/src/blockstore.rs
Lines 1290 to 1291 in bf1b765
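The lock-sharing described above can be sketched minimally. This is an illustrative stand-in, not the actual agave code: the type and method bodies are hypothetical, and only the locking shape matters.

```rust
use std::sync::Mutex;

// Sketch only (illustrative names, not the actual agave types): a single
// lock serializes shred insertion with clear_unconfirmed_slot-style dumps,
// so a shred found via erasure_meta cannot be purged mid-insertion.
struct Blockstore {
    insert_shreds_lock: Mutex<()>,
}

impl Blockstore {
    fn insert_shreds(&self) {
        let _guard = self.insert_shreds_lock.lock().unwrap();
        // Read erasure_meta, then fetch the conflicting shred: both happen
        // under the lock, so a concurrent dump cannot remove it in between.
    }

    fn clear_unconfirmed_slot(&self) {
        let _guard = self.insert_shreds_lock.lock().unwrap();
        // Purge the slot's shreds while insertion is excluded.
    }
}

fn main() {
    let blockstore = Blockstore { insert_shreds_lock: Mutex::new(()) };
    blockstore.insert_shreds();
    blockstore.clear_unconfirmed_slot();
    println!("each method releases the lock on return");
}
```

The point of the shared lock is that the erasure_meta read and the subsequent shred fetch form one critical section with respect to dumps.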
i'm not an expert on the blockstore automatic cleanup (cc @steviez). While it doesn't share this lock, it looks at blockstore.max_root() in order to determine which slots to clean up. We also compare against blockstore.max_root() earlier
agave/ledger/src/blockstore.rs
Line 1386 in bf1b765
but this seems unsafe as
replay_stage could have rooted and triggered an automatic cleanup in between.
hmm, will recheck these blockstore expects under the assumption that cleanup could have happened https://discord.com/channels/428295358100013066/1235788902888116274/1235942333334290442
You're correct that the BlockstoreCleanupService doesn't share the insert_shreds_lock. And right, the service fetches the latest root, and potentially clean data that is strictly older than the root. We never clean data newer than the latest root.
Practically speaking, the value for --limit-ledger-size is required to be >= 50M. Assuming 1k shreds per slot, the oldest data is ~50k slots older than the newest data. That being said, I think there is merit to an unwrap() and expect() audit like y'all are suggesting
hmm, will recheck these blockstore expects under the assumption that cleanup could have happened https://discord.com/channels/428295358100013066/1235788902888116274/1235942333334290442
Given that mainnet is upgrading to v1.18 and we have not had much soak time, it's probably better to demote all (or most) of those expects to an error log or metric. We can always add them back once we are confident we are not missing an edge case.
I combed through the expect/unwraps we introduced and demoted the affected ones here #1259 .
I believe these remaining checks to be safe and have not demoted them:
agave/ledger/src/blockstore.rs
Line 1134 in fd8b075
agave/ledger/src/blockstore.rs
Line 1163 in fd8b075
    .store_duplicate_slot(slot, conflicting_shred.clone(), shred.payload().clone())
    .is_err()
{
    warn!("bad duplicate store..");
this was old code, but nonetheless we need to log the actual err rather than just dropping it.
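The suggested fix can be sketched as below. The fallible store is a hypothetical stand-in, and eprintln! substitutes for warn! so the snippet is self-contained.

```rust
// Illustrative only: a fallible store and the two logging styles discussed.
fn store_duplicate_slot(fail: bool) -> Result<(), String> {
    if fail {
        Err("column write failed".to_string())
    } else {
        Ok(())
    }
}

fn main() {
    // Old pattern: the error value is discarded.
    if store_duplicate_slot(true).is_err() {
        eprintln!("bad duplicate store..");
    }
    // Suggested pattern: bind the error and include it in the log line.
    if let Err(e) = store_duplicate_slot(true) {
        eprintln!("bad duplicate store: {e}");
    }
}
```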
Force-pushed 2dd7027 to c3b08ae (compare)
    ));
    }
} else {
    datapoint_info!("bad-conflict-shred", ("slot", slot, i64));
maybe keep this line (or an error log or something similar), for when find_conflicting_coding_shred returns None.
We have a couple warn!()'s under the // ToDo: ... line; do you think those are sufficient? Namely, I think self.has_duplicate_shreds_in_slot(slot) still being false would indicate that we did not find a conflicting coding shred
hmm! the warn! logs will be there regardless of this datapoint.
but this bad-conflict-shred would specifically indicate that we have a mismatching erasure-meta but can't find the shred which initialized that erasure-meta.
anyways, I am inclined to keep this datapoint just in case we observe these anomalies, but don't feel strongly about that. so either way is fine with me.
I added it back as an error log.
if let Err(e) = self.store_duplicate_slot(
    slot,
    conflicting_shred.clone(),
    shred.payload().clone(),
Semi-related to the PR (but certainly not blockers):
- Wondering why we didn't use write_batch for this function to store the updates; not that the updates to this column need to occur atomically with any other column
- Wondering if we could store the proof without cloning the two Vecs; we'd want to serialize with borrowed data but deserialize with owned data. From a quick glance this seems doable, since bincode supports Cow
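The no-clone idea can be sketched with Cow. The type below is hypothetical, not the agave proof type; serde and bincode do support Cow fields, but plain std is used here to keep the snippet runnable.

```rust
use std::borrow::Cow;

// Sketch: hold the two payloads as Cow<[u8]> so the write path can borrow
// the existing Vecs (no clones), while the read path materializes owned
// data only when needed.
struct DuplicateSlotProof<'a> {
    shred1: Cow<'a, [u8]>,
    shred2: Cow<'a, [u8]>,
}

fn main() {
    let payload1 = vec![1u8, 2, 3];
    let payload2 = vec![4u8, 5, 6];
    // Write path: borrow the payloads, no Vec clones.
    let proof = DuplicateSlotProof {
        shred1: Cow::Borrowed(&payload1),
        shred2: Cow::Borrowed(&payload2),
    };
    // Read path: into_owned() copies out of the borrow only at this point.
    assert_eq!(proof.shred1.into_owned(), payload1);
    assert_eq!(proof.shred2.into_owned(), payload2);
}
```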
yeah we could definitely be smarter about this:
- write batch to store duplicate proof updates
- cache a map so we don't have to read blockstore every time we check if there's already a duplicate proof
- avoid the clones
I guess duplicates happen so infrequently that this isn't a major problem, but I can definitely look into cleaning this up in a follow-up.
Wondering why we didn't use write_batch for this function to store the updates
I figured this out and thought I edited my comment, but I guess I did not:
- store_duplicate_slot() and has_duplicate_shreds_in_slot() both read from the DB
- Items written to the write_batch are not visible to the DB until the batch is committed
We call has_duplicate_shreds_in_slot() in the same function, potentially after calling store_duplicate_slot() ... if we used the write_batch, the read wouldn't see the write immediately before it. And, if insert_shreds_handle_duplicate() was called with 100 shreds that we're iterating through, a duplicate proof written for the 5th shred would not be visible if we read for the next 95 shreds
This same gotcha is why we have the get_shred_from_just_inserted_or_db() function in combination with the HashMap for getting shreds that are in the same "insert batch"
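The visibility gotcha can be sketched with an in-memory stand-in for the DB. Names mirror the function mentioned above, but the types are illustrative, not the actual agave API.

```rust
use std::collections::HashMap;

// Sketch: entries staged in a pending write batch are invisible to DB reads
// until commit, so lookups must consult the in-memory "just inserted" map
// before falling back to the DB.
struct Db {
    committed: HashMap<u64, Vec<u8>>,
}

impl Db {
    fn get_shred_from_just_inserted_or_db<'a>(
        &'a self,
        just_inserted: &'a HashMap<u64, Vec<u8>>,
        index: u64,
    ) -> Option<&'a Vec<u8>> {
        // Current insert batch first, then the committed DB.
        just_inserted.get(&index).or_else(|| self.committed.get(&index))
    }
}

fn main() {
    let db = Db { committed: HashMap::from([(1u64, vec![0xaa])]) };
    let mut just_inserted = HashMap::new();
    just_inserted.insert(2u64, vec![0xbb]); // staged, not yet committed
    // Visible even though it has not been committed to the DB:
    assert!(db.get_shred_from_just_inserted_or_db(&just_inserted, 2).is_some());
    // Already-committed data is still found:
    assert!(db.get_shred_from_just_inserted_or_db(&just_inserted, 1).is_some());
    // Absent from both:
    assert!(db.get_shred_from_just_inserted_or_db(&just_inserted, 3).is_none());
}
```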
steviez left a comment
Change looks good to me, but I think it'd be good to confirm that Behzad's concern about the removed datapoint is adequately covered by the comments further down the function I mentioned
Problem
We perform a costly scan in order to find the original coding shred when an erasure config mismatch is detected.
Summary of Changes
Instead use the new field on erasure meta introduced in #961
Revert to the scan if the field is 0 in order to support old blockstores. This can be removed when we no longer support < v1.18.12