Fix the boundary inconsistency between delete_file_in_range and delete_range #27201
yhchiang-sol merged 1 commit into solana-labs:master from yhchiang-sol:delete-files-in-range
could you write a unit test for this? I think manually flushing can create the problematic situation?
steviez
left a comment
Logic itself looks good, just one thing where I think we can shrink the diff a little (open to discussion on it though if you disagree).
…e_range (solana-labs#27201) #### Problem RocksDB's delete_range applies to [from, to) while delete_file_in_range applies to [from, to] by default, and the rust-rocksdb api does not include the option to make delete_file_in_range apply to [from, to). Such inconsistency might cause `blockstore::run_purge` to produce an inconsistent result as it invokes both delete_range and delete_file_in_range. #### Summary of Changes This PR makes all our purge / delete related functions to be inclusive on both starting and ending slots.
) -> Result<()> {
    let mut index0 = self.transaction_status_index_cf.get(0)?.unwrap_or_default();
    let mut index1 = self.transaction_status_index_cf.get(1)?.unwrap_or_default();
    let to_slot = to_slot.saturating_add(1);
nit: there's a more rusty way:
for slot in from_slot..=to_slot {
...
}
Thanks for spotting this. Will have a quick fix for this.
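The two spellings of the loop differ only at the very top of the slot range. The following is a minimal sketch of that edge case; the helper names `slots_half_open` and `slots_inclusive` are hypothetical and do not appear in the PR.

```rust
/// Collects slots from a half-open range whose end was bumped by one.
/// When `to_slot == u64::MAX`, `saturating_add(1)` cannot grow any further,
/// so the range ends before `to_slot` and the last slot is skipped.
fn slots_half_open(from_slot: u64, to_slot: u64) -> Vec<u64> {
    (from_slot..to_slot.saturating_add(1)).collect()
}

/// Collects slots from an inclusive range, which always covers `to_slot`.
fn slots_inclusive(from_slot: u64, to_slot: u64) -> Vec<u64> {
    (from_slot..=to_slot).collect()
}
```

For ordinary slot values both helpers return the same list; they only disagree when `to_slot` is `u64::MAX`.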
/// is different from \[`from`, `to`\] of Database::delete_range_cf as we makes
/// the semantics of Database::delete_range_cf matches the blockstore purge
/// logic.
fn delete_range_cf<C: Column>(
@yhchiang-sol I'm very happy about our new consistent interval handling with [from, to]. :)
that said, I think we should put the range manipulation code as deep as possible for encapsulation.
so, i think this non-pub fn might be a good place to actually .saturating_add(1). That's because it looks like WriteBatch::delete_range_cf is only called by Database::delete_range_cf?
Then, we can remove this rather extra justification comment about different semantics from the docstring of WriteBatch::delete_range_cf.
is there a strong reason we're adjusting the to at Database::delete_range_cf specifically?
The type here is C::Index, which could be u64, (u64, u64), Pubkey, Signature, (u64, Signature, Slot), and (u64, Pubkey, Slot, Signature). We would need to implement C::saturating_add for each of them. Might be worth trying I think, although some types might be tricky.
If we want to move lower to the rocksdb delete_range_cf, then it is more difficult to perform +1 as it takes an arbitrary byte array.
A cleaner solution is to make RocksDB's range delete optionally perform the inclusive deletion based on the WriteOptions where we will add a new boolean indicating inclusive deletion, but I guess this would take a much longer route as we need to carry this information into range-deletion key format and update the internal range-deletion logic to honor this.
So probably good for now I think.
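To make the per-type work mentioned above concrete, here is a rough sketch for the two simplest index shapes. The trait `SaturatingIncrement` is hypothetical and not part of the Solana codebase, and the tuple implementation assumes the on-disk key order matches Rust's lexicographic tuple order.

```rust
/// Hypothetical trait: returns the smallest index strictly greater than
/// `self`, or `self` unchanged when it is already the maximum value.
trait SaturatingIncrement {
    fn saturating_increment(&self) -> Self;
}

impl SaturatingIncrement for u64 {
    fn saturating_increment(&self) -> Self {
        self.saturating_add(1)
    }
}

/// Assumes keys are ordered lexicographically, so the successor bumps the
/// last component and carries into the first one on overflow.
impl SaturatingIncrement for (u64, u64) {
    fn saturating_increment(&self) -> Self {
        match self.1.checked_add(1) {
            Some(next) => (self.0, next),
            None => match self.0.checked_add(1) {
                Some(next_first) => (next_first, 0),
                // Both components are already at the maximum: saturate.
                None => *self,
            },
        }
    }
}
```

Fixed-width byte keys such as a 32-byte public key would need a byte-wise carry in the same spirit, which is where the "tricky" cases come in.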
allow me to re-iterate my comment. however, I'd rather like to see these changes be accompanied by proper tests, which clearly pinpoint which behavior (bug) is actually changed. as you might know, a bug in a persistent subsystem is rather critical and generally hard to recover from when encountered in production. Of course, i think there can be an exception. i mean, if this pr is kind of urgent to ship. Admittedly, i committed the sin of sparse test coverage when i worked on #16697 because that was urgent to ship... Lastly, I know #26651 went through extensive testing. but it still failed to spot this, right?
for example, for this, I'd write like this:
write_batch,
w_active_transaction_status_index,
to_slot,
to_slot + 1,
I'm in the process of revisiting deletion- and ledger-cleanup-related code and adding missing tests if any. Just want to leave a comment here that this has been covered by the existing check.
Below is the test failure log if I remove +1 in the above statement.
---- blockstore::blockstore_purge::tests::test_purge_transaction_status stdout ----
thread 'blockstore::blockstore_purge::tests::test_purge_transaction_status' panicked at 'assertion failed: `(left == right)`
left: `0`,
right: `2`', ledger/src/blockstore/blockstore_purge.rs:752:9
stack backtrace:
0: rust_begin_unwind
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:142:14
2: core::panicking::assert_failed_inner
3: core::panicking::assert_failed
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/panicking.rs:181:5
4: solana_ledger::blockstore::blockstore_purge::tests::test_purge_transaction_status
at ./src/blockstore/blockstore_purge.rs:752:9
5: solana_ledger::blockstore::blockstore_purge::tests::test_purge_transaction_status::{{closure}}
at ./src/blockstore/blockstore_purge.rs:615:5
6: core::ops::function::FnOnce::call_once
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
7: core::ops::function::FnOnce::call_once
at /rustc/4b91a6ea7258a947e59c6522cd5898e7c0a6a88f/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
@yhchiang-sol hey, sorry for the bunch of post-merge comments here and the spin-off pr for you #27529 ;). all of these aren't urgent at all. please reply to/work on these at your convenience. :) thanks for maintaining blockstore code.
Problem
RocksDB's delete_range applies to [from, to) while delete_file_in_range
applies to [from, to] by default, and the rust-rocksdb api does not include
the option to make delete_file_in_range apply to [from, to). Such inconsistency
might cause `blockstore::run_purge` to produce an inconsistent result as it
invokes both delete_range and delete_file_in_range.
rocksdb::DeleteRange
https://github.com/facebook/rocksdb/blob/91166012c848f720f0208e91d766810d4f7e8cf9/include/rocksdb/db.h#L463-L479
rocksdb::DeleteFilesInRange
https://github.com/facebook/rocksdb/blob/91166012c848f720f0208e91d766810d4f7e8cf9/include/rocksdb/convenience.h#L496-L503
and rocksdb's c api hides the `include_end` default param, defaulting to `= true`....
https://github.com/facebook/rocksdb/blob/91166012c848f720f0208e91d766810d4f7e8cf9/db/c.cc#L5236-L5259
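The mismatch described above can be illustrated with a toy model. This is only a sketch: it uses a `BTreeMap` as a stand-in for a column family, and the helpers `delete_half_open`, `delete_closed`, and `sample_db` are hypothetical names that merely mimic the [from, to) and [from, to] boundary rules, not the RocksDB calls themselves.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a range delete over [from, to): `to` itself survives.
fn delete_half_open(db: &mut BTreeMap<u64, &'static str>, from: u64, to: u64) {
    db.retain(|slot, _| !(from <= *slot && *slot < to));
}

/// Toy stand-in for a range delete over [from, to]: `to` is removed as well.
fn delete_closed(db: &mut BTreeMap<u64, &'static str>, from: u64, to: u64) {
    db.retain(|slot, _| !(from <= *slot && *slot <= to));
}

/// Builds a small map with one entry for each slot 0 through 5.
fn sample_db() -> BTreeMap<u64, &'static str> {
    (0..=5).map(|slot| (slot, "data")).collect()
}
```

Calling both helpers with the same `(1, 3)` arguments on two fresh copies of `sample_db()` leaves slot 3 present in one copy and absent in the other, which is the kind of disagreement a purge that mixes both boundary rules can run into.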
Summary of Changes
This PR makes all our purge / delete related functions inclusive
on both starting and ending slots.