Add ledger-tool dead-slots and improve purge a lot by ryoqun · Pull Request #13065 · solana-labs/solana

ryoqun · 2020-10-21T10:40:13Z

Problem

We need to support an incident where a validator must recover from a very old slot (the affected slots were marked as dead because of corruption, for an unknown reason).

Summary of Changes

Add a debugging subcommand, ledger-tool dead-slots, which simply prints the dead slots.
Fix the unbounded memory growth of ledger-tool purge by batching. Previously, we tried to create an infinitely large (well, practically) RocksDB WriteBatch, which resulted in std::bad_alloc from libstdc++ in librocksdb.
Improve the performance of ledger-tool purge by actually disabling both auto and manual compaction (--no-compaction wasn't working to begin with, due to a bug).
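The memory fix comes down to committing deletes in bounded batches instead of buffering every delete before a single commit. The sketch below is a simplified in-memory model that assumes per-key deletes against a BTreeMap. `DeleteBatch`, `commit`, and `purge_batched` are hypothetical names and do not reflect the blockstore's internals. It only shows that peak buffered work is capped by `batch_size` rather than growing with the size of the purge range.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a RocksDB WriteBatch: it only buffers the keys
/// that will be deleted when the batch is committed.
struct DeleteBatch {
    keys: Vec<u64>,
}

/// Applies and clears the buffered deletes.
fn commit(store: &mut BTreeMap<u64, Vec<u8>>, batch: &mut DeleteBatch) {
    for key in batch.keys.drain(..) {
        store.remove(&key);
    }
}

/// Deletes every slot in `start..=end`, committing after at most
/// `batch_size` buffered deletes. Returns the largest batch that was ever
/// buffered, which is what bounds peak memory use.
fn purge_batched(
    store: &mut BTreeMap<u64, Vec<u8>>,
    start: u64,
    end: u64,
    batch_size: usize,
) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut batch = DeleteBatch { keys: Vec::with_capacity(batch_size) };
    let mut peak: usize = 0;
    for slot in start..=end {
        batch.keys.push(slot);
        if batch.keys.len() == batch_size {
            peak = peak.max(batch.keys.len());
            commit(store, &mut batch);
        }
    }
    // Flush the final, possibly partial, batch.
    if !batch.keys.is_empty() {
        peak = peak.max(batch.keys.len());
        commit(store, &mut batch);
    }
    peak
}
```

If all deletes went into one batch, the buffer would instead grow to `end - start + 1` entries. That unbounded buffer is the failure mode described above.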

With these changes combined, purging is still slow but tolerable, and we can finally purge any number of slots.

Note: This PR should be backported quickly, all the way down to v1.3, so I tried to minimize both the diff and the risk of bugs.

Fixes #12907

ryoqun · 2020-10-21T10:57:07Z

Example run result:

$ /tmp/solana-ledger-tool-v6 --ledger ledger/ purge --no-compaction 41590628 --batch-size 1000
[2020-10-21T09:31:54.281990469Z INFO  solana_ledger_tool] solana-ledger-tool 1.5.0 (src:990932c9; feat:4263608917)
[2020-10-21T09:31:54.282507407Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 500000
[2020-10-21T09:31:54.282523188Z INFO  solana_ledger::blockstore] Opening database at "/home/sol/ledger/rocksdb"
[2020-10-21T09:31:54.285129209Z WARN  solana_ledger::blockstore_db] Disabling rocksdb's auto compaction for maintenance bulk ledger update...
[2020-10-21T09:36:09.847333769Z INFO  solana_ledger::blockstore] "/home/sol/ledger/rocksdb" open took 255.6s
Purging data from slots 41590628 to 41593628 (3000 slots) (skip compaction: true)
[2020-10-21T09:36:09.894200350Z INFO  solana_ledger_tool] Purging chunked slots from 41590628 to 41591627
[2020-10-21T09:36:26.907518807Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2020-10-21T09:36:26.908140647Z INFO  solana_metrics::metrics] datapoint: blockstore-purge from_slot=41590628i to_slot=41591627i delete_range_us=10435755i write_batch_us=6540037i
[2020-10-21T09:36:28.096820185Z INFO  solana_ledger::blockstore::blockstore_purge] purge_from_next_slots: adjusted meta for slot 41590627
[2020-10-21T09:36:28.097889498Z INFO  solana_ledger_tool] Purging chunked slots from 41591628 to 41592627
[2020-10-21T09:36:48.345251818Z INFO  solana_metrics::metrics] datapoint: blockstore-purge from_slot=41591628i to_slot=41592627i delete_range_us=12937158i write_batch_us=7310027i
[2020-10-21T09:36:49.390388140Z INFO  solana_ledger_tool] Purging chunked slots from 41592628 to 41593627
[2020-10-21T09:37:09.236605111Z INFO  solana_metrics::metrics] datapoint: blockstore-purge from_slot=41592628i to_slot=41593627i delete_range_us=11881292i write_batch_us=7964787i
[2020-10-21T09:37:10.265496553Z INFO  solana_ledger_tool] Purging chunked slots from 41593628 to 41593628
[2020-10-21T09:37:10.279390795Z INFO  solana_metrics::metrics] datapoint: blockstore-purge from_slot=41593628i to_slot=41593628i delete_range_us=5844i write_batch_us=7887i

ryoqun · 2020-10-21T11:03:48Z

+                end_slot - start_slot,
+                no_compaction,
+            );
+            for slots in &(start_slot..=end_slot).chunks(batch_size) {


Hmm, with dead_slot_iterator, could we do better by implementing ledger-tool purge --only-dead-slots? Or is it too dangerous to add at this moment? ;)

Hmm, I think we should lean toward the safer side. The root cause of the support incident is still unclear, so purging only the dead slots might not be enough...

On the other hand, I'm starting to doubt that a validator can handle a repair_state containing around 2,200,000 slots to repair (the number of slots from the incident up to the current slot)... So maybe purging only the dead slots is the only viable way forward?

@carllin, do you have a rough estimate of the required RAM? Also, flushing repairs to disk wouldn't be a small amount of work, right?

Cool, it seems that solana-validator can actually repair from snapshot-39258079-7o86jrh3NJNjYxx4pWcPa7fSwAZvX4HynXdhynZ98VF2.tar.zst at the moment.

So there is no problem.

We want to resume from around slot 41040067.

As for purging only dead slots via dead_slot_iterator, I just added it as a backup plan (purge --dead-slots-only): 61c38ac
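The diff purges each dead slot as its own single-slot range (`purge_from_blockstore(dead_slot, dead_slot)`). The sketch below models that loop under two assumptions. A `BTreeSet` stands in for `dead_slot_iterator`. The purge callback takes the place of the `purge_from_blockstore` closure from the diff. `dead_slot_ranges` and `purge_dead_slots_only` are hypothetical helpers.

```rust
use std::collections::BTreeSet;

/// Returns the single-slot ranges a dead-slots-only purge would process:
/// each dead slot within `start..=end` becomes its own `(slot, slot)` range.
fn dead_slot_ranges(dead_slots: &BTreeSet<u64>, start: u64, end: u64) -> Vec<(u64, u64)> {
    if start > end {
        // BTreeSet::range panics on an inverted range, so guard against it.
        return Vec::new();
    }
    dead_slots.range(start..=end).map(|&slot| (slot, slot)).collect()
}

/// Calls `purge_from_blockstore` once per dead slot, leaving live slots alone.
fn purge_dead_slots_only<F: FnMut(u64, u64)>(
    dead_slots: &BTreeSet<u64>,
    start: u64,
    end: u64,
    mut purge_from_blockstore: F,
) {
    for (s, e) in dead_slot_ranges(dead_slots, start, end) {
        purge_from_blockstore(s, e);
    }
}
```

Purging slot by slot is slower than one range purge when many slots are dead, but it never touches slots that replayed successfully.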

ryoqun · 2020-10-21T11:13:10Z

+            for slots in &(start_slot..=end_slot).chunks(batch_size) {
+                let slots = slots.collect::<Vec<_>>();
+                assert!(!slots.is_empty());
+
+                let start_slot = *slots.first().unwrap();
+                let end_slot = *slots.last().unwrap();


I think this batching logic is simple enough that it shouldn't introduce bugs.

It should also be semantically equivalent to repeatedly executing solana-ledger-tool purge ... {N1 N2, N2 N3, N3 N4}.
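The diff uses itertools' `chunks` over `start_slot..=end_slot` and takes the first and last slot of each chunk. The sketch below is a std-only reimplementation of that boundary computation. `chunk_ranges` is a hypothetical name. Both ends are inclusive, so consecutive chunks are disjoint: each chunk starts one slot after the previous one ends.

```rust
/// Splits the inclusive slot range `start..=end` into consecutive inclusive
/// chunks of at most `batch_size` slots, returning (first, last) per chunk.
fn chunk_ranges(start: u64, end: u64, batch_size: u64) -> Vec<(u64, u64)> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut ranges = Vec::new();
    let mut chunk_start = start;
    while chunk_start <= end {
        // saturating_add avoids overflow when the range ends near u64::MAX.
        let chunk_end = chunk_start.saturating_add(batch_size - 1).min(end);
        ranges.push((chunk_start, chunk_end));
        if chunk_end == end {
            break;
        }
        chunk_start = chunk_end + 1;
    }
    ranges
}
```

With the example run's arguments (41590628 to 41593628, batch size 1000), this yields the same four ranges as the "Purging chunked slots" log lines, covering 3001 slots in total. The "(3000 slots)" in the summary line appears to come from `end_slot - start_slot`, so it is one less than the number of slots actually purged.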

ryoqun · 2020-10-21T12:39:20Z

        ("purge", Some(arg_matches)) => {
            let start_slot = value_t_or_exit!(arg_matches, "start_slot", Slot);
            let end_slot = value_t!(arg_matches, "end_slot", Slot).ok();
-            let no_compaction = arg_matches.is_present("no-compaction");


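The lookup key should be "no_compaction" (the argument's name), not "no-compaction" (its long flag). is_present() matches on the argument name, so this check never succeeded and --no-compaction had no effect: https://github.com/solana-labs/solana/pull/11052/files#r509242376 #11052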

codecov · 2020-10-21T14:19:19Z

Codecov Report

Merging #13065 into master will increase coverage by 0.0%.
The diff coverage is 89.6%.

@@           Coverage Diff           @@
##           master   #13065   +/-   ##
=======================================
  Coverage    82.1%    82.1%           
=======================================
  Files         366      366           
  Lines       86097    86105    +8     
=======================================
+ Hits        70739    70748    +9     
+ Misses      15358    15357    -1

ryoqun · 2020-10-21T16:33:41Z


        let end_slot = last_slot.unwrap();
        info!("Purging slots {} to {}", start_slot, end_slot);
-        blockstore.purge_slots(start_slot, end_slot, PurgeType::Exact);


This order was badly broken: if the process is interrupted right after purge_slots and before purge_from_next_slots, it leaves very dangerous dangling references in the slot meta...
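The sketch below models why the order matters. It assumes the fix runs the parent-unlinking step (purge_from_next_slots) before deleting the purged slots' metas (purge_slots). Only `next_slots` is modeled. The types and helpers (`SlotMeta` here, `unlink_from_parents`, `delete_metas`, `purge_interrupted`, `chain`) are hypothetical simplifications, not the blockstore's code. The process is "killed" after the first step to compare what each order leaves behind.

```rust
use std::collections::HashMap;

/// Minimal stand-in for slot metadata: only the child links matter here.
struct SlotMeta {
    next_slots: Vec<u64>,
}

type Metas = HashMap<u64, SlotMeta>;

/// Analogue of purge_from_next_slots: drop references to purged slots.
fn unlink_from_parents(metas: &mut Metas, start: u64, end: u64) {
    for meta in metas.values_mut() {
        meta.next_slots.retain(|s| !(start..=end).contains(s));
    }
}

/// Analogue of purge_slots: delete the purged slots' metas themselves.
fn delete_metas(metas: &mut Metas, start: u64, end: u64) {
    metas.retain(|slot, _| !(start..=end).contains(slot));
}

/// (parent, child) pairs whose child has no meta: the dangling references.
fn dangling_refs(metas: &Metas) -> Vec<(u64, u64)> {
    let mut out = Vec::new();
    for (&parent, meta) in metas {
        for &child in &meta.next_slots {
            if !metas.contains_key(&child) {
                out.push((parent, child));
            }
        }
    }
    out.sort();
    out
}

/// Runs only the first of the two purge steps, simulating a crash between them.
fn purge_interrupted(metas: &mut Metas, start: u64, end: u64, unlink_first: bool) {
    if unlink_first {
        unlink_from_parents(metas, start, end);
    } else {
        delete_metas(metas, start, end);
    }
}

/// Builds a linear fork 0 -> 1 -> ... -> len-1.
fn chain(len: u64) -> Metas {
    (0..len)
        .map(|s| {
            let next_slots = if s + 1 < len { vec![s + 1] } else { vec![] };
            (s, SlotMeta { next_slots })
        })
        .collect()
}
```

Deleting first and then crashing leaves the parent pointing at a missing slot. Unlinking first leaves at most an unreferenced leftover, and the interrupted purge can simply be rerun.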

ryoqun · 2020-10-21T16:34:05Z

+                    purge_from_blockstore(dead_slot, dead_slot);
+                }
            }
-            blockstore.purge_from_next_slots(start_slot, end_slot);


Well, when I was working on https://github.com/solana-labs/solana/pull/12350/files#diff-5e9f940ea065adfd3066ef0d8ef0cfb5029b5ba478d96fb484b36954e464505cR1609, I skipped checking this and only corrected the new code... and now that laziness has bitten me. ;)

ryoqun · 2020-10-21T16:45:17Z


        // Column family names
-        let meta_cf_descriptor = ColumnFamilyDescriptor::new(SlotMeta::NAME, get_cf_options());
+        let meta_cf_descriptor =


heh, finally rustfmt aligns these nicely. ;)

ryoqun · 2020-10-21T17:17:30Z

+                    .long("batch-size")
+                    .value_name("NUM")
+                    .takes_value(true)
+                    .default_value("1000")


Well, I preferred larger values, but it seems that 10000 (45 slots/sec) worsens the throughput...

I didn't dig into it much; I'll just settle on 1000 (measured at 50 slots/sec).
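As a rough cross-check, the example run above used --batch-size 1000. It went from the first "Purging chunked slots" line (09:36:09.894) to the last datapoint (09:37:10.279), about 60.4 s for 3001 slots. That works out to roughly 49.7 slots/sec, consistent with the measured 50 slots/sec. The helpers below are hypothetical and just do that arithmetic. They also estimate wall-clock time at a given throughput, which makes the 45 vs. 50 slots/sec difference concrete.

```rust
/// Throughput in slots per second for `slots` slots purged in `elapsed_secs`.
fn slots_per_sec(slots: u64, elapsed_secs: f64) -> f64 {
    assert!(elapsed_secs > 0.0, "elapsed time must be positive");
    slots as f64 / elapsed_secs
}

/// Estimated wall-clock seconds to purge `slots` slots at `rate` slots/sec.
fn estimated_secs(slots: u64, rate: f64) -> f64 {
    assert!(rate > 0.0, "throughput must be positive");
    slots as f64 / rate
}
```

For example, purging 100,000 slots would take about 2,000 s at 50 slots/sec versus about 2,222 s at 45 slots/sec.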

* Add ledger-tool dead-slots and improve purge a lot * Reduce batch size... * Add --dead-slots-only and fixed purge ordering (cherry picked from commit 0776fa0)

* Add ledger-tool dead-slots and improve purge a lot * Reduce batch size... * Add --dead-slots-only and fixed purge ordering (cherry picked from commit 0776fa0) Co-authored-by: Ryo Onodera <ryoqun@gmail.com>

Add ledger-tool dead-slots and improve purge a lot

556ddcd

ryoqun requested review from carllin and mvines October 21, 2020 10:40

Reduce batch size...

add305c

ryoqun added v1.3 labels Oct 21, 2020

ryoqun mentioned this pull request Oct 21, 2020

ledger-tool: Add purge --no-compaction flag #11052

Merged

Add --dead-slots-only and fixed purge ordering

61c38ac

ryoqun mentioned this pull request Oct 21, 2020

Follow up to persistent tower with tests and API cleaning #12350

Merged

mvines approved these changes Oct 21, 2020

ryoqun removed the v1.3 label Oct 21, 2020

ryoqun mentioned this pull request Oct 21, 2020

Add ledger-tool dead-slots and improve purge a lot (manual bp: #13065) #13070

Merged

ryoqun added the automerge label Oct 21, 2020

mergify Bot merged commit 0776fa0 into solana-labs:master Oct 21, 2020

mergify Bot mentioned this pull request Oct 21, 2020

Add ledger-tool dead-slots and improve purge a lot (bp #13065) #13071

Merged
