Fix/minimize rebuild #15

gammazero · 2021-07-19T00:43:27Z

This PR builds on, and is a replacement for PR #13.

This PR addresses two issues with the pinner:

- An unintended shutdown of IPFS can leave pins saved in an incomplete state, requiring the indexes to be rebuilt at the next startup.
- A defect in the pin index rebuild process caused all indexes to be rebuilt, whether needed or not.

The solutions to these problems:

- Pins are automatically synced after each modification to pins and indexes. This default behavior can be disabled and re-enabled for high-volume operations. Additionally, pins are synced separately from the dag service, so a long-running dag service sync does not prolong the time that pins spend in an incomplete "dirty" state.
- The rebuild process fixes a problem with setting the correct ID when loading pins. It also no longer loads large numbers of pins and index entries into maps.
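A minimal Go sketch of the sync-on-modify behavior, assuming a hypothetical `syncer` interface loosely modeled on a datastore's `Sync(prefix)` method. The `pinner` type, `SetAutosync`, `Flush`, and the `/pins` prefix are illustrative names, not this package's API. Only the pin prefix is synced, and the sync happens while the pinner lock is still held; syncing the dag service would be a separate call and is not shown.

```go
package main

import (
	"fmt"
	"sync"
)

// syncer is a hypothetical stand-in for a datastore that can flush all
// writes under a key prefix to durable storage.
type syncer interface {
	Sync(prefix string) error
}

// recordingStore records which prefixes were synced (for demonstration).
type recordingStore struct {
	synced []string
}

func (r *recordingStore) Sync(prefix string) error {
	r.synced = append(r.synced, prefix)
	return nil
}

// pinner is a heavily simplified, hypothetical pinner.
type pinner struct {
	lock     sync.Mutex
	pins     map[string]bool
	store    syncer
	autoSync bool
}

const pinKeyPrefix = "/pins" // hypothetical prefix holding pins and indexes

// SetAutosync toggles per-operation syncing and returns the previous
// setting so a caller can restore it after a bulk operation.
func (p *pinner) SetAutosync(auto bool) bool {
	p.lock.Lock()
	defer p.lock.Unlock()
	prev := p.autoSync
	p.autoSync = auto
	return prev
}

// Pin records a pin and, if autosync is on, syncs only the pin keys while
// the lock is held, so no other pin operation interleaves between the
// write and the sync.
func (p *pinner) Pin(c string) error {
	p.lock.Lock()
	defer p.lock.Unlock()
	p.pins[c] = true
	if p.autoSync {
		if err := p.store.Sync(pinKeyPrefix); err != nil {
			return fmt.Errorf("cannot sync pins: %w", err)
		}
	}
	return nil
}

// Flush syncs pin data explicitly, e.g. after a bulk operation that ran
// with autosync disabled.
func (p *pinner) Flush() error {
	p.lock.Lock()
	defer p.lock.Unlock()
	return p.store.Sync(pinKeyPrefix)
}
```

For a bulk import, a caller could save the result of `SetAutosync(false)`, pin everything, call `Flush` once, and then restore the previous setting.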

Additional tests verify the correctness of the indexes and of the rebuild process.

This PR builds on PR #13 to sync the pinner on every pin operation. It does the following that #13 does not:

- Syncs the dag service separately from pin data.
- Does not release and immediately reacquire the lock to sync; it syncs while still holding the pinner lock.
- Syncs only pin data for most operations.

In addition to syncing pin data, this PR also revises how indexes are rebuilt. Instead of reading through all pins and loading all previous index data, only a single pass through the pins is now needed to rebuild the missing indexes left by incomplete add or delete operations. This is operationally much simpler, and it also avoids holding entire pin sets or index sets in memory, as the previous solution did.

Fixes an error that caused all indexes to be rebuilt, whether needed or not, whenever the dirty flag was detected. This greatly prolonged the rebuild, since every index for every pin was rewritten. No other problems resulted, other than the unnecessary rebuild of all indexes and the large amount of memory used during the rebuild for large pin sets. Tests have been added to verify that the indexer and the rebuild process work correctly.
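Detecting the dirty state can be sketched with a persisted flag: set it before a multi-step pin update, clear it once the pin and index writes are synced, and run the repair pass at startup only when the flag is still set. The key `/pins/state/dirty`, the `kv` type, and the helper functions are hypothetical; the repair pass itself is passed in as `rebuild`.

```go
package main

// kv is a hypothetical minimal key-value store standing in for the datastore.
type kv map[string][]byte

const dirtyKey = "/pins/state/dirty" // hypothetical key name

// markDirty persists the flag before a multi-step pin update begins.
func markDirty(store kv) { store[dirtyKey] = []byte{1} }

// markClean clears the flag once pin and index writes have been synced.
func markClean(store kv) { store[dirtyKey] = []byte{0} }

func isDirty(store kv) bool {
	v, ok := store[dirtyKey]
	return ok && len(v) == 1 && v[0] == 1
}

// openPinner runs the repair pass only if the previous shutdown left the
// flag set, then marks the store clean. It reports whether it rebuilt.
func openPinner(store kv, rebuild func() error) (bool, error) {
	if !isDirty(store) {
		return false, nil
	}
	if err := rebuild(); err != nil {
		return false, err
	}
	markClean(store)
	return true, nil
}
```

Keeping the flag check separate from the repair logic means a clean shutdown costs nothing at startup, while a dirty one triggers only the targeted repair.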

petar

It looks reasonable. The highest value here appears to be the more efficient index rebuilding. This pinner implementation is becoming hard to review and verify. Going forward, it should teach us to do the right thing: use transactions or write-ahead logging. I suspect that every bug-fix iteration on this implementation takes about as long as re-implementing the whole thing using transactions. In any event, this looks better than the previous version!

dspinner/pin.go

gammazero · 2021-07-19T20:27:50Z

Using transactions for this is the better fix. However, this means that the pinner must be given a transaction-capable datastore, and the pinner will need to be able to get the transaction interface from the datastore, which is not difficult to solve. The harder part is that if any pinners currently use datastores that do not support transactions, they will need to migrate to a new datastore, and that may require a larger-scoped change.

petar

LGTM.

@aschmahmann should we deploy in the clusters before merging to master?

aschmahmann · 2021-07-19T20:59:43Z

Running some live tests here seems reasonable to me if we're pretty confident that things should work and we're close to merge (i.e. we don't want to mess up the cluster nodes if we can avoid it). I would only deploy to one at a time in case there are issues.

As for transactions, I wouldn't worry too much about users of non-transactional datastores, although we can take care of this in a subsequent PR. The main datastores I know of in use here are LevelDB and Badger, both of which should support transactions. There may also be a map datastore, but we could either make one that supports transactions or just use the in-memory LevelDB datastore.
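A minimal sketch of what a transaction-capable map datastore and the pinner-side check might look like. The `Store`, `Txn`, and `TxnStore` interfaces are hypothetical and only loosely modeled on the `Datastore`/`TxnDatastore` split in go-datastore; they are not its actual signatures. Transactions buffer writes and apply them all under the store lock on `Commit`, and `asTxnStore` uses a type assertion so the pinner could discover transaction support from the datastore it is given.

```go
package main

import (
	"errors"
	"sync"
)

// Hypothetical minimal interfaces; not go-datastore's real API.
type Store interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, bool)
}

type Txn interface {
	Put(key string, value []byte)
	Commit() error
}

type TxnStore interface {
	Store
	NewTransaction() Txn
}

// mapStore is an in-memory store whose transactions buffer writes.
type mapStore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newMapStore() *mapStore { return &mapStore{data: map[string][]byte{}} }

func (m *mapStore) Put(key string, value []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

func (m *mapStore) Get(key string) ([]byte, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	v, ok := m.data[key]
	return v, ok
}

type mapTxn struct {
	store   *mapStore
	pending map[string][]byte
	done    bool
}

func (m *mapStore) NewTransaction() Txn {
	return &mapTxn{store: m, pending: map[string][]byte{}}
}

// Put only buffers the write; nothing is visible until Commit.
func (t *mapTxn) Put(key string, value []byte) { t.pending[key] = value }

// Commit applies every buffered write under the store lock, so readers
// using Get see either none or all of the transaction's writes.
func (t *mapTxn) Commit() error {
	if t.done {
		return errors.New("transaction already finished")
	}
	t.done = true
	t.store.mu.Lock()
	defer t.store.mu.Unlock()
	for k, v := range t.pending {
		t.store.data[k] = v
	}
	return nil
}

// plainStore exposes no transactions, standing in for a
// non-transactional datastore.
type plainStore struct{ m *mapStore }

func (p plainStore) Put(key string, value []byte) error { return p.m.Put(key, value) }
func (p plainStore) Get(key string) ([]byte, bool)     { return p.m.Get(key) }

// asTxnStore reports whether the given datastore also supports transactions.
func asTxnStore(s Store) (TxnStore, bool) {
	ts, ok := s.(TxnStore)
	return ts, ok
}
```

When `asTxnStore` reports false, the pinner would have to keep the current sync-and-dirty-flag approach, which is the migration concern raised above.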

gammazero · 2021-07-28T15:56:00Z

dspinner/pin.go

```diff
@@ -981,15 +981,15 @@ func (p *pinner) rebuildIndexes(ctx context.Context) error {
 			indexer = p.cidRIndex
 			// Delete any direct index
 			err = p.cidDIndex.Delete(ctx, indexKey, pp.Id)
-			log.Infof("deleting stale pin index for cid %v", pp.Cid.String())
+			log.Errorf("deleting stale pin index for cid %v", pp.Cid.String())
```


This appears to log the message whether or not a stale pin was actually found. Perhaps Delete needs to return a boolean, or the code should first check whether the pin exists.

Let's not log deletions for now. I'll update the PR.
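The approach named by the later commit "Check that stale pin exists before logging and removing" can be sketched like this: look the entry up first, and only delete and log when it is actually present. `staleIndex` and `removeStale` are hypothetical names, and the logging function is injected so the sketch stays self-contained.

```go
package main

// staleIndex is a hypothetical stand-in for a cid -> pin ID index.
type staleIndex map[string]map[string]bool

// removeStale deletes the (cid, id) entry only if it exists and logs only
// when something was actually removed. It reports whether it removed one.
func removeStale(ix staleIndex, cid, id string, logf func(string, ...interface{})) bool {
	if !ix[cid][id] {
		return false // nothing stale: no delete, no log line
	}
	delete(ix[cid], id)
	if len(ix[cid]) == 0 {
		delete(ix, cid)
	}
	logf("deleting stale pin index for cid %v", cid)
	return true
}
```

In a real program, `log.Printf` from the standard library could be passed as `logf`.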

gammazero added 4 commits July 16, 2021 16:10

remove unneeded vals

62b7c58

Additional error checks

54fd055

gammazero requested review from aschmahmann and petar July 19, 2021 01:04

petar reviewed Jul 19, 2021


Review changes

cb1434c

gammazero requested a review from petar July 19, 2021 20:28

petar approved these changes Jul 19, 2021


petar added a commit to ipfs/kubo that referenced this pull request Jul 20, 2021

point go mod pinner to ipfs/go-ipfs-pinner#15

ba4ad9e

BigLep mentioned this pull request Jul 20, 2021

Restarting node with invalid pin index takes excessive time ipfs/kubo#8149

Closed

petar mentioned this pull request Jul 27, 2021

point ipfs to pinner that syncs on every pin ipfs/kubo#8231

Merged

gammazero and others added 3 commits July 27, 2021 11:00

Do not rebuild indexes with old values

86e8c79

log every repaired pin; flush datastore while repairing

bba6acc

info -> error

a6efbad

gammazero commented Jul 28, 2021


Check that stale pin exists before logging and removing

a223f5b

petar merged commit f708928 into master Jul 29, 2021

petar mentioned this pull request Jul 29, 2021

sync pinner on every pin operation #13

Closed

gammazero deleted the fix/minimize-rebuild branch July 30, 2021 16:39

aschmahmann mentioned this pull request Aug 23, 2021

Release v0.10 ipfs/kubo#8176

Closed

62 tasks
