-
Notifications
You must be signed in to change notification settings - Fork 10
Conversation
This PR builds on PR #13 to sync the pinner on every pin operation. This PR does the following that #13 does not: - Sync's dag service separately from pin data - Does not release and immediately reacquire lock to sync, syncs while still holding pinner lock. - Syncs only pin data for most operations In addition to sync of pin data, this PR also revises how indexes are rebuilt. Instead of reading through all pins and loading all previous index data, now only a single pass through the pins is needed to rebuild missing indexes resulting from incomplete add or delete operations. This is operationally much simpler, but also does not require storing entire pin sets or index sets in memory as did the previous solution.
Fixes an error that would cause all indexes to be rebuilt whether needed or not when dirty flag detected. This would greatly prolong the rebuilding process as every index for every pin would be rewritten. No other problems resulted, othen than the unnesessary rebuild of all indexes and large amount of memory used during rebuild for large pin sets. Tests have been added to verify the correct functioning of the indexer and rebuilding process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks reasonable. The highest value here appears to be the more efficient index rebuilding. This pinner implementation is becoming hard to review and verify. Going forward it should teach us to do the right thing: use transactions or write-ahead logging. I am suspecting that every bug fix iteration on this implementation is comparable in time to re-implementing the whole thing using transactions. In any event, this looks better than the previous version!
Using transactions for this is the better fix. However, this means that the pinner must be given a transaction-capable datastore, and the pinner will need to be able to get the transaction interface from the datastore -- which is not difficult to solve. However, if there are pinners currently using datastores that do not support transactions, then this will require a migration to a new datastore, and that may require a larger scoped change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@aschmahmann should we deploy in the clusters before merging to master?
Running some live tests here seems reasonable to me if we're pretty confident that things should work and we're close to merge (i.e. we don't want to mess up the cluster nodes if we can avoid it). I would only deploy to one at a time in case there are issues. As for transactions. I wouldn't worry too much about little using non-transactional datastores, although we can take care of this in a subsequent PR. The main datastores that I know of that are in use here are LevelDB and Badger both of which should support transactions. Maybe there's also a map datastore, but we could either make one that supports transactions or just use the in memory levelDB datastore. |
dspinner/pin.go
Outdated
@@ -981,15 +981,15 @@ func (p *pinner) rebuildIndexes(ctx context.Context) error { | |||
indexer = p.cidRIndex | |||
// Delete any direct index | |||
err = p.cidDIndex.Delete(ctx, indexKey, pp.Id) | |||
log.Infof("deleting stale pin index for cid %v", pp.Cid.String()) | |||
log.Errorf("deleting stale pin index for cid %v", pp.Cid.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to log the error message whether or not a stale pin was actually found. Perhaps Delete
needs to return a boolean, or there should be a find for the pin first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not log deletions for now. I'll update the PR.
This PR builds on, and is a replacement for PR #13.
This PR addresses two issues with the pinner:
These solutions to these problems:
Additional tests have been added to test the correctness of indexes and the rebuild process.