
More intelligent blockstore garbage collection #3092

Open
jakobvarmose opened this issue Aug 17, 2016 · 15 comments
Labels
need/community-input Needs input from the wider community status/deferred Conscious decision to pause or backlog

Comments

@jakobvarmose

Type: Feature
Area: Blockstore, Pin

Description:

The current garbage collector deletes all un-pinned blocks. This makes the system more fragile, since people don't usually pin many objects. And if I have configured my node to keep up to 10 GB of data, I would also expect it to stay close to that limit at all times.

One solution is to delete blocks at random until disk usage falls below the threshold. Blocks could also be deleted based on supply and demand, or by when they were last accessed.

The current garbage collector uses a mark-and-sweep algorithm, but another option would be reference counting, which I think is better.

@JesseWeinstein
Contributor

It might also be good to have both a min and a max threshold, and a garbage collection pass that only removes blocks until usage falls to the min threshold. It could be triggered automatically at the max threshold, as is done now.

@whyrusleeping
Member

@jakobvarmose how would your reference counting idea work? would it count existing references to blocks? or would it simply increment some counter each time a block is referenced by another object?

I agree 100% that the current GC implementation is a bit... blunt. The issue is that the marking phase is relatively expensive, so doing smaller runs is not cheap. There has been some previous discussion towards this topic here: ipfs/notes#130

One related topic I was talking about earlier (for ipfs block rm) was keeping a bloom filter of pinned objects stored on disk to expedite pin checking. This could also help out here for a partial GC.

@jakobvarmose
Author

jakobvarmose commented Aug 17, 2016

> @jakobvarmose how would your reference counting idea work? would it count existing references to blocks? or would it simply increment some counter each time a block is referenced by another object?

I would add two new variables associated with each block:

directPin bool
refCounter uint

When an object is pinned directly just set b.directPin = true. When recursively pinning an object increment b.refCounter on that block and on each of its descendants.

To check if a block is pinned simply do b.directPin || b.refCounter > 0.

@whyrusleeping
Member

@jakobvarmose yeah, we used to do that. It was really slow, and expensive. We have thousands to millions (or more) of blocks. Blocks may also be very small, and the overhead of that reference information would be significant. Every pin operation would need to iterate over every child block and update every entry for that child block.

@jakobvarmose
Author

jakobvarmose commented Aug 17, 2016

@whyrusleeping Oh, I didn't know. Yeah, it would use quite a bit of memory. But I can't see how it would be much slower, as the current implementation also reads each child block from disk, and this is probably the slowest part. Unpinning will of course be slower, but pinning should be about the same speed. Or what am I missing?

@Kubuxu
Member

Kubuxu commented Aug 17, 2016

@jakobvarmose no, currently when you pin or unpin we only update the list of root hashes; reference counting would require iterating the whole hash tree and updating the reference count.

@kevina
Contributor

kevina commented Aug 18, 2016

@whyrusleeping I was wondering why we don't use reference counting for indirect pins. How is that worse than what we do now? How is having to iterate over every child of a single recursive pin to update it worse than always having to iterate over every child of every recursive pin just to check whether a block is pinned? The checking operation would seem to be more frequent than the updating operation.

Have we considered storing the indirectly pinned blocks individually in the datastore? That is, for each indirectly pinned block, have an entry under the name "/local/pins/indirect/". This has disk-space overhead, but since it is no longer in memory it will scale well. The underlying leveldb is designed to be fast, so at this point I am having a hard time understanding how a mass update could be really slow.

The cost of checking pins will become more of an issue once "block rm" lands (#2962) and also in my filestore code (#2634).

A bloom filter will help in the case that a block is not pinned, but to make sure it is pinned we will still have to iterate over every child of every recursive pin.

@whyrusleeping
Member

@kevina we've been down this road before. The disk overhead is significant, and the cost to pin large objects becomes relatively obscene. If you want to dig into it more, go check out the old code and PRs.

@kevina
Contributor

kevina commented Aug 18, 2016

Some references for why we no longer store information on indirectly pinned blocks in the datastore: #1192 #1225 #1381 #1420

@jakobvarmose
Author

jakobvarmose commented Aug 18, 2016

@Kubuxu You are right that when unpinning, the current implementation runs very quickly. But when pinning, all child blocks are read from the datastore (see https://github.com/ipfs/go-ipfs/blob/master/pin/pin.go#L140-L144).

@jakobvarmose
Author

Another related idea: to make GC faster, instead of simply marking, count the number of references (you could call it count-and-sweep) and store the result to disk. Additionally, store incremental updates of recursive pins. The next time the garbage collector runs, it will read the results from last time and update incrementally.

@jbenet
Member

jbenet commented Aug 18, 2016

Maybe we should review proper GC algorithms and find something that matches our perf needs (re: algorithmic complexity).

@whyrusleeping whyrusleeping added the need/community-input Needs input from the wider community label Aug 18, 2016
@whyrusleeping
Member

reference: #2030

@kevina
Contributor

kevina commented Aug 20, 2016

A rather long IRC discussion: https://botbot.me/freenode/ipfs/msg/71626736/ (starting around 9 pm PDT on Aug 19). No real consensus, but lots of ideas and background info.

@whyrusleeping whyrusleeping added the status/deferred Conscious decision to pause or backlog label Sep 14, 2016
@whyrusleeping
Member

ref: #4149
