GC: Data may remain after garbage collecting everything #54

Open · schomatis opened this issue Mar 16, 2018 · 49 comments

@schomatis
Contributor

Regarding a future transition to Badger: its garbage collection process bails out when there is only one value log file, and by default the size of a value log file is set to 1GB. This means that a user who adds data and then "deletes" it (e.g., unpins it and runs garbage collection) can still have up to 1GB of disk taken up by values that are marked as deleted but not reclaimed by Badger's GC.

This is not an error, and for big players 1GB may be negligible, but it will catch a normal end user's attention (especially since today everything gets deleted and the disk space is freed), so it should be taken into consideration, either through clear documentation explaining it or by catching the ErrNoRewrite error and printing a warning to the user when appropriate.
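A minimal sketch of the warning idea, assuming the badger v2 import path (the v1.6 API is equivalent) and a hypothetical collectOnce helper; this is not what go-ipfs currently does:

package main

import (
    "log"

    badger "github.com/dgraph-io/badger/v2"
)

// collectOnce runs a single value-log GC pass and turns Badger's
// "nothing to rewrite" result into a user-facing warning instead of silence.
func collectOnce(db *badger.DB) error {
    err := db.RunValueLogGC(0.5) // rewrite vlog files that are at least 50% garbage
    switch err {
    case nil:
        return nil
    case badger.ErrNoRewrite:
        log.Println("badger GC: nothing to reclaim yet (e.g. only one value log file);",
            "up to vlogFileSize of deleted data may remain on disk")
        return nil
    default:
        return err
    }
}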

@schomatis
Contributor Author

As a simple PoC,

ipfs init --profile=badgerds
du -hc  ~/.ipfs
# 84K	total
ipfs add -r --pin=false --silent ~/go/src/github.com/ipfs/go-ipfs
ipfs repo stat
# NumObjects: 5563
# RepoSize:   140915008
du -hc  ~/.ipfs
# 135M	total
ipfs repo gc -q
ipfs repo stat
# NumObjects: 13
# RepoSize:   143023447
du -hc  ~/.ipfs
# 137M	total

(compare with running the same commands without --profile=badgerds.)

@schomatis
Contributor Author

Opened issue dgraph-io/badger#442 to track the minimum value log file size allowed as a possible workaround for the user who wishes to minimize the used disk space after garbage collection.
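For illustration, a minimal sketch of the option in question at the Badger level, assuming the badger v2 import path; go-ipfs itself exposes this through the "vlogFileSize" field of the badgerds datastore spec rather than through direct API calls:

package main

import (
    badger "github.com/dgraph-io/badger/v2"
)

// openSmallVlog opens a Badger directory with 100 MiB value log files instead
// of the default 1 GB, so the GC gets a second file to work on much sooner.
func openSmallVlog(dir string) (*badger.DB, error) {
    opts := badger.DefaultOptions(dir).WithValueLogFileSize(100 << 20) // 100 MiB
    return badger.Open(opts)
}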

@leerspace

I might be mistaken, but I don't think that badger's GC is triggered yet during IPFS' GC: #4300. Or, at least it doesn't seem to be yet on 0.4.14-rc2.

@schomatis
Contributor Author

@leerspace Thanks for pointing me to that PR, which gave me more context on the matter. Badger's GC is now linked to IPFS GC through #4578, where vlogFileSize is set to 1 GB. I'm attaching a more detailed script where it can be seen that the number of value log files is never reduced to zero (GC doesn't act when there is only one of them), which causes the effect described above.

For full GC there is an offline tool being developed in dgraph-io/badger#400, which will likely help with the cases where the online badger GC doesn't help

The offline GC tool that is mentioned in that PR seems to have the same constraint so I don't think it will provide a solution to this issue.

Badger GC PoC script
set -x
set -e

export IPFS_PATH=~/.ipfs_badger_test
ipfs init --profile=badgerds
du -hsc  $IPFS_PATH
# 84K	total
ipfs add -r --pin=false --silent -Q ~/go/src/github.com/ipfs/go-ipfs
ipfs repo stat
# NumObjects: 5560
# RepoSize:   140914167
du -hsc  $IPFS_PATH
# 135M	total
ls -lah $IPFS_PATH/badgerds
# 134M mar 19 15:47 000000.vlog  <- Single log file.

ipfs repo gc -q > /dev/null # Not really that quiet.
ipfs repo stat
# NumObjects: 13
# RepoSize:   143021466
du -hsc  $IPFS_PATH
# 137M	total

# Repeat again but setting the value log size to 100 MB.
rm -rv $IPFS_PATH # WARNING! Check the env.var to not delete the actual IPFS root.
ipfs init --profile=badgerds

# Set the 'vlogFileSize' configuration to 100 MB. (May need to install jq.)
cp $IPFS_PATH/config $IPFS_PATH/config.back
cat $IPFS_PATH/config.back | jq  '.Datastore.Spec.child.vlogFileSize = "100000000"' > $IPFS_PATH/config

ipfs add -r --pin=false --silent -Q ~/go/src/github.com/ipfs/go-ipfs
ipfs repo stat
# NumObjects: 5560
# RepoSize:   140918856

ls -lah $IPFS_PATH/badgerds
# 96M mar 19 15:47 000000.vlog  <- First log file cut short at 100 MB (will be GC'ed).
# 39M mar 19 15:47 000001.vlog  <- Second log file (will be left behind in the GC process).

ipfs repo gc -q > /dev/null # Not really that quiet.
ipfs repo stat
# NumObjects: 13
# RepoSize:   143023447
ls -lah $IPFS_PATH/badgerds
# 40M mar 19 15:47 000001.vlog
du -hsc  $IPFS_PATH
# 41M	total

# Multiple successive calls to GC won't change this.
ipfs repo gc -q > /dev/null
ipfs repo gc -q > /dev/null
ipfs repo gc -q > /dev/null

ls -lah $IPFS_PATH/badgerds
# 40M mar 19 15:47 000001.vlog
du -hsc  $IPFS_PATH
# 41M	total

@schomatis schomatis changed the title [badger] GC: Data may remain (up to 1GB) after garbage collecting everything GC: Data may remain (up to 1GB) after garbage collecting everything May 3, 2018
@ghost

ghost commented May 13, 2018

So does badger do any GC at all, or is it just that it holds on to a bit of diskspace?

@schomatis
Contributor Author

@lgierth Yes, it does GC, it's just that it waits for certain conditions before doing it (like having more than one value log file, and having those log files filled with deleted keys past a certain percentage threshold). So if those conditions aren't met when the user calls ipfs repo gc (which in turn requests Badger's GC), nothing will happen and the user will not see any disk space being freed (even though there may be a lot of unpinned content).

This is expected: Badger's architecture is more complex than flatfs, which simply stores keys as separate files (so GC is just the process of deleting those files). It makes sense that Badger's GC waits for the situation to "be worth it" before doing any actual work (which includes scanning files and searching keys) that may obstruct the rest of the DB operations.

I think a simplified version of what I just explained should be clearly documented so users can adjust their disk-freeing expectations to the Badger scenario (which will definitely be different from what they're accustomed to with the current flatfs default datastore).

@ghost

ghost commented May 13, 2018

So when will it free disk space then? I've just had these two new gateways run out of disk space, and they refuse to restart (Error: runtime error: slice bounds out of range).

So right now I have no choice other than rm -rf $IPFS_PATH and switch the new gateways back to flatfs.

It would be good if an explicit ipfs repo gc told badger to never mind what it thinks is best and just do its cleanup routine.

@ghost

ghost commented May 13, 2018

Ah I see: https://github.com/dgraph-io/badger#garbage-collection

This does sound like we can do online garbage collection at a time of our choice.

@schomatis
Contributor Author

I've just had these two new gateways run out of diskspace

In that case Badger should already be performing GC when running ipfs repo gc. I'm assuming that there are many more than just one value log file, and that the discarded-key percentage needed to trigger a GC is 10% (that is, if more than 10% of a value log file's keys are marked as deleted, it will rewrite the value log file to a smaller size without those keys).
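For illustration, a minimal sketch of how that threshold looks when driving Badger's GC from Go, assuming the badger v2 API; the loop is illustrative rather than the exact go-ds-badger code:

package main

import (
    badger "github.com/dgraph-io/badger/v2"
)

// collectAll keeps rewriting value log files that are more than 10% garbage
// until Badger reports there is nothing left worth rewriting. Each successful
// pass handles at most one vlog file, which is why a single call often frees
// little or nothing.
func collectAll(db *badger.DB) error {
    for {
        switch err := db.RunValueLogGC(0.1); err {
        case nil:
            continue // a file was rewritten, try again
        case badger.ErrNoRewrite:
            return nil // e.g. only one vlog file, or none past the 10% threshold
        default:
            return err
        }
    }
}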

@ghost

ghost commented May 13, 2018

@Stebalien cleared this up for me on IRC:

badger needs to have a special gc function called to actually gc. The new datastore interfaces expose this but they're still in the process of being piped through.

I've put the gateways back on flatfs for the time being, I'll be on research retreat and then vacation for the next 4 weeks -- otherwise I'd be up for riding the edge :)

@magik6k
Member

magik6k commented May 17, 2018

@Stebalien cleared this up for me on IRC:

badger needs to have a special gc function called to actually gc. The new datastore interfaces expose this but they're still in the process of being piped through.

GC is called on badger - see https://github.com/ipfs/go-ipfs/blob/master/pin/gc/gc.go#L115-L125 / https://github.com/ipfs/go-ds-badger/blob/master/datastore.go#L258
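Roughly, the pattern those two links implement looks like the sketch below (not the exact go-ipfs code); it assumes the go-datastore GCDatastore interface, which go-ds-badger satisfies by calling Badger's value-log GC:

package main

import (
    ds "github.com/ipfs/go-datastore"
)

// maybeCollect asks the datastore to garbage-collect itself if it supports it.
func maybeCollect(d ds.Datastore) error {
    if gcds, ok := d.(ds.GCDatastore); ok {
        return gcds.CollectGarbage()
    }
    return nil // datastore has no GC hook of its own
}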

@Stebalien
Member

Got it. I hadn't realized that had been bubbled up yet.

@schomatis
Contributor Author

@lgierth Could you document here a bit more about the server environment in which you had the GC problem we discussed on IRC, please? I think it may be an interesting test case to have under consideration (big database, almost no space left, and to reclaim it, it was necessary to write delete entries before actually running Badger's GC).

@leerspace

leerspace commented Mar 2, 2019

I'm not seeing any reclaimed free space after garbage collection using badger with v0.4.19, but maybe someone can tell whether this is "expected" behavior.

For example (NumObjects and RepoSize from ipfs repo stat):

$ ipfs init --profile=badgerds
# NumObjects: 28
# RepoSize:   49962
$ ipfs daemon &
$ ipfs get QmSnuWmxptJZdLJpKRarxBMS2Ju2oANVrgbr2xWbie9b2D #stopped just after 3.5 GB
# NumObjects: 16118
# RepoSize:   3798130246
$ ipfs repo gc
# NumObjects: 13
# RepoSize:   3798130246
$ ipfs shutdown
# NumObjects: 93
# RepoSize:   3806508960
$ ipfs repo gc # offline garbage collection still gets me no space reclaimed
# NumObjects: 13
# RepoSize:   3805143019

@Stebalien
Member

Yeah, that's a badger bug. Looks like dgraph-io/badger#718.

@Stebalien
Member

Yeah, this is a lot more than 1GiB. GC doesn't appear to do much at all.

@Stebalien Stebalien transferred this issue from ipfs/kubo Mar 8, 2019
@BenLubar

I ran mkfifo badger.bak and then badger backup and badger restore and I've got a LOT of data that's not being garbage collected:

ben@urist:/storage/ipfs/data-badgerdebug$ du -bsch badgerds badgerds-restore
81G     badgerds
22G     badgerds-restore

ipfs repo gc was run right before the btrfs snapshot was taken.

@magik6k
Member

magik6k commented Apr 19, 2019

Related badger issue - dgraph-io/badger#767

@obo20

obo20 commented Aug 28, 2019

@magik6k based on the resolution of the issue you linked, it seems like badger does eventually reclaim the space. Were you able to notice any space being reclaimed on your end?

@Stebalien
Member

Triage: STILL BROKEN! Even after the periodic GC and repeated GC fixes.

@jsign

jsign commented Feb 3, 2020

We had a similar experience (with badger v1 or v2). Forcing GC, doing a manual badger flatten, etc. doesn't seem to reclaim space in the log files.

The only thing that worked was doing a backup and restore (offline). (~900 MB -> ~8 MB)
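For anyone who wants to script that offline workaround against the Badger API rather than the badger CLI, a rough sketch (assuming the badger v2 API, a stopped node, and placeholder paths):

package main

import (
    "os"

    badger "github.com/dgraph-io/badger/v2"
)

// backupRestore writes a full backup of srcDir to backupFile and loads it
// into a fresh dstDir, leaving the dead value-log space behind.
// Only run this while the node is stopped.
func backupRestore(srcDir, dstDir, backupFile string) error {
    src, err := badger.Open(badger.DefaultOptions(srcDir))
    if err != nil {
        return err
    }
    defer src.Close()

    f, err := os.Create(backupFile)
    if err != nil {
        return err
    }
    defer f.Close()

    if _, err := src.Backup(f, 0); err != nil { // since=0 means a full backup
        return err
    }
    if _, err := f.Seek(0, 0); err != nil {
        return err
    }

    dst, err := badger.Open(badger.DefaultOptions(dstDir))
    if err != nil {
        return err
    }
    defer dst.Close()

    return dst.Load(f, 16) // 16 = max pending writes while restoring
}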

@Stebalien
Member

That is very annoying. We're now doing repeated GC cycles until badger reports there is nothing to do and that's still not fixing it.

@jsign

jsign commented Mar 20, 2020

Sharing this topic, which might be interesting to keep track of. My pain points are related to having many vlogs, but the solution might indirectly improve part of the problem.

@Stebalien
Member

That's great to hear! However, it looks like we're going to need to switch to badger 2 to see those improvements. We're currently on badger 1 as we haven't tested badger 2 thoroughly.

@RubenKelevra

Another example (with more data):

[ipfs@heimdal-pacman-store ~]$ ipfs pin ls
[ipfs@heimdal-pacman-store ~]$ ipfs files ls
[ipfs@heimdal-pacman-store ~]$ time ipfs repo gc

real    0m16.464s
user    0m0.012s
sys     0m0.035s
[ipfs@heimdal-pacman-store ~]$ time ipfs repo stat --human
NumObjects: 4
RepoSize:   23 GB
StorageMax: 75 GB
RepoPath:   /home/ipfs/.ipfs
Version:    fs-repo@9

real    0m6.191s
user    0m0.000s
sys     0m0.147s

Performance, even on this small machine, is great with badgerds, but with the current settings there seem to be zero bytes cleaned up after I delete data.

@CsterKuroi

@RubenKelevra The same situation

@RubenKelevra

@jsign sounds like we could use those parameters as defaults and let users tweak them if they want more performance (or tweak them in the server profiles).
I ended up having a 6 GB repo on a node with no data, which is quite confusing. I doubt that any of this remaining data will help speed up fetching/storing different content in the future.
Can you do a PR to implement those tweaks?

Yes, that can be an option, but changing defaults might have some unexpected consequences for existing instances, since they might spin up again with the new defaults. Being v0.* and its semantics under semver makes that tricky.

How about exposing those settings and making them accessible via the configuration file?

The configuration files won't be updated on existing setups; we could just edit the badgerds profile to include the changes on new setups.

This way existing setups won't be affected at all.

I'm already quite happy about having found a solution that seems reproducible and solves this problem. If we get some extra confirmation that others can reproduce the solution I presented, maybe it's just a matter of giving it enough visibility in the readme or similar.

Can you look into creating the changes? I would really enjoy testing these out in my setup :)

@jarifibrahim

Hey guys, we've closed dgraph-io/badger#1228 with fixes in master. We'll do a new badger release soon. It would be very useful if you could test the master branch.

@Stebalien
Member

From what I can tell, that's mostly concerned with expiring keys, right?

@jarifibrahim

No. It's for deleted and expired keys. There was a bug in compaction because of which we would keep deleted/expired keys. We no longer do that.

@jsign

jsign commented Jun 2, 2020

That sounds promising :)

@jarifibrahim

We've had issues with deleted keys and move keys in badger for a long time (this issue was created 2 years ago). I'm glad we fixed it :)

Please do try it out. We're always looking for feedback 🌠

@jsign

jsign commented Jun 2, 2020

We've had issues with deleted keys and move keys in badger for a long time (this issue was created 2 years ago). I'm glad we fixed it :)

Please do try it out. We're always looking for feedback 🌠

@jarifibrahim, do you have a release estimate? I know master is available, but I'm not sure whether you run special tests on releases to give more guarantees about safety.

@jarifibrahim

@jarifibrahim, do you have a release estimate? I know master is available, but I'm not sure whether you run special tests on releases to give more guarantees about safety.

We wanted to do a Badger release last week, but some other things came up (we're a small team of engineers working on badger and dgraph). I'll try to get a new patch version out for v1.x and v2.x before the end of this week (worst case, end of next week).

Most of the tests for badger run as part of the CI. We have a bank test https://github.com/dgraph-io/badger/blob/master/badger/cmd/bank.go which checks for transactional issues and runs for (4h+4h) 8 hours at midnight every day.

Which version of badger are you using?

@Stebalien
Member

go-ipfs is using v1.6.1 and has been slowly adding support for v2.

@CsterKuroi

Any update?

@Stebalien
Member

Not yet. The next step here is to copy the badger plugin (https://github.com/ipfs/go-ipfs/tree/master/plugin/plugins/badgerds) as "badger2ds", and make it use github.com/ipfs/go-ds-badger2. Want to submit a PR to go-ipfs?

@CsterKuroi

https://github.com/dgraph-io/badger/releases/tag/v1.6.2

@RubenKelevra

Soooo ... how is this looking? Is it fixed? Can it be closed?

@Stebalien
Member

Stebalien commented Mar 4, 2021 via email

@RubenKelevra

@Stebalien but this PR sounds like this bug should be fixed: dgraph-io/badger#1354

RubenKelevra added a commit to RubenKelevra/ipfs_go-ipfs-config that referenced this issue Mar 24, 2021
@Stebalien
Member

That was a v2 fix, not a v1 fix.

@dokterbob

dokterbob commented Apr 14, 2021

:'(

Such a pity - we saw huge performance gains on ipfs-search but it is eating up storage at an alarming rate. Will have to switch back to flatfs for now.

$ ipfs repo stat --human && ipfs repo gc -q | wc -l && ipfs repo stat --human
NumObjects: 1102
RepoSize:   904 GB
StorageMax: 10 GB
RepoPath:   /var/lib/ipfs
Version:    fs-repo@11
1143
NumObjects: 111
RepoSize:   904 GB
StorageMax: 10 GB
RepoPath:   /var/lib/ipfs
Version:    fs-repo@11

(After ~ 24 hours of crawling.)

@dokterbob

Note: after shutting down ipfs and calling badger flatten, garbage collection does seem to work.

$ go get github.com/dgraph-io/badger/[email protected]
$ badger flatten --dir /var/lib/ipfs/badgerds
$ ipfs repo stat --human && ipfs repo gc -q | wc -l && ipfs repo stat --human
NumObjects: 1127
RepoSize:   320 GB
StorageMax: 10 GB
RepoPath:   /var/lib/ipfs
Version:    fs-repo@11

1118
NumObjects: 2462
RepoSize:   89 GB
StorageMax: 10 GB
RepoPath:   /var/lib/ipfs
Version:    fs-repo@11
$ ipfs repo stat --human && ipfs repo gc -q | wc -l && ipfs repo stat --human
NumObjects: 4280
RepoSize:   961 MB
StorageMax: 10 GB
RepoPath:   /var/lib/ipfs
Version:    fs-repo@11
4271
NumObjects: 10
RepoSize:   961 MB
StorageMax: 10 GB
RepoPath:   /var/lib/ipfs
Version:    fs-repo@11
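The same flatten-then-collect sequence can also be driven from Go against a stopped repo; a rough sketch, assuming the badger v2 API and a placeholder directory path:

package main

import (
    badger "github.com/dgraph-io/badger/v2"
)

// flattenAndGC compacts all LSM levels and then runs value-log GC until
// Badger reports nothing left to rewrite. Only run this while ipfs is stopped.
func flattenAndGC(dir string) error {
    db, err := badger.Open(badger.DefaultOptions(dir))
    if err != nil {
        return err
    }
    defer db.Close()

    if err := db.Flatten(4); err != nil { // 4 compaction workers
        return err
    }
    for {
        err := db.RunValueLogGC(0.5)
        if err == badger.ErrNoRewrite {
            return nil
        }
        if err != nil {
            return err
        }
    }
}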
