
Unbounded disk growth when lots of additions and deletions of keys #1158

Closed
connorgorman opened this issue Dec 12, 2019 · 3 comments
Labels
kind/question Something requiring a response

Comments


connorgorman commented Dec 12, 2019

What version of Go are you using (go version)?

$ go version
1.13.3

What version of Badger are you using?

1.6 with backports

Does this issue reproduce with the latest master?

AFAIK, no change has fixed this

What are the hardware specifications of the machine (RAM, OS, Disk)?

Not applicable for this question

What did you do?

Wrote millions of keys and values, then deleted all of them (a rough reproduction sketch is included below).

What did you expect to see?

I would expect there to be some way to remove tombstoned keys that are no longer referenced (especially when MaxVersions = 1). I know that this is challenging online due to the way that compaction works, but is there a way to delete the keys if I Flatten the LSM tree offline? I thought that Backup/Restore might fix the issue, but that was unsuccessful as well.

What did you see instead?

Unbounded disk and SST growth based on the number of unique keys written
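
For reference, a rough reproduction sketch of the workload described above (a minimal sketch, assuming the Badger v1.x API where DefaultOptions takes a directory path; the key count, value size, and batch size are placeholders):

```go
// Sketch only: write a large number of keys, then delete them all again,
// and watch the .sst files on disk keep growing.
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-growth"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const n = 1000000  // placeholder; the report used millions of keys
	const batch = 5000 // keys per transaction, kept well under the txn size limit
	val := make([]byte, 256)

	apply := func(op func(txn *badger.Txn, key []byte) error) {
		for i := 0; i < n; i += batch {
			err := db.Update(func(txn *badger.Txn) error {
				for j := i; j < i+batch && j < n; j++ {
					key := []byte(fmt.Sprintf("key-%09d", j))
					if err := op(txn, key); err != nil {
						return err
					}
				}
				return nil
			})
			if err != nil {
				log.Fatal(err)
			}
		}
	}

	// Phase 1: write every key.
	apply(func(txn *badger.Txn, key []byte) error { return txn.Set(key, val) })
	// Phase 2: delete every key. Disk usage does not come back down.
	apply(func(txn *badger.Txn, key []byte) error { return txn.Delete(key) })
}
```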

@connorgorman changed the title from "Disk growth when" to "Unbounded disk growth when lots of additions and deletions of keys" on Dec 12, 2019

connorgorman commented Dec 13, 2019

@jarifibrahim @manishrjain Sanity check: if I have MaxVersions=1, Flatten the LSM into one level, and then stream out the keys, I can safely ignore the ones that are tombstoned because there will only be one version of each key in the LSM, correct?

EDIT: Okay, no, I think the above logic is incorrect. Flatten will move all the versions into the same level but not deduplicate them. AFAIK, the only way to remove the keys is a logical DB rewrite.
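
For illustration, a minimal sketch of such a logical rewrite (assumptions: Badger v1.x API, placeholder paths, one Update per key instead of batched writes). The default iterator options surface only the latest version of each key and skip deleted or expired entries, so copying through a plain iterator drops the tombstones:

```go
// Sketch only: copy the live keys from an old Badger DB into a fresh one.
package main

import (
	"log"

	badger "github.com/dgraph-io/badger"
)

func copyLive(src, dst *badger.DB) error {
	return src.View(func(txn *badger.Txn) error {
		// Default options: latest version only, deleted/expired keys skipped.
		it := txn.NewIterator(badger.DefaultIteratorOptions)
		defer it.Close()

		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.KeyCopy(nil)
			v, err := item.ValueCopy(nil)
			if err != nil {
				return err
			}
			// One Update per key keeps the sketch simple; a real rewrite
			// would batch writes for throughput.
			if err := dst.Update(func(wtxn *badger.Txn) error {
				return wtxn.Set(k, v)
			}); err != nil {
				return err
			}
		}
		return nil
	})
}

func main() {
	src, err := badger.Open(badger.DefaultOptions("/path/to/old-db"))
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	dst, err := badger.Open(badger.DefaultOptions("/path/to/new-db"))
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	if err := copyLive(src, dst); err != nil {
		log.Fatal(err)
	}
}
```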

jarifibrahim commented:

> @jarifibrahim @manishrjain Sanity check: if I have MaxVersions=1, Flatten the LSM into one level, and then stream out the keys, I can safely ignore the ones that are tombstoned because there will only be one version of each key in the LSM, correct?

@connorgorman How about using the Stream framework to stream out the keys? You can ignore all versions after the first one, which should give you only the latest version of each key. You can set the KeyToList function in the Stream framework to just use the first version it receives (see the sketch after the excerpt below).

badger/stream.go, lines 58 to 67 at 1b0c074:

// KeyToList, similar to ChooseKey, is only invoked on the highest version of the value. It
// is upto the caller to iterate over the versions and generate zero, one or more KVs. It
// is expected that the user would advance the iterator to go through the versions of the
// values. However, the user MUST immediately return from this function on the first encounter
// with a mismatching key. See example usage in ToList function. Can be left nil to use ToList
// function by default.
//
// Note: Calls to KeyToList are concurrent.
KeyToList func(key []byte, itr *Iterator) (*pb.KVList, error)
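
To make that concrete, a minimal sketch of a Stream setup that emits at most the newest live version of each key (assumptions: the v1.6-era Stream API and the github.com/dgraph-io/badger/pb package; the Send callback here only counts live keys, but it could just as well write them into a fresh DB or a backup file):

```go
// Sketch only: stream out the latest live version of every key.
package main

import (
	"context"
	"log"
	"sync/atomic"

	badger "github.com/dgraph-io/badger"
	"github.com/dgraph-io/badger/pb"
)

func countLiveKeys(db *badger.DB) (int64, error) {
	stream := db.NewStream()
	stream.NumGo = 8
	stream.LogPrefix = "live-key-stream"

	// KeyToList is invoked with the iterator positioned on the newest
	// version of key; using only that first version, and skipping
	// tombstones, yields at most one KV per key.
	stream.KeyToList = func(key []byte, itr *badger.Iterator) (*pb.KVList, error) {
		item := itr.Item()
		if item.IsDeletedOrExpired() {
			return &pb.KVList{}, nil // tombstoned or expired: emit nothing
		}
		val, err := item.ValueCopy(nil)
		if err != nil {
			return nil, err
		}
		return &pb.KVList{Kv: []*pb.KV{{
			Key:     item.KeyCopy(nil),
			Value:   val,
			Version: item.Version(),
		}}}, nil
	}

	var live int64
	stream.Send = func(list *pb.KVList) error {
		// Consume the batches here: count them, write them into a new DB,
		// serialize them to a backup file, and so on.
		atomic.AddInt64(&live, int64(len(list.Kv)))
		return nil
	}

	if err := stream.Orchestrate(context.Background()); err != nil {
		return 0, err
	}
	return atomic.LoadInt64(&live), nil
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/path/to/db"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	n, err := countLiveKeys(db)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("live keys: %d", n)
}
```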

> I would expect there to be some way to remove tombstoned keys that are no longer referenced (especially when MaxVersions = 1). I know that this is challenging online due to the way that compaction works, but is there a way to delete the keys if I Flatten the LSM tree offline?

Flatten wouldn't remove the old keys because it only brings all the tables to the same level. Compaction, however, will remove those keys eventually.

badger/levels.go, lines 559 to 584 at 1b0c074:

if version <= discardTs && vs.Meta&bitMergeEntry == 0 {
	// Keep track of the number of versions encountered for this key. Only consider the
	// versions which are below the minReadTs, otherwise, we might end up discarding the
	// only valid version for a running transaction.
	numVersions++
	lastValidVersion := vs.Meta&bitDiscardEarlierVersions > 0
	if isDeletedOrExpired(vs.Meta, vs.ExpiresAt) ||
		numVersions > s.kv.opt.NumVersionsToKeep ||
		lastValidVersion {
		// If this version of the key is deleted or expired, skip all the rest of the
		// versions. Ensure that we're only removing versions below readTs.
		skipKey = y.SafeCopy(skipKey, it.Key())

		if lastValidVersion {
			// Add this key. We have set skipKey, so the following key versions
			// would be skipped.
		} else if hasOverlap {
			// If this key range has overlap with lower levels, then keep the deletion
			// marker with the latest version, discarding the rest. We have set skipKey,
			// so the following key versions would be skipped.
		} else {
			// If no overlap, we can skip all the versions, by continuing here.
			numSkips++
			updateStats(vs)
			continue // Skip adding this key.
		}

@jarifibrahim added the kind/question label on Dec 19, 2019
jarifibrahim commented:

Hey @connorgorman, we've made some changes in #1166 that should drop invalid keys sooner. Please try it out.

I'm closing this issue because I don't think there's anything to be done here. Feel free to reopen.
