`Flatten` leaves behind 2 or more versions #1156
Comments
Here is a test to reproduce this issue: https://gist.github.com/hpucha/b080bbbe3fefacaf8852596cc19753cb
Hey @hpucha , Lines 498 to 501 in 1b0c074
So if you have 4 versions (t4, t3, t2, t1) and discard timestamp is Lines 559 to 567 in 1b0c074
Thank you so much for taking a look, @jarifibrahim. That does make sense with respect to the discard timestamp. However, when I logged what was happening, I observed that in a large majority of cases the version was below the discard timestamp but was still being retained because of the overlap check. This is the change I am running locally with So in the above example, assuming a
@hpucha I understand what you're trying to say but even if there is an Here's a test to prove it. Since the number of versions is set to
This should be fixed by #1166
This has been fixed via #1166
What version of Go are you using (`go version`)?
What version of Badger are you using?
release/1.6
What did you do?
We are experiencing an issue with Badger GC (release 1.6) not being aggressive enough, and were hoping to get some help.
We start with an ingest phase of loading ~17GB of data (unique keys) and then receive updates for about 50% of the keys over time (mostly the same set of keys is updated, every few hours). The total size of the data stays around the same (~17GB) despite the updates.
When we run such a workload for several hours, we see that the size of the vlog files keeps growing steadily. We see periods of value log GC, but it doesn't recover enough space to keep the data within some bounded percentage of 17GB. We experimented with different GC thresholds, but that was not helpful. In fact, running GC very frequently with a very low threshold of 0.01 resulted in #1031.
Tracking down the problem led me to investigate LSM compaction: I see that multiple versions of keys are kept there without being discarded, and hence I believe that vlog GC correctly claims that there is nothing to GC.
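To illustrate why extra versions in the LSM tree can leave value log GC with nothing to reclaim, here is a minimal sketch. It is a simplified model, not Badger's code: `vlogEntry`, `staleFraction`, and `worthRewriting` are hypothetical names, and the only assumption taken from Badger is that a value log file is rewritten only when at least the given discard ratio of its space can be reclaimed.

```go
package main

import "fmt"

// vlogEntry is a hypothetical stand-in for one value stored in a value log file.
type vlogEntry struct {
	size int  // bytes occupied in the file
	live bool // true if a version retained in the LSM tree still points at this value
}

// staleFraction returns the share of bytes in a file that no retained
// LSM version references any more.
func staleFraction(entries []vlogEntry) float64 {
	total, stale := 0, 0
	for _, e := range entries {
		total += e.size
		if !e.live {
			stale += e.size
		}
	}
	if total == 0 {
		return 0
	}
	return float64(stale) / float64(total)
}

// worthRewriting models the GC threshold: a file is only rewritten when
// at least discardRatio of its bytes are stale.
func worthRewriting(entries []vlogEntry, discardRatio float64) bool {
	return staleFraction(entries) >= discardRatio
}

func main() {
	// Two keys, each written twice, 100 bytes per value.
	// Compaction kept one version per key: the two older values are stale.
	oneVersionKept := []vlogEntry{{100, true}, {100, false}, {100, true}, {100, false}}
	// Compaction kept two versions per key: every value is still referenced.
	twoVersionsKept := []vlogEntry{{100, true}, {100, true}, {100, true}, {100, true}}

	fmt.Println(worthRewriting(oneVersionKept, 0.5))  // true
	fmt.Println(worthRewriting(twoVersionsKept, 0.5)) // false
}
```

In this model, lowering the threshold does not help in the second case: the stale fraction is zero, so no positive discard ratio is ever reached.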
One issue was #767, related to `WriteBatch`. However, even when I call `Flatten`, I see that we have at least two versions of keys when the tables overlap (in our case, the number of versions to keep is 1, and I see this behavior with both L0/L1 and L0/L1/L2 configs). I am able to repro this with a synthetic workload.
My guess is that the overlap check over here
https://github.com/dgraph-io/badger/blob/master/levels.go#L575
is only applicable to the `Deleted/Expired` case and should not be applicable to the `numVersions` case (similar to the `lastValidVersion` check).
The fix shown here https://gist.github.com/hpucha/a5168affa99248fd69f69b0c5f723c72/revisions?diff=unified seems to work, but I am not confident since I am not familiar with all the possibilities. It would be great to get feedback on whether this is plausible or whether there are correctness issues with it. Other pointers on how to debug this, and on why we are seeing 2 versions even after `Flatten`, would also be very helpful.
What did you expect to see?
`Flatten` leaves behind one version per key.
What did you see instead?
`Flatten` leaves behind multiple versions (2 or more).
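To make the behavior discussed in this issue concrete, here is a minimal sketch of the per-key retention decision during compaction, as it is described in this thread. It is a simplified model, not Badger's actual code: `version`, `keptVersions`, and the `overlapOnlyForDeletes` switch are hypothetical names, and details such as merge entries and the `lastValidVersion` flag are left out. With four versions (t4, t3, t2, t1), all below the discard timestamp, and one version to keep, the model retains two versions when the tables overlap, and one version when the overlap check is limited to deleted/expired entries.

```go
package main

import "fmt"

// version is a hypothetical stand-in for one stored version of a key.
type version struct {
	ts      uint64 // commit timestamp of this version
	deleted bool   // true if this version is a delete marker or has expired
}

// keptVersions models which versions of ONE key survive a compaction.
// versions must be sorted newest first. Versions above discardTs are
// always kept, since a running transaction may still read them.
func keptVersions(versions []version, discardTs uint64, numVersionsToKeep int,
	hasOverlap, overlapOnlyForDeletes bool) []uint64 {

	var kept []uint64
	seen := 0 // versions at or below discardTs encountered so far
	for _, v := range versions {
		if v.ts <= discardTs {
			seen++
			overLimit := seen > numVersionsToKeep
			if v.deleted || overLimit {
				// Everything older than this version is dropped. The open
				// question in this thread is whether this version itself
				// must be kept when lower levels overlap.
				keepThisOne := hasOverlap
				if overlapOnlyForDeletes && !v.deleted {
					keepThisOne = false
				}
				if keepThisOne {
					kept = append(kept, v.ts)
				}
				break
			}
		}
		kept = append(kept, v.ts)
	}
	return kept
}

func main() {
	vs := []version{{ts: 4}, {ts: 3}, {ts: 2}, {ts: 1}}

	// Overlap check applied to the version-count case as well.
	fmt.Println(keptVersions(vs, 10, 1, true, false)) // [4 3]

	// Overlap check applied only to deleted/expired versions.
	fmt.Println(keptVersions(vs, 10, 1, true, true)) // [4]
}
```

The sketch only shows how an extra version can survive each compaction; it says nothing about whether dropping that version is safe in every case, which is the correctness question raised in the issue.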