Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(compaction): Don't drop data when split overlaps with top tables #1587

Merged
merged 10 commits into from
Nov 6, 2020

Conversation

jarifibrahim
Copy link
Contributor

@jarifibrahim jarifibrahim commented Nov 6, 2020

Badger compactions are split into sub compactions if there are more than 3 tables
bottom tables are involved. The current implementation would drop keys if the top
table would overlap with a bottom table. Consider the following scenario

Top table => [C (deleted)]
bot table => [ A ] [ B ] [ C (valid) ] [ D ] 

When this range is compacted, the result was [ A B C(valid) D ]
(only C(deleted) was dropped). The expected result was [A B D] .

This PR fixes the above-mentioned issue. The test in this PR fails on master
but works with this change.

This change is Reviewable

Ibrahim Jarif added 3 commits November 5, 2020 14:09
There was a race condition in the subcompaction code which was caused
because we were doing builder.Close after doing throttle.Done in
subcomapction. This PR fixes that race condition by calling builder.Done
before calling throttle.Done .
Copy link
Contributor

@ahsanbarkati ahsanbarkati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jarifibrahim jarifibrahim merged commit 6c07455 into master Nov 6, 2020
@jarifibrahim jarifibrahim deleted the ibrahim/compaction-split branch November 6, 2020 14:07
// This will generate splits like [A1 ... C2] . Notice that we
// dropped the C3 which is the last key of the top table.
// See TestCompaction/with_split test.
right := y.KeyWithTs(y.ParseKey(t.Biggest()), math.MaxUint64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getKeyRange takes care of these things. I thought I was using it -- that's the reason for the bug.

In fact, don't we want to consider the right most key? In that case, the right should be using 0 as the version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're generating key ranges using the bot tables so they don't have the correct key version.

In fact, don't we want to consider the right most key? In that case, the right should be using 0 as the version.

No. We skip the right keys of the key range.
https://github.com/dgraph-io/badger/blob/6c074551f8f5a5b470357236cd90e860d0deb517/levels.go#L666-L668

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants