-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support compression in Badger #1013
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅ A review job has been created and sent to the PullRequest network.
@jarifibrahim you can click here to see the review status or cancel the code review job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know when this is ready for review.
Reviewable status: 0 of 15 files reviewed, 1 unresolved discussion (waiting on @jarifibrahim)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to inactivity, PullRequest has cancelled this review job. You can reactivate the code review job from the PullRequest dashboard
b242229
to
a554b7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 18 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami and @jarifibrahim)
appveyor.yml, line 20 at r6 (raw file):
# scripts that run after cloning repository install: - set PATH=%GOPATH%\bin;c:\go\bin;c:\msys64\mingw64\bin;%PATH%
The zstd library requires GCC and this change enables GCC on windows build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like PR is not complete yet. Will review again.
Reviewable status: 0 of 18 files reviewed, 8 unresolved discussions (waiting on @ashish-goswami and @jarifibrahim)
appveyor.yml, line 20 at r6 (raw file):
Previously, jarifibrahim (Ibrahim Jarif) wrote…
The zstd library requires GCC and this change enables GCC on windows build.
May be add this comment to file itself.
manifest.go, line 31 at r6 (raw file):
"github.com/dgraph-io/badger/options"
merge with existing imports.
table/builder.go, line 28 at r6 (raw file):
"github.com/golang/snappy" "github.com/pkg/errors"
merge all external import paths.
table/builder.go, line 158 at r6 (raw file):
b.writeChecksum(blockBuf) // Compress the block
nit: period
table/builder.go, line 165 at r6 (raw file):
blockBuf, err = b.compressData(b.buf.Bytes()[b.baseOffset:]) y.Check(err) // Truncate already written data
nit: period.
table/builder.go, line 167 at r6 (raw file):
// Truncate already written data b.buf.Truncate(int(b.baseOffset)) // Write compressed data
nit: period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dismissed @ashish-goswami and @golangcibot from 4 discussions.
Reviewable status: 0 of 18 files reviewed, all discussions resolved
manifest.go, line 435 at r4 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 104 characters (from
lll
)
Done.
manifest.go, line 31 at r6 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
merge with existing imports.
Done.
table/builder.go, line 305 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
printf: Errorf format %s has arg ctype of wrong type github.com/dgraph-io/badger/options.CompressionType (from
govet
)
Done.
table/builder.go, line 28 at r6 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
merge all external import paths.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add compression libraries in go.mod and go.sum files.
Reviewable status: 0 of 18 files reviewed, 9 unresolved discussions (waiting on @jarifibrahim)
options.go, line 117 at r7 (raw file):
CompactL0OnClose: true, KeepL0InMemory: true, Compression: options.ZSTDCompression,
We have ZSTDCompression as default. Will it work for windows by default? If not, we can mention it somewhere.
options.go, line 135 at r7 (raw file):
} // BuildTableOptions ...
Please complete the comment.
options/options.go, line 50 at r7 (raw file):
type CompressionType uint32 // The following constants should always be kept in sync with CompressionType protobuf
nit: period.
options/options.go, line 54 at r7 (raw file):
// NoCompression mode indicates that a block is not compressed. NoCompression CompressionType = 0 // SnappyCompression mode indicates that a block is not compressed.
correct comment.
options/options.go, line 56 at r7 (raw file):
// SnappyCompression mode indicates that a block is not compressed. SnappyCompression CompressionType = 1 // ZSTDCompression mode indicates that a block is not compressed.
correct comment.
table/builder.go, line 342 at r7 (raw file):
} // compressData compresses the given data
nit: period.
table/builder.go, line 352 at r7 (raw file):
return zstd.Compress(nil, data) } return nil, errors.New("Unsupported compression type")
We can define this err at the top. can be used while checking errors.
table/table.go, line 65 at r7 (raw file):
DataKey *pb.DataKey // Compression ...
Please complete the comment.
table/table.go, line 99 at r7 (raw file):
} // CompressionType ..
complete the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r3, 2 of 14 files at r4, 9 of 11 files at r5, 1 of 3 files at r6, 2 of 4 files at r7, 5 of 5 files at r8.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @jarifibrahim)
db.go, line 914 at r8 (raw file):
return y.Wrapf(err, "failed to get datakey in db.handleFlushTask") } bopts := BuildTableOptions(db.opt)
Does this need to be public?
levels.go, line 652 at r8 (raw file):
for _, table := range newTables { changes = append(changes, newCreateChange(table.ID(), cd.nextLevel.level, table.KeyID(), table.CompressionType()))
Just send table in newCreateChange, instead of sending 4 different arguments.
options/options.go, line 52 at r8 (raw file):
const ( // NoCompression mode indicates that a block is not compressed. NoCompression CompressionType = 0
Go towards smaller names. Makes the code simpler to read. You already are calling it a "CompressionType", then the actual names can be smaller.
None
options/options.go, line 54 at r8 (raw file):
NoCompression CompressionType = 0 // SnappyCompression mode indicates that a block is compressed using Snappy algorithm. SnappyCompression CompressionType = 1
Snappy
options/options.go, line 56 at r8 (raw file):
SnappyCompression CompressionType = 1 // ZSTDCompression mode indicates that a block is compressed using ZSTD algorithm. ZSTDCompression CompressionType = 2
ZSTD
table/table.go, line 368 at r8 (raw file):
} if t.opt.Compression != options.NoCompression {
This if isn't required? Because t.decompressData already does this check.
table/table_test.go, line 855 at r8 (raw file):
func getTableForBenchmarks(b *testing.B, count int) *Table { rand.Seed(time.Now().Unix()) opts := Options{Compression: options.ZSTDCompression, BlockSize: 4 * 1024, BloomFalsePositive: 0.01}
100 chars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @ashish-goswami and @manishrjain)
db.go, line 914 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Does this need to be public?
No. I've changed it now.
levels.go, line 652 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Just send table in newCreateChange, instead of sending 4 different arguments.
newCreateChange
is called by table manifest and table also. I can't change newCreateChange without refactoring a lot of code.
options.go, line 117 at r7 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
We have ZSTDCompression as default. Will it work for windows by default? If not, we can mention it somewhere.
ZSTD works on windows. The windows build seems to be working fine.
options.go, line 135 at r7 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
Please complete the comment.
Done.
options/options.go, line 50 at r7 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
nit: period.
The comment was outdate. Removed it.
options/options.go, line 54 at r7 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
correct comment.
Done.
options/options.go, line 56 at r7 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
correct comment.
Done.
options/options.go, line 52 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Go towards smaller names. Makes the code simpler to read. You already are calling it a "CompressionType", then the actual names can be smaller.
None
Done.
options/options.go, line 54 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Snappy
Done.
options/options.go, line 56 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
ZSTD
Done.
table/builder.go, line 352 at r7 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
We can define this err at the top. can be used while checking errors.
I don't think it will be reused anywhere. I'll prefer to keep it as it is.
table/table.go, line 65 at r7 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
Please complete the comment.
Done.
table/table.go, line 99 at r7 (raw file):
Previously, ashish-goswami (Ashish Goswami) wrote…
complete the comment.
Done.
table/table.go, line 368 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
This if isn't required? Because t.decompressData already does this check.
Done.
table/table_test.go, line 855 at r8 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
100 chars.
Done.
The Travis build has been failing because coveralls is returning 500 error code. The tests are working fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 20 files reviewed, 17 unresolved discussions (waiting on @ashish-goswami and @manishrjain)
table/table.go, line 261 at r10 (raw file):
if !it2.Valid() { return errors.Wrapf(it2.err, "failed to initialize biggest for table %s", t.Filename()) }
This was changed so that we can return an appropriate error when incorrect checksum algorithm was used.
table/table_test.go, line 592 at r10 (raw file):
{"k2", "a2"}, }, opts) f2 := buildTable(t, [][]string{{"l1", "b1"}}, opts)
This was added because the changes in table.go requires a table to be non-empty.
table/table_test.go, line 636 at r10 (raw file):
func TestMergingIteratorTakeTwo(t *testing.T) { opts := getTestTableOptions() f1 := buildTable(t, [][]string{{"l1", "b1"}}, opts)
This was added because the changes in table.go requires a table to be non-empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 9 of 20 files reviewed, 10 unresolved discussions (waiting on @manishrjain)
Rename the builder method WithCompressionType added in #1013 to WithCompression to be consistent with the option struct field named Compression. Update the godoc for WithCompression with the default compression algorithm used.
This commit adds support for compression in badger. Two compression algorithms - Snappy and ZSTD are supported for now. The compression algorithm information is stored in the manifest file. We compress blocks (typically of size 4KB) stored in the SST. The compression algorithm can be specified via the CompressionType option to badger.
* Support compression in Badger (hypermodeinc#1013) This commit adds support for compression in badger. Two compression algorithms - Snappy and ZSTD are supported for now. The compression algorithm information is stored in the manifest file. We compress blocks (typically of size 4KB) stored in the SST. The compression algorithm can be specified via the CompressionType option to badger. * fix move down and address comments Co-authored-by: Ibrahim Jarif <[email protected]>
This PR adds support for compression of SST files.
Todo:
Mechanism to migrate uncompressed data to compressed dataThis is no longer needed since the compression information is stored in a protobuf in manifest file and protobufs are backward compatible.Fixes #792
This change is