Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support compression in Badger #1013

Merged
merged 24 commits into from
Sep 30, 2019
Merged

Support compression in Badger #1013

merged 24 commits into from
Sep 30, 2019

Conversation

jarifibrahim
Copy link
Contributor

@jarifibrahim jarifibrahim commented Aug 27, 2019

This PR adds support for compression of SST files.

Todo:

  • Support for changing compression algorithm.
  • Store compression information in the manifest file.
  • Mechanism to migrate uncompressed data to compressed data This is no longer needed since the compression information is stored in a protobuf in manifest file and protobufs are backward compatible.

Fixes #792


This change is Reviewable

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ A review job has been created and sent to the PullRequest network.


@jarifibrahim you can click here to see the review status or cancel the code review job.

Ibrahim Jarif and others added 2 commits August 29, 2019 19:03
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know when this is ready for review.

Reviewable status: 0 of 15 files reviewed, 1 unresolved discussion (waiting on @jarifibrahim)

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to inactivity, PullRequest has cancelled this review job. You can reactivate the code review job from the PullRequest dashboard

@coveralls
Copy link

coveralls commented Sep 25, 2019

Coverage Status

Coverage decreased (-0.4%) to 77.487% when pulling 8fa09d4 on ibrahim/compression into e3b5652 on master.

@jarifibrahim jarifibrahim marked this pull request as ready for review September 26, 2019 09:18
@jarifibrahim jarifibrahim requested a review from a team September 26, 2019 09:18
Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 18 files reviewed, 3 unresolved discussions (waiting on @ashish-goswami and @jarifibrahim)


appveyor.yml, line 20 at r6 (raw file):

# scripts that run after cloning repository
install:
  - set PATH=%GOPATH%\bin;c:\go\bin;c:\msys64\mingw64\bin;%PATH%

The zstd library requires GCC and this change enables GCC on windows build.

Copy link
Contributor

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like PR is not complete yet. Will review again.

Reviewable status: 0 of 18 files reviewed, 8 unresolved discussions (waiting on @ashish-goswami and @jarifibrahim)


appveyor.yml, line 20 at r6 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

The zstd library requires GCC and this change enables GCC on windows build.

May be add this comment to file itself.


manifest.go, line 31 at r6 (raw file):

	"github.com/dgraph-io/badger/options"

merge with existing imports.


table/builder.go, line 28 at r6 (raw file):

	"github.com/golang/snappy"
	"github.com/pkg/errors"

merge all external import paths.


table/builder.go, line 158 at r6 (raw file):

	b.writeChecksum(blockBuf)

	// Compress the block

nit: period


table/builder.go, line 165 at r6 (raw file):

		blockBuf, err = b.compressData(b.buf.Bytes()[b.baseOffset:])
		y.Check(err)
		// Truncate already written data

nit: period.


table/builder.go, line 167 at r6 (raw file):

		// Truncate already written data
		b.buf.Truncate(int(b.baseOffset))
		// Write compressed data

nit: period.

Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dismissed @ashish-goswami and @golangcibot from 4 discussions.
Reviewable status: 0 of 18 files reviewed, all discussions resolved


manifest.go, line 435 at r4 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 104 characters (from lll)

Done.


manifest.go, line 31 at r6 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

merge with existing imports.

Done.


table/builder.go, line 305 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

printf: Errorf format %s has arg ctype of wrong type github.com/dgraph-io/badger/options.CompressionType (from govet)

Done.


table/builder.go, line 28 at r6 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

merge all external import paths.

Done.

Copy link
Contributor

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add compression libraries in go.mod and go.sum files.

Reviewable status: 0 of 18 files reviewed, 9 unresolved discussions (waiting on @jarifibrahim)


options.go, line 117 at r7 (raw file):

		CompactL0OnClose:        true,
		KeepL0InMemory:          true,
		Compression:             options.ZSTDCompression,

We have ZSTDCompression as default. Will it work for windows by default? If not, we can mention it somewhere.


options.go, line 135 at r7 (raw file):

}

// BuildTableOptions ...

Please complete the comment.


options/options.go, line 50 at r7 (raw file):

type CompressionType uint32

// The following constants should always be kept in sync with CompressionType protobuf

nit: period.


options/options.go, line 54 at r7 (raw file):

	// NoCompression mode indicates that a block is not compressed.
	NoCompression CompressionType = 0
	// SnappyCompression mode indicates that a block is not compressed.

correct comment.


options/options.go, line 56 at r7 (raw file):

	// SnappyCompression mode indicates that a block is not compressed.
	SnappyCompression CompressionType = 1
	// ZSTDCompression mode indicates that a block is not compressed.

correct comment.


table/builder.go, line 342 at r7 (raw file):

}

// compressData compresses the given data

nit: period.


table/builder.go, line 352 at r7 (raw file):

		return zstd.Compress(nil, data)
	}
	return nil, errors.New("Unsupported compression type")

We can define this err at the top. can be used while checking errors.


table/table.go, line 65 at r7 (raw file):

	DataKey *pb.DataKey

	// Compression ...

Please complete the comment.


table/table.go, line 99 at r7 (raw file):

}

// CompressionType ..

complete the comment.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Got a few comments.

Reviewed 1 of 2 files at r3, 2 of 14 files at r4, 9 of 11 files at r5, 1 of 3 files at r6, 2 of 4 files at r7, 5 of 5 files at r8.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @jarifibrahim)


db.go, line 914 at r8 (raw file):

		return y.Wrapf(err, "failed to get datakey in db.handleFlushTask")
	}
	bopts := BuildTableOptions(db.opt)

Does this need to be public?


levels.go, line 652 at r8 (raw file):

	for _, table := range newTables {
		changes = append(changes,
			newCreateChange(table.ID(), cd.nextLevel.level, table.KeyID(), table.CompressionType()))

Just send table in newCreateChange, instead of sending 4 different arguments.


options/options.go, line 52 at r8 (raw file):

const (
	// NoCompression mode indicates that a block is not compressed.
	NoCompression CompressionType = 0

Go towards smaller names. Makes the code simpler to read. You already are calling it a "CompressionType", then the actual names can be smaller.

None


options/options.go, line 54 at r8 (raw file):

	NoCompression CompressionType = 0
	// SnappyCompression mode indicates that a block is compressed using Snappy algorithm.
	SnappyCompression CompressionType = 1

Snappy


options/options.go, line 56 at r8 (raw file):

	SnappyCompression CompressionType = 1
	// ZSTDCompression mode indicates that a block is compressed using ZSTD algorithm.
	ZSTDCompression CompressionType = 2

ZSTD


table/table.go, line 368 at r8 (raw file):

	}

	if t.opt.Compression != options.NoCompression {

This if isn't required? Because t.decompressData already does this check.


table/table_test.go, line 855 at r8 (raw file):

func getTableForBenchmarks(b *testing.B, count int) *Table {
	rand.Seed(time.Now().Unix())
	opts := Options{Compression: options.ZSTDCompression, BlockSize: 4 * 1024, BloomFalsePositive: 0.01}

100 chars.

Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


db.go, line 914 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Does this need to be public?

No. I've changed it now.


levels.go, line 652 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Just send table in newCreateChange, instead of sending 4 different arguments.

newCreateChange is called by table manifest and table also. I can't change newCreateChange without refactoring a lot of code.


options.go, line 117 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

We have ZSTDCompression as default. Will it work for windows by default? If not, we can mention it somewhere.

ZSTD works on windows. The windows build seems to be working fine.


options.go, line 135 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

Please complete the comment.

Done.


options/options.go, line 50 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

nit: period.

The comment was outdate. Removed it.


options/options.go, line 54 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

correct comment.

Done.


options/options.go, line 56 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

correct comment.

Done.


options/options.go, line 52 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Go towards smaller names. Makes the code simpler to read. You already are calling it a "CompressionType", then the actual names can be smaller.

None

Done.


options/options.go, line 54 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Snappy

Done.


options/options.go, line 56 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

ZSTD

Done.


table/builder.go, line 352 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

We can define this err at the top. can be used while checking errors.

I don't think it will be reused anywhere. I'll prefer to keep it as it is.


table/table.go, line 65 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

Please complete the comment.

Done.


table/table.go, line 99 at r7 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

complete the comment.

Done.


table/table.go, line 368 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This if isn't required? Because t.decompressData already does this check.

Done.


table/table_test.go, line 855 at r8 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

100 chars.

Done.

@jarifibrahim
Copy link
Contributor Author

The Travis build has been failing because coveralls is returning 500 error code. The tests are working fine.

Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 20 files reviewed, 17 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


table/table.go, line 261 at r10 (raw file):

	if !it2.Valid() {
		return errors.Wrapf(it2.err, "failed to initialize biggest for table %s", t.Filename())
	}

This was changed so that we can return an appropriate error when incorrect checksum algorithm was used.


table/table_test.go, line 592 at r10 (raw file):

		{"k2", "a2"},
	}, opts)
	f2 := buildTable(t, [][]string{{"l1", "b1"}}, opts)

This was added because the changes in table.go requires a table to be non-empty.


table/table_test.go, line 636 at r10 (raw file):

func TestMergingIteratorTakeTwo(t *testing.T) {
	opts := getTestTableOptions()
	f1 := buildTable(t, [][]string{{"l1", "b1"}}, opts)

This was added because the changes in table.go requires a table to be non-empty.

Copy link
Contributor

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 9 of 20 files reviewed, 10 unresolved discussions (waiting on @manishrjain)

@jarifibrahim jarifibrahim merged commit e7d0a7b into master Sep 30, 2019
@jarifibrahim jarifibrahim deleted the ibrahim/compression branch September 30, 2019 11:30
danielmai added a commit that referenced this pull request Nov 13, 2019
Rename the builder method WithCompressionType added in #1013 to
WithCompression to be consistent with the option struct field
named Compression.

Update the godoc for WithCompression with the default compression
algorithm used.
bobotu pushed a commit to bobotu/badger that referenced this pull request Dec 20, 2019
This commit adds support for compression in badger. Two compression
algorithms - Snappy and ZSTD are supported for now. The compression algorithm information
is stored in the manifest file. We compress blocks (typically of size 4KB) stored in the SST.

The compression algorithm can be specified via the CompressionType option to badger.
coocood pushed a commit to pingcap/badger that referenced this pull request Dec 20, 2019
* Support compression in Badger (hypermodeinc#1013)

This commit adds support for compression in badger. Two compression
algorithms - Snappy and ZSTD are supported for now. The compression algorithm information
is stored in the manifest file. We compress blocks (typically of size 4KB) stored in the SST.

The compression algorithm can be specified via the CompressionType option to badger.

* fix move down and address comments

Co-authored-by: Ibrahim Jarif <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Support compression of SS-tables
5 participants