Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a lot simpler sorted uint64 codec #2716

Merged
merged 4 commits into from
Oct 31, 2018
Merged

Create a lot simpler sorted uint64 codec #2716

merged 4 commits into from
Oct 31, 2018

Conversation

manishrjain
Copy link
Contributor

@manishrjain manishrjain commented Oct 31, 2018

We currently use bp128 package, a modification on a 3rd party library, which uses a complex piece of code, that hasn't been touched since it was written. It is hard to understand and debug, given a lot of it is using SIMD instructions, generated via a python library.

We have been seeing panics during rollups, and reads which indicate bugs in this implementation. Instead of trying to fix that up, I wrote this PR, which uses concepts from Binary Packing and the fact that protobufs use Varint encoding, to write a very simple encoder/decoder for posting lists.

This PR just contains the implementation, tests and benchmarks. Next PR would modify posting lists to replace the current bp128 with the new binary package.


This change is Reviewable

binary/codec.go Show resolved Hide resolved
binary/codec.go Show resolved Hide resolved
binary/codec.go Show resolved Hide resolved
binary/codec.go Show resolved Hide resolved
Copy link
Contributor Author

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 4 files reviewed, 4 unresolved discussions (waiting on @golangcibot)


binary/codec.go, line 7 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

func packBlock is unused

Done.


binary/codec.go, line 27 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

func Encode is unused

Done.


binary/codec.go, line 43 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

func NumUids is unused

Done.


binary/codec.go, line 51 at r1 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

func Decode is unused

Done.

@manishrjain manishrjain merged commit 6082501 into master Oct 31, 2018
@manishrjain manishrjain deleted the mrjn/binary branch October 31, 2018 23:14
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
This PR contains a simple encoder/decoder using concepts from binary packing building on protobuf's default varint encoding.

We currently use bp128 package, a modification on a 3rd party library, which uses a complex piece of code, that hasn't been touched since it was written. It is hard to understand and debug, given a lot of it is using SIMD instructions generated via a python library.

We have been seeing panics during rollups, and reads which indicate bugs in this implementation. Instead of trying to fix that up, I wrote this PR, which uses concepts from Binary Packing and the fact that protobufs use Varint encoding to write a very simple encoder/decoder for posting lists.

This PR just contains the implementation, tests and benchmarks. Next PR would modify posting lists to replace the current bp128 with the new binary package.

Benchmark output with artificial dataset list:

```
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/dgraph/binary
BenchmarkGzip-4      	      10	 112428082 ns/op
--- BENCH: BenchmarkGzip-4
	codec_test.go:77: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:95: Output size: 2.6 MB. Compression: 0.33
	codec_test.go:77: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:95: Output size: 2.6 MB. Compression: 0.33
BenchmarkUidPack/encode/128-4         	     100	  14213068 ns/op
--- BENCH: BenchmarkUidPack/encode/128-4
	codec_test.go:105: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:115: Output size: 1.1 MB. Compression: 0.13
	codec_test.go:105: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:115: Output size: 1.1 MB. Compression: 0.13
BenchmarkUidPack/encode/256-4         	     100	  12399005 ns/op
--- BENCH: BenchmarkUidPack/encode/256-4
	codec_test.go:105: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:115: Output size: 1.0 MB. Compression: 0.13
	codec_test.go:105: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:115: Output size: 1.0 MB. Compression: 0.13
BenchmarkUidPack/decode/128-4         	    1000	   2213527 ns/op
--- BENCH: BenchmarkUidPack/decode/128-4
	codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:145: Output size: 1.1 MB. Compression: 0.13
	codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:145: Output size: 1.1 MB. Compression: 0.13
	codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:145: Output size: 1.1 MB. Compression: 0.13
BenchmarkUidPack/decode/256-4         	    1000	   2014434 ns/op
--- BENCH: BenchmarkUidPack/decode/256-4
	codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:145: Output size: 1.0 MB. Compression: 0.13
	codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:145: Output size: 1.0 MB. Compression: 0.13
	codec_test.go:140: Dataset Len=1000000. Size: 8.0 MB
	codec_test.go:145: Output size: 1.0 MB. Compression: 0.13
PASS
ok  	github.com/dgraph-io/dgraph/binary	17.786s
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants