Feature/bp128 pack #1308

Merged
merged 8 commits into master from feature/bp128_pack on Aug 10, 2017

Conversation

janardhan1993
Contributor

@janardhan1993 janardhan1993 commented Aug 4, 2017

This change is Reviewable

Janardhan Reddy added 3 commits August 9, 2017 10:14
@manishrjain
Contributor

Reviewed 5 of 19 files at r1, 11 of 20 files at r2, 7 of 7 files at r3.
Review status: all files reviewed at latest revision, 24 unresolved discussions.


algo/uidlist.go, line 48 at r3 (raw file):

	if o.Uids == nil {
		o.Uids = make([]uint64, 0, n)

It should be the lower of n and m. Even then, we can just avoid allocating so much memory upfront and let Go take care of it. So, don't preallocate, basically.
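
For illustration, a minimal sketch of what this suggestion amounts to; the function name and signature are assumptions, only the sorted-UID-list setup comes from the PR:

```go
// intersectSorted is a hypothetical sketch, not the PR's code: no upfront
// make([]uint64, 0, n); just append and let Go grow the backing array.
func intersectSorted(u, v []uint64, out *[]uint64) {
	i, k := 0, 0
	for i < len(u) && k < len(v) {
		switch {
		case u[i] < v[k]:
			i++
		case u[i] > v[k]:
			k++
		default:
			*out = append(*out, u[i]) // append handles growth; no preallocation
			i++
			k++
		}
	}
}
```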


algo/uidlist.go, line 54 at r3 (raw file):

	if n < m {
		if n == 0 {
			n += 1

Make n always the smaller one:
if n > m {
n, m = m, n
}

Also, if m == 0, set m = 1. Then calculate the ratio.
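
A hedged sketch of the suggested setup; the function name, the threshold value, and the dispatch on the result are illustrative assumptions, and the zero guard is applied to the divisor:

```go
// useJumpIntersect is a sketch, not the repo's code: make n the smaller of the
// two lengths, guard against a zero divisor, then decide from the size ratio
// whether a skip/jump style intersection is worth it.
func useJumpIntersect(u, v []uint64) bool {
	n, m := len(u), len(v)
	if n > m {
		n, m = m, n
	}
	if n == 0 {
		n = 1 // avoid dividing by zero when one list is empty
	}
	const jumpRatio = 100 // hypothetical threshold; the real value is a tuning choice
	return m/n >= jumpRatio
}
```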


algo/uidlist.go, line 80 at r3 (raw file):

	m := len(v)
	for i, k := 0, 0; i < n && k < m; {
		uid := u[i]

Reuse the other code.


algo/uidlist.go, line 103 at r3 (raw file):

func IntersectCompressedWithJump(bi *bp128.BPackIterator, v []uint64, o *[]uint64) {
	u := bi.Uids()

Use the index at the front. In fact, don't need two different approaches: lin and jump.


algo/uidlist.go, line 225 at r3 (raw file):

	dst := o.Uids[:0]
	if n == 0 {
		n += 1

n = 1


bp128/bp128.go, line 162 at r3 (raw file):

func (bp *BPackEncoder) WriteTo(in []byte) {
	x.AssertTrue(bp.length > 0)
	binary.BigEndian.PutUint64(in[:8], uint64(bp.length))

4 bytes for length.
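
A minimal sketch of the 4-byte header this asks for; the helper name and layout are assumptions, only the big-endian length prefix comes from the excerpt:

```go
import "encoding/binary"

// writeLength is a hypothetical helper showing the suggested change: store the
// block length in a 4-byte big-endian prefix instead of 8 bytes.
func writeLength(dst []byte, length int) {
	binary.BigEndian.PutUint32(dst[:4], uint32(length))
}
```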


bp128/bp128.go, line 183 at r3 (raw file):

	valid     bool
	lastSeed  []uint64
	buf       []uint64

Really hard to understand the code without comments. Add comments, everywhere.


bp128/peachpy/pack.py, line 19 at r1 (raw file):

    with Function(func_name, (in_ptr, out_ptr, seed_ptr)):
        inp = GeneralPurposeRegister64()

curp


bp128/peachpy/pack.py, line 20 at r1 (raw file):

    with Function(func_name, (in_ptr, out_ptr, seed_ptr)):
        inp = GeneralPurposeRegister64()
        outp = GeneralPurposeRegister64()

deltap


bp128/peachpy/pack.py, line 21 at r1 (raw file):

        inp = GeneralPurposeRegister64()
        outp = GeneralPurposeRegister64()
        seedp = GeneralPurposeRegister64()

prevp // The max integers from the last block.


bp128/peachpy/pack.py, line 23 at r1 (raw file):

        seedp = GeneralPurposeRegister64()

        LOAD.ARGUMENT(inp, in_ptr)

This would load the pointer address in the register.


bp128/peachpy/pack.py, line 30 at r1 (raw file):

        # We can do in-place delta calculations without copying if we
        # iterate from the back, i.e. we point to the last 16 bytes (128 bits)
        ADD(inp, 16*63)

Everything is in bytes. We have 128 integers, consuming 1024 bytes (8 bytes per uint64).

1024 - 16 byte (128 bit) register = 1008.

This is moving the pointer to the last integers in the slice, so they will fill up a register.
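
Spelling out the same arithmetic as Go constants (nothing new here, just the numbers from the comment above):

```go
const (
	intsPerBlock  = 128                        // uint64s per bp128 block
	bytesPerInt   = 8                          // sizeof(uint64)
	blockBytes    = intsPerBlock * bytesPerInt // 1024 bytes
	laneBytes     = 16                         // one XMM register (128 bits)
	lastLaneStart = blockBytes - laneBytes     // 1008 == 16*63, hence ADD(inp, 16*63)
)
```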


bp128/peachpy/pack.py, line 31 at r1 (raw file):

        # iterate from the back, i.e. we point to the last 16 bytes (128 bits)
        ADD(inp, 16*63)
        ADD(outp, 16*(bit_size - 1))

Also explain the calculation here.


bp128/peachpy/pack.py, line 34 at r1 (raw file):

        # Store the last vector
        last = XMMRegister()

What is XMMRegister? 16 bytes.


bp128/peachpy/pack.py, line 34 at r1 (raw file):

        # Store the last vector
        last = XMMRegister()

Tail


bp128/peachpy/pack.py, line 36 at r1 (raw file):

        last = XMMRegister()
        # MOV unaligned
        MOVDQU(last, [inp])

We just stored 16 bytes from what inp is pointing to, into last.


posting/list.go, line 99 at r3 (raw file):

	pidx       int // index of postings
	// Redundant TODO
	ulen  int

Remove ulen, and uidx. Because now you have bi.


posting/list.go, line 606 at r3 (raw file):

		})
	}
	count := bi.Length() - uidx

bi.Length() - bi.StartIdx()


posting/list.go, line 763 at r3 (raw file):

// Uids returns the UIDs given some query params.
// We have to apply the filtering before applying (offset, count).
func (l *List) Uids(opt ListOptions) *protos.List {

Add a warning that this is expensive, and we should avoid doing this.


posting/list.go, line 768 at r3 (raw file):

	if len(l.mlayer) > 0 {
		l.RUnlock()
		l.SyncIfDirty(false)

Remove this logic for now.


x/x.go, line 238 at r3 (raw file):

type BytesBuffer struct {
	data [][]byte
	off  int

sz int // store it and advance it with off


x/x.go, line 267 at r3 (raw file):

// returns a slice of length n to be used for writing
func (b *BytesBuffer) TouchBytes(n int) []byte {

Slice(n int)


x/x.go, line 276 at r3 (raw file):

func (b *BytesBuffer) Length() int {
	length := 0
	for i, d := range b.data {

just return sz directly.


x/x.go, line 303 at r3 (raw file):

// Always give back <= touched bytes
func (b *BytesBuffer) TruncateBy(n int) {
	b.off -= n

Assert that b.off >= 0.
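
Taken together, the x/x.go comments above amount to something like the sketch below; the chunk-growth logic and the chunk size are assumptions, only the sz field, Slice, the direct Length, and the TruncateBy assert come from the review:

```go
// Illustrative sketch of the suggested BytesBuffer shape, not the final code.
type BytesBuffer struct {
	data [][]byte
	off  int // write offset into the last chunk
	sz   int // total bytes handed out; advanced together with off
}

const chunkSize = 256 // hypothetical chunk size

// grow is an assumed helper: make sure the last chunk has room for n more bytes.
func (b *BytesBuffer) grow(n int) {
	if len(b.data) == 0 || b.off+n > len(b.data[len(b.data)-1]) {
		sz := chunkSize
		if n > sz {
			sz = n
		}
		b.data = append(b.data, make([]byte, sz))
		b.off = 0
	}
}

// Slice returns a slice of length n to write into (the rename suggested above).
func (b *BytesBuffer) Slice(n int) []byte {
	b.grow(n)
	last := b.data[len(b.data)-1]
	res := last[b.off : b.off+n]
	b.off += n
	b.sz += n
	return res
}

// Length returns the running size directly instead of walking all chunks.
func (b *BytesBuffer) Length() int { return b.sz }

// TruncateBy gives back at most the bytes handed out by the last Slice call.
func (b *BytesBuffer) TruncateBy(n int) {
	b.off -= n
	b.sz -= n
	if b.off < 0 || b.sz < 0 { // the assert asked for above
		panic("BytesBuffer: truncated more than was written")
	}
}
```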


Comments from Reviewable

Janardhan Reddy added 3 commits August 9, 2017 17:29
@janardhan1993
Contributor Author

Review status: 18 of 24 files reviewed at latest revision, 24 unresolved discussions.


algo/uidlist.go, line 48 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

It should be the lower of n and m. Even then, we can just avoid allocating so much memory upfront and let Go take care of it. So, don't preallocate, basically.

Done.


algo/uidlist.go, line 54 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Make n always the smaller one:
if n > m {
n, m = m, n
}

Also, if m == 0, set m = 1. Then calculate the ratio.

Done.


algo/uidlist.go, line 80 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Reuse the other code.

Done.


algo/uidlist.go, line 103 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Use the index at the front. In fact, don't need two different approaches: lin and jump.

I don't think we need to use the index; we only jump within a block. If we jump to the next block, we would end up decompressing it and then coming back to the older block.
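
For illustration, a rough sketch of that design; treating the iterator's successive decompressed outputs as a slice of blocks and using a binary search as the jump are both assumptions:

```go
import "sort"

// intersectCompressed is a sketch, not the repo's code: jump (here via binary
// search) only within the currently decompressed block, and move on to the next
// block only once the probe value is past the block's last UID, so we never
// have to go back and re-decompress an older block.
func intersectCompressed(blocks [][]uint64, v []uint64, out *[]uint64) {
	k := 0
	for _, u := range blocks { // u is one decompressed, sorted block of UIDs
		for k < len(v) && len(u) > 0 && v[k] <= u[len(u)-1] {
			i := sort.Search(len(u), func(j int) bool { return u[j] >= v[k] })
			if i < len(u) && u[i] == v[k] {
				*out = append(*out, u[i])
			}
			u = u[i:] // keep jumping forward within this block only
			k++
		}
		if k >= len(v) {
			return // the probe list is exhausted
		}
	}
}
```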


algo/uidlist.go, line 225 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

n = 1

Done.


bp128/bp128.go, line 162 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

4 bytes for length.

Done.


bp128/bp128.go, line 183 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Really hard to understand the code without comments. Add comments, everywhere.

Done.


bp128/peachpy/pack.py, line 19 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

curp

This is not the current pointer; it is the start of the input slice. We use in_offset to access the current pointer value.


bp128/peachpy/pack.py, line 20 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

deltap

This is not a delta pointer; it points to the start of the output slice.


bp128/peachpy/pack.py, line 21 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

prevp // The max integers from the last block.

Done.


bp128/peachpy/pack.py, line 23 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This would load the pointer address in the register.

Done.


bp128/peachpy/pack.py, line 30 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Everything is in bytes. We have 128 integers, consuming 1024 bytes (8 bytes per uint64).

1024 - 16 byte (128 bit) register = 1008.

This is moving the pointer to the last integers in the slice, so they will fill up a register.

Done.


bp128/peachpy/pack.py, line 31 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Also explain the calculation here.

Done.


bp128/peachpy/pack.py, line 34 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

What is XMMRegister? 16 bytes.

Done.


bp128/peachpy/pack.py, line 34 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Tail

Done.


bp128/peachpy/pack.py, line 36 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

We just stored 16 bytes from what inp is pointing to, into last.

Done.


posting/list.go, line 99 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove ulen, and uidx. Because now you have bi.

Done.


posting/list.go, line 606 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

bi.Length() - bi.StartIdx()

Done.


posting/list.go, line 763 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a warning that this is expensive, and we should avoid doing this.

Done.


posting/list.go, line 768 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove this logic for now.

Done.


x/x.go, line 238 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

sz int // store it and advance it with off

Done.


x/x.go, line 267 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Slice(n int)

Done.


x/x.go, line 276 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

just return sz directly.

Done.


x/x.go, line 303 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Assert that b.off >= 0.

Done.


Comments from Reviewable

@manishrjain
Contributor

:lgtm: Let's get this in!


Reviewed 6 of 6 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


bp128/bp128.go, line 270 at r4 (raw file):

func (pi *BPackIterator) AfterUid(uid uint64) (found bool) {
	// Current uncompressed block doesn't have uid, search for appropriate
	// block, uncompress it and store it in pi.out

s/uncompress/decompress


Comments from Reviewable

@janardhan1993 janardhan1993 merged commit 87583ec into master Aug 10, 2017
@janardhan1993 janardhan1993 deleted the feature/bp128_pack branch August 10, 2017 22:52
jarifibrahim pushed a commit that referenced this pull request Jul 11, 2020
This commit brings the following new changes from badger.
It also disables conflict detection in badger to save memory.

```
0dfb8b4 Changelog for v20.07.0 (#1411)
03ba278 Add missing changelog for v2.0.3 (#1410)
6001230 Revert "Compress/Encrypt Blocks in the background (#1227)" (#1409)
800305e Revert "Buffer pool for decompression (#1308)" (#1408)
63d9309 Revert "fix: Fix race condition in block.incRef (#1337)" (#1407)
e0d058c Revert "add assert to check integer overflow for table size (#1402)" (#1406)
d981f47 return error if the vlog writes exceeds more that 4GB. (#1400)
7f4e4b5 add assert to check integer overflow for table size (#1402)
8e896a7 Add a contribution guide (#1379)
b79aeef Avoid panic on multiple closer.Signal calls (#1401)
717b89c Enable cross-compiled 32bit tests on TravisCI (#1392)
09dfa66 Update ristretto to commit f66de99 (#1391)
509de73 Update head while replaying value log (#1372)
e013bfd Rework DB.DropPrefix (#1381)
3042e37 pre allocate cache key for the block cache and the bloom filter cache (#1371)
675efcd Increase default valueThreshold from 32B to 1KB (#1346)
158d927 Remove second initialization of writech in Open (#1382)
d37ce36 Tests: Use t.Parallel in TestIteratePrefix tests  (#1377)
3f4761d Force KeepL0InMemory to be true when InMemory is true (#1375)
dd332b0 Avoid panic in filltables() (#1365)
c45d966 Fix assert in background compression and encryption. (#1366)
```