Feature/bp128 pack #1308

Merged
merged 8 commits into master from feature/bp128_pack on Aug 10, 2017

Conversation

janardhan1993
Contributor

@janardhan1993 janardhan1993 commented Aug 4, 2017

This change is Reviewable

Janardhan Reddy added 3 commits August 9, 2017 10:14
@manishrjain
Contributor

Reviewed 5 of 19 files at r1, 11 of 20 files at r2, 7 of 7 files at r3.
Review status: all files reviewed at latest revision, 24 unresolved discussions.


algo/uidlist.go, line 48 at r3 (raw file):

	if o.Uids == nil {
		o.Uids = make([]uint64, 0, n)

It should be the lower of n and m. Even then, we can just avoid allocating so much memory upfront and let Go take care of it. So, don't preallocate, basically.
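
For illustration, a minimal sketch of what this suggestion amounts to; the function name and signature are assumptions, only the sorted-UID-list setup comes from the PR:

```go
// intersectSorted is a hypothetical sketch, not the PR's code: no upfront
// make([]uint64, 0, n); just append and let Go grow the backing array.
func intersectSorted(u, v []uint64, out *[]uint64) {
	i, k := 0, 0
	for i < len(u) && k < len(v) {
		switch {
		case u[i] < v[k]:
			i++
		case u[i] > v[k]:
			k++
		default:
			*out = append(*out, u[i]) // append handles growth; no preallocation
			i++
			k++
		}
	}
}
```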


algo/uidlist.go, line 54 at r3 (raw file):

	if n < m {
		if n == 0 {
			n += 1

Make n always the smaller one:
if n > m {
n, m = m, n
}

Also, if m == 0, set m = 1. Then calculate the ratio.
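
A hedged sketch of the suggested setup; the function name, the threshold value, and the dispatch on the result are illustrative assumptions, and the zero guard is applied to the divisor:

```go
// useJumpIntersect is a sketch, not the repo's code: make n the smaller of the
// two lengths, guard against a zero divisor, then decide from the size ratio
// whether a skip/jump style intersection is worth it.
func useJumpIntersect(u, v []uint64) bool {
	n, m := len(u), len(v)
	if n > m {
		n, m = m, n
	}
	if n == 0 {
		n = 1 // avoid dividing by zero when one list is empty
	}
	const jumpRatio = 100 // hypothetical threshold; the real value is a tuning choice
	return m/n >= jumpRatio
}
```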


algo/uidlist.go, line 80 at r3 (raw file):

	m := len(v)
	for i, k := 0, 0; i < n && k < m; {
		uid := u[i]

Reuse the other code.


algo/uidlist.go, line 103 at r3 (raw file):

func IntersectCompressedWithJump(bi *bp128.BPackIterator, v []uint64, o *[]uint64) {
	u := bi.Uids()

Use the index at the front. In fact, don't need two different approaches: lin and jump.


algo/uidlist.go, line 225 at r3 (raw file):

	dst := o.Uids[:0]
	if n == 0 {
		n += 1

n = 1


bp128/bp128.go, line 162 at r3 (raw file):

func (bp *BPackEncoder) WriteTo(in []byte) {
	x.AssertTrue(bp.length > 0)
	binary.BigEndian.PutUint64(in[:8], uint64(bp.length))

4 bytes for length.
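
A minimal sketch of the 4-byte header this asks for; the helper name and layout are assumptions, only the big-endian length prefix comes from the excerpt:

```go
import "encoding/binary"

// writeLength is a hypothetical helper showing the suggested change: store the
// block length in a 4-byte big-endian prefix instead of 8 bytes.
func writeLength(dst []byte, length int) {
	binary.BigEndian.PutUint32(dst[:4], uint32(length))
}
```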


bp128/bp128.go, line 183 at r3 (raw file):

	valid     bool
	lastSeed  []uint64
	buf       []uint64

Really hard to understand the code without comments. Add comments, everywhere.


bp128/peachpy/pack.py, line 19 at r1 (raw file):

    with Function(func_name, (in_ptr, out_ptr, seed_ptr)):
        inp = GeneralPurposeRegister64()

curp


bp128/peachpy/pack.py, line 20 at r1 (raw file):

    with Function(func_name, (in_ptr, out_ptr, seed_ptr)):
        inp = GeneralPurposeRegister64()
        outp = GeneralPurposeRegister64()

deltap


bp128/peachpy/pack.py, line 21 at r1 (raw file):

        inp = GeneralPurposeRegister64()
        outp = GeneralPurposeRegister64()
        seedp = GeneralPurposeRegister64()

prevp // The max integers from the last block.


bp128/peachpy/pack.py, line 23 at r1 (raw file):

        seedp = GeneralPurposeRegister64()

        LOAD.ARGUMENT(inp, in_ptr)

This would load the pointer address in the register.


bp128/peachpy/pack.py, line 30 at r1 (raw file):

        # We can do in-place delta calculations without copying if we
        # iterate from the back, i.e. we point to the last 16 bytes (128 bits)
        ADD(inp, 16*63)

Everything is in bytes. We have 128 integers, consuming 1024 bytes (8 bytes per uint64).

1024 - 16 byte (128 bit) register = 1008.

This is moving the pointer to the last integers in the slice, so they will fill up a register.
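
Spelling out the same arithmetic as Go constants (nothing new here, just the numbers from the comment above):

```go
const (
	intsPerBlock  = 128                        // uint64s per bp128 block
	bytesPerInt   = 8                          // sizeof(uint64)
	blockBytes    = intsPerBlock * bytesPerInt // 1024 bytes
	laneBytes     = 16                         // one XMM register (128 bits)
	lastLaneStart = blockBytes - laneBytes     // 1008 == 16*63, hence ADD(inp, 16*63)
)
```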


bp128/peachpy/pack.py, line 31 at r1 (raw file):

        # iterate from the back, i.e. we point to the last 16 bytes (128 bits)
        ADD(inp, 16*63)
        ADD(outp, 16*(bit_size - 1))

Also explain the calculation here.


bp128/peachpy/pack.py, line 34 at r1 (raw file):

        # Store the last vector
        last = XMMRegister()

What is XMMRegister? 16 bytes.


bp128/peachpy/pack.py, line 34 at r1 (raw file):

        # Store the last vector
        last = XMMRegister()

Tail


bp128/peachpy/pack.py, line 36 at r1 (raw file):

        last = XMMRegister()
        # MOV unaligned
        MOVDQU(last, [inp])

We just stored 16 bytes from what inp is pointing to, into last.


posting/list.go, line 99 at r3 (raw file):

	pidx       int // index of postings
	// Redundant TODO
	ulen  int

Remove ulen, and uidx. Because now you have bi.


posting/list.go, line 606 at r3 (raw file):

		})
	}
	count := bi.Length() - uidx

bi.Length() - bi.StartIdx()


posting/list.go, line 763 at r3 (raw file):

// Uids returns the UIDs given some query params.
// We have to apply the filtering before applying (offset, count).
func (l *List) Uids(opt ListOptions) *protos.List {

Add a warning that this is expensive, and we should avoid doing this.


posting/list.go, line 768 at r3 (raw file):

	if len(l.mlayer) > 0 {
		l.RUnlock()
		l.SyncIfDirty(false)

Remove this logic for now.


x/x.go, line 238 at r3 (raw file):

type BytesBuffer struct {
	data [][]byte
	off  int

sz int // store it and advance it with off


x/x.go, line 267 at r3 (raw file):

// returns a slice of length n to be used for writing
func (b *BytesBuffer) TouchBytes(n int) []byte {

Slice(n int)


x/x.go, line 276 at r3 (raw file):

func (b *BytesBuffer) Length() int {
	length := 0
	for i, d := range b.data {

just return sz directly.


x/x.go, line 303 at r3 (raw file):

// Always give back <= touched bytes
func (b *BytesBuffer) TruncateBy(n int) {
	b.off -= n

Assert that b.off >= 0.
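
Taken together, the x/x.go comments above amount to something like the sketch below; the chunk-growth logic and the chunk size are assumptions, only the sz field, Slice, the direct Length, and the TruncateBy assert come from the review:

```go
// Illustrative sketch of the suggested BytesBuffer shape, not the final code.
type BytesBuffer struct {
	data [][]byte
	off  int // write offset into the last chunk
	sz   int // total bytes handed out; advanced together with off
}

const chunkSize = 256 // hypothetical chunk size

// grow is an assumed helper: make sure the last chunk has room for n more bytes.
func (b *BytesBuffer) grow(n int) {
	if len(b.data) == 0 || b.off+n > len(b.data[len(b.data)-1]) {
		sz := chunkSize
		if n > sz {
			sz = n
		}
		b.data = append(b.data, make([]byte, sz))
		b.off = 0
	}
}

// Slice returns a slice of length n to write into (the rename suggested above).
func (b *BytesBuffer) Slice(n int) []byte {
	b.grow(n)
	last := b.data[len(b.data)-1]
	res := last[b.off : b.off+n]
	b.off += n
	b.sz += n
	return res
}

// Length returns the running size directly instead of walking all chunks.
func (b *BytesBuffer) Length() int { return b.sz }

// TruncateBy gives back at most the bytes handed out by the last Slice call.
func (b *BytesBuffer) TruncateBy(n int) {
	b.off -= n
	b.sz -= n
	if b.off < 0 || b.sz < 0 { // the assert asked for above
		panic("BytesBuffer: truncated more than was written")
	}
}
```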


Comments from Reviewable

Janardhan Reddy added 3 commits August 9, 2017 17:29
@janardhan1993
Contributor Author

Review status: 18 of 24 files reviewed at latest revision, 24 unresolved discussions.


algo/uidlist.go, line 48 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

It should be the lower of n and m. Even then, we can just avoid allocating so much memory upfront and let Go take care of it. So, don't preallocate, basically.

Done.


algo/uidlist.go, line 54 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Make n always the smaller one:
if n > m {
n, m = m, n
}

Also, if m == 0, set m = 1. Then calculate the ratio.

Done.


algo/uidlist.go, line 80 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Reuse the other code.

Done.


algo/uidlist.go, line 103 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Use the index at the front. In fact, don't need two different approaches: lin and jump.

I don't think we need to use the index; we only jump within a block. If we jump to the next block, we would end up decompressing it and then coming back to the older block.
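
For illustration, a rough sketch of that design; treating the iterator's successive decompressed outputs as a slice of blocks and using a binary search as the jump are both assumptions:

```go
import "sort"

// intersectCompressed is a sketch, not the repo's code: jump (here via binary
// search) only within the currently decompressed block, and move on to the next
// block only once the probe value is past the block's last UID, so we never
// have to go back and re-decompress an older block.
func intersectCompressed(blocks [][]uint64, v []uint64, out *[]uint64) {
	k := 0
	for _, u := range blocks { // u is one decompressed, sorted block of UIDs
		for k < len(v) && len(u) > 0 && v[k] <= u[len(u)-1] {
			i := sort.Search(len(u), func(j int) bool { return u[j] >= v[k] })
			if i < len(u) && u[i] == v[k] {
				*out = append(*out, u[i])
			}
			u = u[i:] // keep jumping forward within this block only
			k++
		}
		if k >= len(v) {
			return // the probe list is exhausted
		}
	}
}
```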


algo/uidlist.go, line 225 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

n = 1

Done.


bp128/bp128.go, line 162 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

4 bytes for length.

Done.


bp128/bp128.go, line 183 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Really hard to understand the code without comments. Add comments, everywhere.

Done.


bp128/peachpy/pack.py, line 19 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

curp

This is not the current pointer; it is the start of the input slice. We use in_offset to access the current pointer value.


bp128/peachpy/pack.py, line 20 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

deltap

This is not a delta pointer; it points to the start of the output slice.


bp128/peachpy/pack.py, line 21 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

prevp // The max integers from the last block.

Done.


bp128/peachpy/pack.py, line 23 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This would load the pointer address in the register.

Done.


bp128/peachpy/pack.py, line 30 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Everything is in bytes. We have 128 integers, consuming 1024 bytes (8 bytes per uint64).

1024 - 16 byte (128 bit) register = 1008.

This is moving the pointer to the last integers in the slice, so they will fill up a register.

Done.


bp128/peachpy/pack.py, line 31 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Also explain the calculation here.

Done.


bp128/peachpy/pack.py, line 34 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

What is XMMRegister? 16 bytes.

Done.


bp128/peachpy/pack.py, line 34 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Tail

Done.


bp128/peachpy/pack.py, line 36 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

We just stored 16 bytes from what inp is pointing to, into last.

Done.


posting/list.go, line 99 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove ulen, and uidx. Because now you have bi.

Done.


posting/list.go, line 606 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

bi.Length() - bi.StartIdx()

Done.


posting/list.go, line 763 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Add a warning that this is expensive, and we should avoid doing this.

Done.


posting/list.go, line 768 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove this logic for now.

Done.


x/x.go, line 238 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

sz int // store it and advance it with off

Done.


x/x.go, line 267 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Slice(n int)

Done.


x/x.go, line 276 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

just return sz directly.

Done.


x/x.go, line 303 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Assert that b.off >= 0.

Done.


Comments from Reviewable

@manishrjain
Contributor

:lgtm: Let's get this in!


Reviewed 6 of 6 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


bp128/bp128.go, line 270 at r4 (raw file):

func (pi *BPackIterator) AfterUid(uid uint64) (found bool) {
	// Current uncompressed block doesn't have uid, search for appropriate
	// block, uncompress it and store it in pi.out

s/uncompress/decompress


Comments from Reviewable

@janardhan1993 janardhan1993 merged commit 87583ec into master Aug 10, 2017
@janardhan1993 janardhan1993 deleted the feature/bp128_pack branch August 10, 2017 22:52
jarifibrahim pushed a commit that referenced this pull request Jul 11, 2020
This commit brings the following new changes from badger.
It also disables conflict detection in badger to save memory.

```
0dfb8b4 Changelog for v20.07.0 (#1411)
03ba278 Add missing changelog for v2.0.3 (#1410)
6001230 Revert "Compress/Encrypt Blocks in the background (#1227)" (#1409)
800305e Revert "Buffer pool for decompression (#1308)" (#1408)
63d9309 Revert "fix: Fix race condition in block.incRef (#1337)" (#1407)
e0d058c Revert "add assert to check integer overflow for table size (#1402)" (#1406)
d981f47 return error if the vlog writes exceeds more that 4GB. (#1400)
7f4e4b5 add assert to check integer overflow for table size (#1402)
8e896a7 Add a contribution guide (#1379)
b79aeef Avoid panic on multiple closer.Signal calls (#1401)
717b89c Enable cross-compiled 32bit tests on TravisCI (#1392)
09dfa66 Update ristretto to commit f66de99 (#1391)
509de73 Update head while replaying value log (#1372)
e013bfd Rework DB.DropPrefix (#1381)
3042e37 pre allocate cache key for the block cache and the bloom filter cache (#1371)
675efcd Increase default valueThreshold from 32B to 1KB (#1346)
158d927 Remove second initialization of writech in Open (#1382)
d37ce36 Tests: Use t.Parallel in TestIteratePrefix tests  (#1377)
3f4761d Force KeepL0InMemory to be true when InMemory is true (#1375)
dd332b0 Avoid panic in filltables() (#1365)
c45d966 Fix assert in background compression and encryption. (#1366)
```