Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace integer compression in UID Pack with groupvariant alogorithm in Assembly #3527

Merged
merged 7 commits into from
Jul 4, 2019

Conversation

animesh2049
Copy link
Contributor

@animesh2049 animesh2049 commented Jun 6, 2019

Todo -

  • Add test to ensure that we are compiling the right code
  • Add more tests
  • Fix TODO comments
  • Check and report coverage in the PR

This change is Reviewable

Sorry, something went wrong.

codec/codec.go Outdated Show resolved Hide resolved
codec/codec.go Outdated Show resolved Hide resolved
@animesh2049 animesh2049 force-pushed the animesh2049/groupvarint branch 2 times, most recently from 974a1b2 to 11058d6 Compare June 6, 2019 15:18
@animesh2049 animesh2049 marked this pull request as ready for review June 6, 2019 15:19
@animesh2049 animesh2049 requested review from manishrjain and a team as code owners June 6, 2019 15:19
@animesh2049 animesh2049 requested a review from mangalaman93 June 6, 2019 15:20
@mangalaman93 mangalaman93 changed the title Animesh2049/groupvarint Replace integer compression in UID Pack with groupvariant alogorithm in Assembly Jun 6, 2019
codec/benchmark/benchmark.go Outdated Show resolved Hide resolved
codec/codec.go Outdated Show resolved Hide resolved
codec/codec.go Outdated Show resolved Hide resolved
codec/codec.go Outdated Show resolved Hide resolved
codec/codec.go Outdated Show resolved Hide resolved
codec/codec.go Outdated Show resolved Hide resolved
codec/codec.go Outdated Show resolved Hide resolved
codec/codec_test.go Outdated Show resolved Hide resolved
codec/codec_test.go Outdated Show resolved Hide resolved
@animesh2049 animesh2049 force-pushed the animesh2049/groupvarint branch 2 times, most recently from fca947c to bc813f5 Compare June 7, 2019 12:22
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a few comments.

Reviewed 5 of 16 files at r1, 4 of 11 files at r2, 7 of 7 files at r4.
Reviewable status: all files reviewed, 23 unresolved discussions (waiting on @animesh2049)


algo/algo_test.go, line 1 at r4 (raw file):

package algo

license.


codec/codec.go, line 56 at r4 (raw file):

	var out bytes.Buffer
	buf := make([]byte, 17)

var buf [17]byte

This gets allocated on stack and is cheaper.


codec/codec.go, line 57 at r4 (raw file):

	var out bytes.Buffer
	buf := make([]byte, 17)
	tmpUids := make([]uint32, 4)

var tmpUids [4]uint32

This gets allocated on stack instead of heap and is cheaper.


codec/codec.go, line 60 at r4 (raw file):

	for {
		i := 0
		for ; i < 4; i++ {

for i := 0; i < 4; i++


codec/codec.go, line 130 at r4 (raw file):

	var tmpUids [4]uint32
	deltas := block.Deltas

Can you call it encoded or data to make this more clear. I didn't realize this was a byte array.


codec/codec.go, line 133 at r4 (raw file):

	// Decoding always expects the encoded byte array to be of
	// length >= 4. Padding doesn't affect the decoded values.

>= 4 or > 4?


codec/codec.go, line 134 at r4 (raw file):

	// Decoding always expects the encoded byte array to be of
	// length >= 4. Padding doesn't affect the decoded values.
	deltas = append(deltas, 0, 0, 0)

This can be an expensive operation. The entire deltas byte array would need to be relocated to another place which has a bigger capacity to accommodate the extra 3 bytes.

You'd be better off only doing this when needed, i.e. when deltas becomes much smaller as you consume it in the for loop below.


codec/codec.go, line 141 at r4 (raw file):

	// of 4, that requires atleast 5 bytes. So if we are left with deltas
	// of length < 5, those are probably the padded 0s.
	for len(deltas) >= 5 {

Add a comment that even the minimum encoded 4 ints would consume at least 5 bytes, the way group varint works. You say that, but sandwiched between other comments and it isn't obviously clear.

In fact, you can draw a small diagram here to generally explain how groupvarint works.


codec/codec.go, line 145 at r4 (raw file):

		deltas = deltas[groupvarint.BytesUsed[deltas[0]]:]
		for i := 0; i < 4; i++ {
			d.uids = append(d.uids, uint64(tmpUids[i])+last)

You can assign the sum to cur and then last to cur.


codec/codec_test.go, line 243 at r4 (raw file):

}

type UInts []uint64

type Uids []uint64?


codec/codec_test.go, line 244 at r4 (raw file):

type UInts []uint64

Remove vertical spaces here between funcs.


codec/decoderversion.go, line 9 at r4 (raw file):

func init() {
	fmt.Println("[Decoder]: Using assembly version of decoder")

glog.Infof(...)


protos/pb.proto, line 285 at r4 (raw file):

	// encoding and decoding for uint32. We want to wrap it around our logic to use it here.
	// Default Blocksize is 256 so uint32 would be sufficient.
	uint32 uidNum = 3;

Protobufs use snake case. num_uids = 3;

@manishrjain manishrjain requested a review from martinmr June 11, 2019 04:42
@manishrjain
Copy link
Contributor

@animesh2049 : Can you ask Damian to review this PR as well?

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 56 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

var buf [17]byte

This gets allocated on stack and is cheaper.

If we declare buf as an array, we'll have convert it to slice every time we pass it to Encode4 function. Won't that be costly in terms of execution time? Similar is the case with tmpUids

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 60 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

for i := 0; i < 4; i++

Done.

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 83 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

just use size*

Done.

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 124 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

This doesn't seem right!

Done.

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 130 at r3 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

fix

Done.

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 133 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

>= 4 or > 4?

It expects the length of byte array to be atleast 4. So >=4.

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 130 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Can you call it encoded or data to make this more clear. I didn't realize this was a byte array.

Done.

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 313 at r2 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

Move this as a constant.

Done.

@animesh2049
Copy link
Contributor Author


codec/codec.go, line 136 at r3 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

fix

Done.

@animesh2049
Copy link
Contributor Author


codec/decoderversion.go, line 9 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

glog.Infof(...)

Done.

@animesh2049
Copy link
Contributor Author


codec/codec_test.go, line 236 at r3 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

unnecessary!

Done.

@animesh2049
Copy link
Contributor Author


codec/codec_test.go, line 243 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

type Uids []uint64?

Wanted to sort array of uint64. Couldn't find any native function. sort.Sort takes an interface as argument.

@animesh2049
Copy link
Contributor Author


codec/codec_test.go, line 244 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove vertical spaces here between funcs.

Done.

@animesh2049
Copy link
Contributor Author


protos/pb.proto, line 285 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Protobufs use snake case. num_uids = 3;

Done.

@animesh2049 animesh2049 force-pushed the animesh2049/groupvarint branch from b97c1f2 to 9f9b035 Compare June 11, 2019 11:11
@animesh2049 animesh2049 force-pushed the animesh2049/groupvarint branch from 0a73db1 to d021e72 Compare June 18, 2019 10:24
Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 15 files reviewed, 21 unresolved discussions (waiting on @animesh2049, @campoy, @mangalaman93, @manishrjain, and @martinmr)


codec/codec.go, line 56 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

No, it won't be costly. Slice is just a pointer to an underlying byte array. Based on my understanding, please do verify. CC: @campoy

Escape analysis shows it is getting allocated on stack.

./codec.go:56:13: (*Encoder).packBlock make([]byte, 17) does not escape
./codec.go:57:17: (*Encoder).packBlock make([]uint32, 4) does not escape

@animesh2049 animesh2049 force-pushed the animesh2049/groupvarint branch from d021e72 to 3100387 Compare June 18, 2019 11:38
Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments. LGTM. We can squash and merge into single commit once Manish approves it.

codec/codec.go Outdated Show resolved Hide resolved
codec/codec_test.go Outdated Show resolved Hide resolved
codec/codec_test.go Show resolved Hide resolved
Copy link
Contributor Author

@animesh2049 animesh2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 15 files reviewed, 24 unresolved discussions (waiting on @animesh2049, @campoy, @mangalaman93, @manishrjain, and @martinmr)


codec/codec.go, line 133 at r6 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

newline not needed here

Done.


codec/codec_test.go, line 113 at r6 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

newline not needed

Done.


codec/codec_test.go, line 280 at r6 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

we have sortUint64 function now. Can we use that here?

I defined sortUint64 in uidlist_test.go. Also sort.Slice is being used only once here.

@animesh2049 animesh2049 force-pushed the animesh2049/groupvarint branch from 3100387 to 5c72966 Compare June 18, 2019 12:11
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Got a comment. In general, I think if you know the length of the slice and it is a constant, you could just declare it on stack. You don't have to rely upon Go compiler to do the right thing. I also noticed that you create a slice like x = make([]int, 4). But when you need to pass it, you send it as x[:]. If you make a slice using make, you don't need to do the x[:]. You only need to do this if you're creating it using x [4]int.

Rest looks good. Looking forward to this going into v1.1.

Reviewed 1 of 6 files at r6, 6 of 6 files at r8.
Dismissed @mangalaman93 from a discussion.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @animesh2049, @mangalaman93, and @martinmr)


codec/codec.go, line 128 at r8 (raw file):

	d.uids = append(d.uids, last)

	tmpUids := make([]uint32, 4)

I still think that if you do know that the length of a slice is constant, then it should just be defined like var tmpUids [4]uint32.

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think declaring method guarantees a variable going on stack or heap. The only way to really know is to do the escape analysis. Agree that doesn't make sense to do [:] while passing the slice any more. The whole point is of using make([]int, 4) was to do avoid the conversion from array to slice.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @animesh2049 and @martinmr)


codec/codec.go, line 140 at r8 (raw file):

		}

		groupvarint.Decode4(tmpUids[:], encData)

Agree with Manish that we don't need [:] here.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that it does make a difference. x [4]int would always go on stack, and be copied over to heap if changed.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @animesh2049 and @martinmr)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Irrespective, the PR is good. Feel free to see if my comments need to be addressed and merge.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @animesh2049 and @martinmr)

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove the [:] before merging the PR while calling the function.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @animesh2049 and @martinmr)

@animesh2049 animesh2049 force-pushed the animesh2049/groupvarint branch from 5c72966 to 08eb07e Compare June 24, 2019 08:07
@animesh2049
Copy link
Contributor Author


codec/codec.go, line 140 at r8 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

Agree with Manish that we don't need [:] here.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this PR still pending? Can we merge it?

Reviewable status: 14 of 15 files reviewed, 5 unresolved discussions (waiting on @animesh2049, @mangalaman93, @manishrjain, and @martinmr)

@animesh2049 animesh2049 merged commit 6665444 into master Jul 4, 2019
@animesh2049 animesh2049 deleted the animesh2049/groupvarint branch July 4, 2019 12:23
animesh2049 added a commit that referenced this pull request Jul 11, 2019
animesh2049 added a commit that referenced this pull request Jul 11, 2019
animesh2049 added a commit that referenced this pull request Jul 12, 2019
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
@rstorr
Copy link

rstorr commented Oct 2, 2020

@CodeLingoBot capture Files need to include license

Unless another license is specified, each file needs to include the apache license.

@codelingo
Copy link

codelingo bot commented Oct 2, 2020

@rstorr
Copy link

rstorr commented Oct 2, 2020

@CodeLingoBot capture Prefer fixed size array allocations on stack over heap

@codelingo
Copy link

codelingo bot commented Oct 2, 2020

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants