Always set BlockSize in encoder. #5255

Merged
merged 5 commits into from
Apr 22, 2020

Contributor

@martinmr martinmr commented Apr 21, 2020

Internal Ref:

  • DGRAPH-1267

Rollup set the encoder's BlockSize for multi-part lists but not for
normal lists.

As a result, ApproxLen returned 0 for normal lists, and the intersect algorithm exited early when it saw a length of zero.
Existing data will be encoded with the correct block size the next time each list is rolled up.
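
To make the failure mode concrete, here is a minimal sketch of how an unset block size leads to a zero length estimate. It assumes ApproxLen is computed as number of blocks times BlockSize; the `uidBlock`, `uidPack`, and `approxLen` names are hypothetical stand-ins, not the actual codec types.

```go
package main

import "fmt"

// uidBlock and uidPack are hypothetical stand-ins for the types that
// hold a compressed UID list. They are not the real codec definitions.
type uidBlock struct {
	Base uint64
}

type uidPack struct {
	BlockSize uint32
	Blocks    []*uidBlock
}

// approxLen estimates the number of UIDs as blocks * block size.
// The formula is an assumption made for illustration.
func approxLen(p *uidPack) int {
	if p == nil {
		return 0
	}
	return len(p.Blocks) * int(p.BlockSize)
}

func main() {
	blocks := []*uidBlock{{Base: 1}, {Base: 300}}

	unset := &uidPack{Blocks: blocks}               // encoder never set BlockSize
	set := &uidPack{BlockSize: 256, Blocks: blocks} // encoder set BlockSize

	// The pack with an unset block size reports 0 even though it has two blocks.
	fmt.Println(approxLen(unset), approxLen(set))
}
```

Under this assumption, any caller that treats a zero estimate as "empty list" skips data that is actually present.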

This PR does the following:

  1. Always sets the block size in the encoder.
  2. Adds an assert in rollup to verify that the block size always has the expected value.
  3. Removes the check in the intersect algorithm that returned immediately when ApproxLen
    returned 0. If the actual length is zero, nothing changes, since the smallest list is picked
    to perform the iteration. If the list has data but a block size of zero, removing the check lets
    the query return the right result. A test covers this scenario.
  4. Adds the missing tests for the binary search intersect algorithm.
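
For readers unfamiliar with item 4, the following is a rough sketch of a binary search intersection over two sorted UID slices. It is not the implementation in the algo package; `intersectBinary` is a hypothetical name, and the inputs are assumed to be sorted and free of duplicates.

```go
package main

import (
	"fmt"
	"sort"
)

// intersectBinary returns the UIDs present in both sorted slices.
// It iterates over the shorter slice and binary-searches the longer one,
// which pays off when one list is much smaller than the other.
func intersectBinary(a, b []uint64) []uint64 {
	small, large := a, b
	if len(small) > len(large) {
		small, large = large, small
	}
	out := make([]uint64, 0, len(small))
	for _, u := range small {
		// Find the first position in large whose value is >= u.
		i := sort.Search(len(large), func(i int) bool { return large[i] >= u })
		if i == len(large) {
			break // all remaining UIDs in small exceed everything in large
		}
		if large[i] == u {
			out = append(out, u)
		}
		// Later UIDs in small can only match at or after position i.
		large = large[i:]
	}
	return out
}

func main() {
	fmt.Println(intersectBinary([]uint64{1, 3, 5, 7, 9}, []uint64{3, 4, 5, 10}))
}
```

Note that an empty input simply produces an empty result without any special early-exit check.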

Fixes #5102


@martinmr martinmr requested a review from manishrjain as a code owner April 21, 2020 00:55
Contributor

@danielmai danielmai left a comment

Does this affect only trigram indices or @filter (intersections) with other indices like eq() too?

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @manishrjain)

Contributor Author

@martinmr martinmr left a comment

It affects every query where all of the following hold:

  1. The query uses the Uids method.
  2. The list has only an immutable layer.
  3. The list of uids needs to be intersected.

I'll update the description.

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @manishrjain)

@martinmr martinmr changed the title Fix error in regex queries by removing optimization. Fix bug in Uids by removing optimization. Apr 21, 2020
@martinmr martinmr requested a review from danielmai April 21, 2020 18:03
Contributor

@manishrjain manishrjain left a comment

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @danielmai, @manishrjain, and @martinmr)


posting/list.go, line 985 at r1 (raw file):

	res := make([]uint64, 0, len(l.mutationMap)+codec.ApproxLen(l.plist.Pack))
	out := &pb.List{}
	if len(l.mutationMap) == 0 && opt.Intersect != nil && len(l.plist.Splits) == 0 {

Removing this would affect performance quite a bit I think. Better to determine why IntersectCompressedWith fails.

@martinmr martinmr requested a review from a team as a code owner April 22, 2020 01:17
@martinmr martinmr force-pushed the martinmr/regex-query branch from 8416e96 to 42cdc5d Compare April 22, 2020 20:24
@martinmr martinmr changed the title Fix bug in Uids by removing optimization. Always set BlockSize in encoder. Apr 22, 2020
@martinmr martinmr requested a review from manishrjain April 22, 2020 20:38
Contributor

@manishrjain manishrjain left a comment

:lgtm: Got a few comments. Please address before merging.

Reviewable status: 0 of 4 files reviewed, 4 unresolved discussions (waiting on @danielmai, @manishrjain, and @martinmr)


algo/uidlist.go, line 99 at r2 (raw file):

	lq := len(q)

	if ld == 0 || lq == 0 {

We could perhaps keep lq at least -- that is exact and not approximate.
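
A rough sketch of what this suggestion amounts to: exit early only on the exact length of `q`, and never on the approximate length of the compressed list. The `pack` type and `intersect` function are hypothetical simplifications (blocks are already decoded), not the code in algo/uidlist.go.

```go
package main

import "fmt"

// pack is a hypothetical simplified compressed list. blockSize may be
// zero for data written before the fix, which makes approxLen return 0.
type pack struct {
	blockSize int
	blocks    [][]uint64 // each block holds sorted UIDs
}

func (p *pack) approxLen() int { return len(p.blocks) * p.blockSize }

// intersect returns the UIDs present in both p and the sorted slice q.
// Only len(q), which is exact, is used for the early exit. The
// approximate length of p is not trusted for that decision.
func intersect(p *pack, q []uint64) []uint64 {
	if len(q) == 0 {
		return nil
	}
	var out []uint64
	j := 0
	for _, blk := range p.blocks {
		for _, u := range blk {
			// Advance q until it catches up with u.
			for j < len(q) && q[j] < u {
				j++
			}
			if j == len(q) {
				return out
			}
			if q[j] == u {
				out = append(out, u)
			}
		}
	}
	return out
}

func main() {
	// Block size was never set, so the estimate is 0 although data exists.
	p := &pack{blocks: [][]uint64{{1, 2, 3}, {10, 20}}}
	fmt.Println(p.approxLen(), intersect(p, []uint64{2, 10, 30}))
}
```

With this guard, a pack whose estimate is wrongly zero still yields the correct intersection, while a genuinely empty `q` still short-circuits.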


posting/list.go, line 935 at r2 (raw file):

	})
	// Finish  writing the last part of the list (or the whole list if not a multi-part list).
	x.Check(err)

This can perhaps also be an error return.


posting/list.go, line 938 at r2 (raw file):

	plist.Pack = enc.Done()
	if plist.Pack != nil {
		x.AssertTrue(plist.Pack.BlockSize == uint32(blockSize))

If this func already returns an error, please return an error here. I think we should move away from asserts now.
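
As an illustration of the requested change, the block size check could report a mismatch through an error value instead of an assert that panics. This is only a sketch: `uidPack`, `checkBlockSize`, and the value 256 are assumptions, not the code that was merged.

```go
package main

import "fmt"

// uidPack is a hypothetical stand-in for the encoded pack type.
type uidPack struct {
	BlockSize uint32
}

// checkBlockSize returns an error when the pack's block size differs
// from the expected one, so the caller can propagate the failure
// instead of crashing the process.
func checkBlockSize(p *uidPack, want uint32) error {
	if p == nil {
		return nil // nothing was encoded, so there is nothing to verify
	}
	if p.BlockSize != want {
		return fmt.Errorf("pack block size %d does not match expected %d",
			p.BlockSize, want)
	}
	return nil
}

func main() {
	const blockSize = 256 // assumed value, for illustration only

	fmt.Println(checkBlockSize(&uidPack{BlockSize: blockSize}, blockSize))
	fmt.Println(checkBlockSize(&uidPack{}, blockSize))
}
```

The trade-off is that every caller has to handle the returned error, but a bad list no longer takes down the whole server.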

Contributor Author

@martinmr martinmr left a comment

Reviewable status: 0 of 4 files reviewed, 4 unresolved discussions (waiting on @danielmai and @manishrjain)


algo/uidlist.go, line 99 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

We could perhaps keep lq at least -- that is exact and not approximate.

Done.


posting/list.go, line 985 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Removing this would affect performance quite a bit I think. Better to determine why IntersectCompressedWith fails.

Done. Not removing this anymore.


posting/list.go, line 935 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This can perhaps also be an error return.

Done.


posting/list.go, line 938 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

If this func already returns an error, please return an error here. I think we should move away from asserts now.

Done.

@martinmr martinmr merged commit ab93005 into master Apr 22, 2020
@martinmr martinmr deleted the martinmr/regex-query branch April 22, 2020 23:19
martinmr added a commit that referenced this pull request Apr 22, 2020
martinmr added a commit that referenced this pull request Apr 22, 2020
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 18, 2020
Successfully merging this pull request may close these issues.

Dgraph v20.03.0 Regexp Not returning expected results
3 participants