Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 3366: Offset doesn't return correct results with multiple order statements. #3400

Merged
merged 7 commits into from
May 21, 2019

Conversation

pawanrawal
Copy link
Contributor

@pawanrawal pawanrawal commented May 12, 2019

Multisort happens in two stages.

  1. In the first stage the uids are sorted by the first predicate and
  2. In the second stage the other sorts are executed with the ids that are the output from the first stage.

While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset.

This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.


This change is Reviewable

@pawanrawal pawanrawal marked this pull request as ready for review May 12, 2019 13:24
@pawanrawal pawanrawal requested a review from manishrjain as a code owner May 12, 2019 13:24
@pawanrawal pawanrawal requested a review from a team May 12, 2019 13:24
Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @manishrjain and @pawanrawal)


query/query1_test.go, line 1586 at r1 (raw file):

}

func TestMultiSort8PaginateWithOffset(t *testing.T) {

what's the number 8 doing in the test name?'


worker/sort.go, line 42 at r1 (raw file):

 we apply offset

apply the offset


worker/sort.go, line 43 at r1 (raw file):

in

into


worker/sort.go, line 148 at r1 (raw file):

			}
			if len(ts.Order) > 1 {
				var offset int32

I see that some parts of the code need int while others require int32, causing a lot of casts back and forth between the two types. Would it be possible to refactor the code so that only one type is used?

In case it's possible, feel free to mark this as a TODO and do it in a separate PR.


worker/sort.go, line 527 at r1 (raw file):

	for i, ul := range ts.UidMatrix {
		il := &out[i]
		if count > 0 && len(il.ulist.Uids)-int(il.multiSortOffset) >= count {

Can you add a comment on what this if statement is checking for. It's not readily apparent by just looking at the code.


worker/sort.go, line 581 at r1 (raw file):

dont

don't

@martinmr martinmr requested a review from a team May 13, 2019 19:20
worker/sort.go Outdated
@@ -39,8 +39,14 @@ var emptySortResult pb.SortResult

type sortresult struct {
reply *pb.SortResult
vals [][]types.Val
err error
// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is 108 characters (from lll)

worker/sort.go Outdated
vals [][]types.Val
err error
// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but
// equal values in the bucket that the offset falls into are skipped. This slice stores the remaining offset

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is 109 characters (from lll)

worker/sort.go Outdated
// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but
// equal values in the bucket that the offset falls into are skipped. This slice stores the remaining offset
// for individual uid lists that must be applied after all multi sort is done.
// TODO (pawan) - Offset has type int32 whereas paginate function returns an int. We should use a common type

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is 110 characters (from lll)

worker/sort.go Outdated
@@ -167,12 +184,12 @@ func sortWithIndex(ctx context.Context, ts *pb.SortMessage) *sortresult {
order := ts.Order[0]
typ, err := schema.State().TypeOf(order.Attr)
if err != nil {
return &sortresult{&emptySortResult, nil, fmt.Errorf("Attribute %s not defined in schema", order.Attr)}
return &sortresult{&emptySortResult, nil, nil, fmt.Errorf("Attribute %s not defined in schema", order.Attr)}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is 110 characters (from lll)

worker/sort.go Outdated
}

// Get the tokenizers and choose the corresponding one.
if !schema.State().IsIndexed(order.Attr) {
return &sortresult{&emptySortResult, nil, x.Errorf("Attribute %s is not indexed.", order.Attr)}
return &sortresult{&emptySortResult, nil, nil, x.Errorf("Attribute %s is not indexed.", order.Attr)}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is 102 characters (from lll)

worker/sort.go Outdated
x.Errorf("Attribute:%s does not have exact index for sorting.", order.Attr)}
}
// Other types just have one tokenizer, so if we didn't find a
// sortable tokenizer, then attribute isn't sortable.
return &sortresult{&emptySortResult, nil, x.Errorf("Attribute:%s is not sortable.", order.Attr)}
return &sortresult{&emptySortResult, nil, nil, x.Errorf("Attribute:%s is not sortable.", order.Attr)}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is 103 characters (from lll)

worker/sort.go Outdated
if len(ts.Order) == 1 {
result.Uids = result.Uids[il.offset:n]
} else {
// Incase of multi sort we can't apply the offset yet, as the order might change after other sort

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line is 101 characters (from lll)

Copy link
Contributor Author

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 13 unresolved discussions (waiting on @manishrjain, @martinmr, and @pawanrawal)


query/query1_test.go, line 1586 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

what's the number 8 doing in the test name?'

Removed. Notice how the previous tests have number 6, 7 and so on. It was a continuation from that but doesn't need to be.


worker/sort.go, line 42 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
 we apply offset

apply the offset

Done.


worker/sort.go, line 43 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
in

into

Done. Hopefully, I changed the correct in.


worker/sort.go, line 148 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I see that some parts of the code need int while others require int32, causing a lot of casts back and forth between the two types. Would it be possible to refactor the code so that only one type is used?

In case it's possible, feel free to mark this as a TODO and do it in a separate PR.

True, this is because paginate returns an int whereas the Offset proto in the query uses int32. They can be converted to the same type. I have added a TODO and will address it in a separate PR.


worker/sort.go, line 527 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Can you add a comment on what this if statement is checking for. It's not readily apparent by just looking at the code.

Added a comment. Hopefully its more clear now.


worker/sort.go, line 581 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…
dont

don't

Done.

Copy link
Contributor Author

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 3 files reviewed, 13 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)


worker/sort.go, line 42 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 108 characters (from lll)

Done.


worker/sort.go, line 43 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 109 characters (from lll)

Done.


worker/sort.go, line 45 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 110 characters (from lll)

Done.


worker/sort.go, line 187 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 110 characters (from lll)

Done.


worker/sort.go, line 192 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 102 characters (from lll)

Done.


worker/sort.go, line 214 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 103 characters (from lll)

Done.


worker/sort.go, line 575 at r2 (raw file):

Previously, golangcibot (Bot from GolangCI) wrote…

line is 101 characters (from lll)

Done.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: It looks good to me but I am also not very familiar with the sorting logic so you should have Manish take a look before merging.

Reviewed 1 of 2 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @golangcibot, @manishrjain, and @pawanrawal)


worker/sort.go, line 579 at r3 (raw file):

				result.Uids = result.Uids[il.offset:n]
			} else {
				// Incase of multi sort we can't apply the offset yet, as the order might change

s/Incase/In case/

@pawanrawal
Copy link
Contributor Author

@manishrjain could you please have a look and provide feedback?

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly haven't looked at this code forever. I just left a few comments. Address those and feel free to merge.

If you want to run a Dgraph cluster, you should look into the compose directory. Build the main there to generate a docker-compose.yml. And then use run.sh to build Dgraph and bring up a cluster.

:lgtm:

Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @golangcibot and @pawanrawal)


contrib/scripts/install.sh, line 16 at r3 (raw file):

pushd $GOPATH/src/google.golang.org/grpc
  # TODO(pawan) - This file seems to be useless. Delete it as dgraph doesn't compile with grpc v1.8.2

100 chars.


worker/sort.go, line 138 at r3 (raw file):

		select {
		case <-ctx.Done():
			return &sortresult{&emptySortResult, nil, nil, ctx.Err()}

It would make sense here to have a small func which can help create a sortResult object with {&emptySortResult, nil, nil, error}


worker/sort.go, line 507 at r3 (raw file):

	values          []types.Val
	uset            map[uint64]struct{}
	multiSortOffset int32

Instead of using multiSortOffset, it might make sense to use the UID where we broke off. That way, if the list changes in between, we can still anchor the results based on the UID that was returned?

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @golangcibot and @pawanrawal)

Copy link
Contributor Author

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip about the compose directory.

Reviewable status: 1 of 3 files reviewed, 11 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)


contrib/scripts/install.sh, line 16 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

100 chars.

Done.


worker/sort.go, line 138 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

It would make sense here to have a small func which can help create a sortResult object with {&emptySortResult, nil, nil, error}

Done.


worker/sort.go, line 507 at r3 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Instead of using multiSortOffset, it might make sense to use the UID where we broke off. That way, if the list changes in between, we can still anchor the results based on the UID that was returned?

I can't do that as the order of uids and the UID that we broke off at also can change after all the sorts are applied.


worker/sort.go, line 579 at r3 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

s/Incase/In case/

Done.

@pawanrawal
Copy link
Contributor Author

Some tests seem to be failing which seem to be unrelated to this change. Will wait for them to be fixed in master before merging the PR.

[11:50:12][Step 3/5] TestHelloWorld
[11:50:12][TestHelloWorld] [Test Output]
    bulk_live_fixture_test.go:103: Bulkloader didn't run: exit status 1
[11:54:34][Step 3/5] Failed tests detected
[11:51:13][Step 3/5] TestFacets
[11:51:13][TestFacets] [Test Output]
    bulk_live_fixture_test.go:103: Bulkloader didn't run: exit status 1
[11:52:13][Step 3/5] TestCountIndex
[11:52:13][TestCountIndex] [Test Output]
    bulk_live_fixture_test.go:103: Bulkloader didn't run: exit status 1

@martinmr
Copy link
Contributor

This should have fixed those tests: https://github.com/dgraph-io/dgraph/pull/3440/files

@pawanrawal pawanrawal merged commit a799030 into master May 21, 2019
@pawanrawal pawanrawal deleted the pawanrawal/multisort-offset branch May 21, 2019 05:16
pawanrawal added a commit that referenced this pull request May 21, 2019
…nts. (#3400)

Multisort happens in two stages.

In the first stage the uids are sorted by the first predicate and
In the second stage the other sorts are executed with the ids that are the output from the first stage.
While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset.

This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.
martinmr added a commit that referenced this pull request May 21, 2019
pawanrawal added a commit that referenced this pull request May 21, 2019
…nts. (#3400)

Multisort happens in two stages.

In the first stage the uids are sorted by the first predicate and
In the second stage the other sorts are executed with the ids that are the output from the first stage.
While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset.

This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.
pawanrawal added a commit that referenced this pull request May 21, 2019
… into release/v1.0 (#3455)

* Fix offset doesn't return correct results with multiple order statements. (#3400)

Multisort happens in two stages.

In the first stage the uids are sorted by the first predicate and
In the second stage the other sorts are executed with the ids that are the output from the first stage.
While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset.

This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.

* Use processToFastJsonNoErr in query_test.
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
…nts. (hypermodeinc#3400)

Multisort happens in two stages.

In the first stage the uids are sorted by the first predicate and
In the second stage the other sorts are executed with the ids that are the output from the first stage.
While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset.

This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants