Issue 3366: Offset doesn't return correct results with multiple order statements. #3400

pawanrawal · 2019-05-12T12:55:06Z

Multisort happens in two stages.

In the first stage, the uids are sorted by the first predicate.
In the second stage, the remaining sorts are executed on the uids that are output by the first stage.

When selecting the uids that are input to the second stage, we must also include uids whose values are equal to the value at the boundary where offset/count is applied, because the later sorts can reorder them. This was being done for count but not for offset.

This PR adds multiSortOffsets to sortresult, which keeps track of the pending offset for each uid list (always <= the offset in the query). The multiSortOffset must be applied to the individual uid lists after all the sorts are done.
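The two stages and the pending offset can be illustrated with a minimal, self-contained sketch. It is not the code in worker/sort.go: the `row` type, the `age`/`name` keys, and the functions `firstStage`, `secondStage`, and `multiSort` are hypothetical names chosen for illustration, and it assumes offset >= 0 and count > 0.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stand-in for a uid and its two sortable values.
type row struct {
	uid  uint64
	age  int    // first sort key
	name string // second sort key
}

// firstStage sorts by the first key only and trims rows that cannot appear in
// the final page. Rows tied with the value at the offset boundary or at the
// count boundary are kept, because the second sort may reorder them. It
// returns the kept rows and the part of the offset that is still pending.
// Assumes offset >= 0 and count > 0.
func firstStage(rows []row, offset, count int) ([]row, int) {
	sort.SliceStable(rows, func(i, j int) bool { return rows[i].age < rows[j].age })
	if offset >= len(rows) {
		return nil, 0
	}
	// Walk back to the start of the bucket of equal values the offset falls into.
	start := offset
	for start > 0 && rows[start-1].age == rows[offset].age {
		start--
	}
	// Extend the end to cover every row tied with the last row of the page.
	end := offset + count
	if end > len(rows) {
		end = len(rows)
	}
	for end < len(rows) && rows[end].age == rows[end-1].age {
		end++
	}
	return rows[start:end], offset - start
}

// secondStage applies the full ordering, then the pending offset and the count.
func secondStage(rows []row, pending, count int) []row {
	sort.SliceStable(rows, func(i, j int) bool {
		if rows[i].age != rows[j].age {
			return rows[i].age < rows[j].age
		}
		return rows[i].name < rows[j].name
	})
	if pending > len(rows) {
		pending = len(rows)
	}
	rows = rows[pending:]
	if count < len(rows) {
		rows = rows[:count]
	}
	return rows
}

// multiSort runs both stages and returns the uids of the resulting page.
func multiSort(rows []row, offset, count int) []uint64 {
	kept, pending := firstStage(rows, offset, count)
	page := secondStage(kept, pending, count)
	out := make([]uint64, 0, len(page))
	for _, r := range page {
		out = append(out, r.uid)
	}
	return out
}

func main() {
	rows := []row{{1, 20, "d"}, {2, 25, "c"}, {3, 25, "a"}, {4, 25, "b"}, {5, 30, "e"}}
	// The offset of 2 falls inside the age == 25 bucket, so one unit of offset
	// stays pending until the sort by name has been applied.
	fmt.Println(multiSort(rows, 2, 2)) // prints [4 2]
}
```

In this sketch, applying the whole offset in the first stage would drop uid 2 before the sort by name could move it into the page, which is the kind of wrong result the issue describes.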

martinmr

Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @manishrjain and @pawanrawal)

query/query1_test.go, line 1586 at r1 (raw file):

}

func TestMultiSort8PaginateWithOffset(t *testing.T) {

what's the number 8 doing in the test name?

worker/sort.go, line 42 at r1 (raw file):

 we apply offset

apply the offset

worker/sort.go, line 43 at r1 (raw file):

in

into

worker/sort.go, line 148 at r1 (raw file):

			}
			if len(ts.Order) > 1 {
				var offset int32

I see that some parts of the code need int while others require int32, causing a lot of casts back and forth between the two types. Would it be possible to refactor the code so that only one type is used?

In case it's possible, feel free to mark this as a TODO and do it in a separate PR.

worker/sort.go, line 527 at r1 (raw file):

	for i, ul := range ts.UidMatrix {
		il := &out[i]
		if count > 0 && len(il.ulist.Uids)-int(il.multiSortOffset) >= count {

Can you add a comment on what this if statement is checking for? It's not readily apparent by just looking at the code.

worker/sort.go, line 581 at r1 (raw file):

dont

don't

golangcibot · 2019-05-14T07:28:16Z

worker/sort.go

@@ -39,8 +39,14 @@ var emptySortResult pb.SortResult

 type sortresult struct {
 	reply *pb.SortResult
-	vals  [][]types.Val
-	err   error
+	// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but


line is 108 characters (from lll)

golangcibot · 2019-05-14T07:28:17Z

worker/sort.go

-	vals  [][]types.Val
-	err   error
+	// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but
+	// equal values in the bucket that the offset falls into are skipped. This slice stores the remaining offset


line is 109 characters (from lll)

golangcibot · 2019-05-14T07:28:17Z

worker/sort.go

+	// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but
+	// equal values in the bucket that the offset falls into are skipped. This slice stores the remaining offset
+	// for individual uid lists that must be applied after all multi sort is done.
+	// TODO (pawan) - Offset has type int32 whereas paginate function returns an int. We should use a common type


line is 110 characters (from lll)

golangcibot · 2019-05-14T07:28:17Z

worker/sort.go

@@ -167,12 +184,12 @@ func sortWithIndex(ctx context.Context, ts *pb.SortMessage) *sortresult {
 	order := ts.Order[0]
 	typ, err := schema.State().TypeOf(order.Attr)
 	if err != nil {
-		return &sortresult{&emptySortResult, nil, fmt.Errorf("Attribute %s not defined in schema", order.Attr)}
+		return &sortresult{&emptySortResult, nil, nil, fmt.Errorf("Attribute %s not defined in schema", order.Attr)}


line is 110 characters (from lll)

golangcibot · 2019-05-14T07:28:18Z

worker/sort.go

 	}

 	// Get the tokenizers and choose the corresponding one.
 	if !schema.State().IsIndexed(order.Attr) {
-		return &sortresult{&emptySortResult, nil, x.Errorf("Attribute %s is not indexed.", order.Attr)}
+		return &sortresult{&emptySortResult, nil, nil, x.Errorf("Attribute %s is not indexed.", order.Attr)}


line is 102 characters (from lll)

golangcibot · 2019-05-14T07:28:18Z

worker/sort.go

 				x.Errorf("Attribute:%s does not have exact index for sorting.", order.Attr)}
 		}
 		// Other types just have one tokenizer, so if we didn't find a
 		// sortable tokenizer, then attribute isn't sortable.
-		return &sortresult{&emptySortResult, nil, x.Errorf("Attribute:%s is not sortable.", order.Attr)}
+		return &sortresult{&emptySortResult, nil, nil, x.Errorf("Attribute:%s is not sortable.", order.Attr)}


line is 103 characters (from lll)

golangcibot · 2019-05-14T07:28:18Z

worker/sort.go

+			if len(ts.Order) == 1 {
+				result.Uids = result.Uids[il.offset:n]
+			} else {
+				// Incase of multi sort we can't apply the offset yet, as the order might change after other sort


line is 101 characters (from lll)

pawanrawal

Reviewable status: 1 of 3 files reviewed, 13 unresolved discussions (waiting on @manishrjain, @martinmr, and @pawanrawal)

query/query1_test.go, line 1586 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

what's the number 8 doing in the test name?

Removed. The previous tests are numbered 6, 7, and so on; this was a continuation of that, but it doesn't need to be.

worker/sort.go, line 42 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

 we apply offset
apply the offset

Done.

worker/sort.go, line 43 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

in
into

Done. Hopefully, I changed the correct "in".

worker/sort.go, line 148 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I see that some parts of the code need int while others require int32, causing a lot of casts back and forth between the two types. Would it be possible to refactor the code so that only one type is used?

In case it's possible, feel free to mark this as a TODO and do it in a separate PR.

True, this is because paginate returns an int whereas the Offset proto in the query uses int32. They can be converted to the same type. I have added a TODO and will address it in a separate PR.
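For illustration only, the kind of casting being discussed looks roughly like the sketch below. `pageBounds` is a hypothetical stand-in that works in `int`, not the actual paginate function, and the `int32` variable only mimics the type of the Offset field described above.

```go
package main

import "fmt"

// pageBounds is a hypothetical helper that works in int, the natural type for
// slice indices. It clamps the requested window to a list of length n and
// treats count <= 0 as "no limit".
func pageBounds(offset, count, n int) (start, end int) {
	if offset < 0 {
		offset = 0
	}
	start = offset
	if start > n {
		start = n
	}
	end = n
	if count > 0 && start+count < n {
		end = start + count
	}
	return start, end
}

func main() {
	// The offset arrives as int32 (assumed here to mirror the proto field),
	// so it has to be converted before it can be used as an index.
	var offset int32 = 2
	start, end := pageBounds(int(offset), 3, 10)

	// Storing a derived value back into an int32 field needs the reverse cast.
	pending := offset - int32(start)

	fmt.Println(start, end, pending) // prints 2 5 0
}
```

Settling on one type at the boundary, for example converting once when the request is read, would confine the conversions to a single place.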

worker/sort.go, line 527 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Can you add a comment on what this if statement is checking for? It's not readily apparent by just looking at the code.

Added a comment. Hopefully it's clearer now.

worker/sort.go, line 581 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

dont
don't

Done.

pawanrawal

Reviewable status: 1 of 3 files reviewed, 13 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)

worker/sort.go, line 42 at r2 (raw file):