-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 3366: Offset doesn't return correct results with multiple order statements. #3400
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @manishrjain and @pawanrawal)
query/query1_test.go, line 1586 at r1 (raw file):
} func TestMultiSort8PaginateWithOffset(t *testing.T) {
what's the number 8 doing in the test name?'
worker/sort.go, line 42 at r1 (raw file):
we apply offset
apply the offset
worker/sort.go, line 43 at r1 (raw file):
in
into
worker/sort.go, line 148 at r1 (raw file):
} if len(ts.Order) > 1 { var offset int32
I see that some parts of the code need int
while others require int32
, causing a lot of casts back and forth between the two types. Would it be possible to refactor the code so that only one type is used?
In case it's possible, feel free to mark this as a TODO and do it in a separate PR.
worker/sort.go, line 527 at r1 (raw file):
for i, ul := range ts.UidMatrix { il := &out[i] if count > 0 && len(il.ulist.Uids)-int(il.multiSortOffset) >= count {
Can you add a comment on what this if statement is checking for. It's not readily apparent by just looking at the code.
worker/sort.go, line 581 at r1 (raw file):
dont
don't
worker/sort.go
Outdated
@@ -39,8 +39,14 @@ var emptySortResult pb.SortResult | |||
|
|||
type sortresult struct { | |||
reply *pb.SortResult | |||
vals [][]types.Val | |||
err error | |||
// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line is 108 characters (from lll
)
worker/sort.go
Outdated
vals [][]types.Val | ||
err error | ||
// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but | ||
// equal values in the bucket that the offset falls into are skipped. This slice stores the remaining offset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line is 109 characters (from lll
)
worker/sort.go
Outdated
// For multi sort we apply the offset in two stages. In the first stage a part of the offset is applied but | ||
// equal values in the bucket that the offset falls into are skipped. This slice stores the remaining offset | ||
// for individual uid lists that must be applied after all multi sort is done. | ||
// TODO (pawan) - Offset has type int32 whereas paginate function returns an int. We should use a common type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line is 110 characters (from lll
)
worker/sort.go
Outdated
@@ -167,12 +184,12 @@ func sortWithIndex(ctx context.Context, ts *pb.SortMessage) *sortresult { | |||
order := ts.Order[0] | |||
typ, err := schema.State().TypeOf(order.Attr) | |||
if err != nil { | |||
return &sortresult{&emptySortResult, nil, fmt.Errorf("Attribute %s not defined in schema", order.Attr)} | |||
return &sortresult{&emptySortResult, nil, nil, fmt.Errorf("Attribute %s not defined in schema", order.Attr)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line is 110 characters (from lll
)
worker/sort.go
Outdated
} | ||
|
||
// Get the tokenizers and choose the corresponding one. | ||
if !schema.State().IsIndexed(order.Attr) { | ||
return &sortresult{&emptySortResult, nil, x.Errorf("Attribute %s is not indexed.", order.Attr)} | ||
return &sortresult{&emptySortResult, nil, nil, x.Errorf("Attribute %s is not indexed.", order.Attr)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line is 102 characters (from lll
)
worker/sort.go
Outdated
x.Errorf("Attribute:%s does not have exact index for sorting.", order.Attr)} | ||
} | ||
// Other types just have one tokenizer, so if we didn't find a | ||
// sortable tokenizer, then attribute isn't sortable. | ||
return &sortresult{&emptySortResult, nil, x.Errorf("Attribute:%s is not sortable.", order.Attr)} | ||
return &sortresult{&emptySortResult, nil, nil, x.Errorf("Attribute:%s is not sortable.", order.Attr)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line is 103 characters (from lll
)
worker/sort.go
Outdated
if len(ts.Order) == 1 { | ||
result.Uids = result.Uids[il.offset:n] | ||
} else { | ||
// Incase of multi sort we can't apply the offset yet, as the order might change after other sort |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line is 101 characters (from lll
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 13 unresolved discussions (waiting on @manishrjain, @martinmr, and @pawanrawal)
query/query1_test.go, line 1586 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
what's the number 8 doing in the test name?'
Removed. Notice how the previous tests have number 6, 7 and so on. It was a continuation from that but doesn't need to be.
worker/sort.go, line 42 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
we apply offset
apply the offset
Done.
worker/sort.go, line 43 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
in
into
Done. Hopefully, I changed the correct in
.
worker/sort.go, line 148 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
I see that some parts of the code need
int
while others requireint32
, causing a lot of casts back and forth between the two types. Would it be possible to refactor the code so that only one type is used?In case it's possible, feel free to mark this as a TODO and do it in a separate PR.
True, this is because paginate
returns an int
whereas the Offset
proto in the query uses int32
. They can be converted to the same type. I have added a TODO and will address it in a separate PR.
worker/sort.go, line 527 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Can you add a comment on what this if statement is checking for. It's not readily apparent by just looking at the code.
Added a comment. Hopefully its more clear now.
worker/sort.go, line 581 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
dont
don't
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 3 files reviewed, 13 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)
worker/sort.go, line 42 at r2 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 108 characters (from
lll
)
Done.
worker/sort.go, line 43 at r2 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 109 characters (from
lll
)
Done.
worker/sort.go, line 45 at r2 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 110 characters (from
lll
)
Done.
worker/sort.go, line 187 at r2 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 110 characters (from
lll
)
Done.
worker/sort.go, line 192 at r2 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 102 characters (from
lll
)
Done.
worker/sort.go, line 214 at r2 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 103 characters (from
lll
)
Done.
worker/sort.go, line 575 at r2 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
line is 101 characters (from
lll
)
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me but I am also not very familiar with the sorting logic so you should have Manish take a look before merging.
Reviewed 1 of 2 files at r2, 1 of 1 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @golangcibot, @manishrjain, and @pawanrawal)
worker/sort.go, line 579 at r3 (raw file):
result.Uids = result.Uids[il.offset:n] } else { // Incase of multi sort we can't apply the offset yet, as the order might change
s/Incase/In case/
@manishrjain could you please have a look and provide feedback? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly haven't looked at this code forever. I just left a few comments. Address those and feel free to merge.
If you want to run a Dgraph cluster, you should look into the compose directory. Build the main there to generate a docker-compose.yml. And then use run.sh to build Dgraph and bring up a cluster.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @golangcibot and @pawanrawal)
contrib/scripts/install.sh, line 16 at r3 (raw file):
pushd $GOPATH/src/google.golang.org/grpc # TODO(pawan) - This file seems to be useless. Delete it as dgraph doesn't compile with grpc v1.8.2
100 chars.
worker/sort.go, line 138 at r3 (raw file):
select { case <-ctx.Done(): return &sortresult{&emptySortResult, nil, nil, ctx.Err()}
It would make sense here to have a small func which can help create a sortResult object with {&emptySortResult, nil, nil, error}
worker/sort.go, line 507 at r3 (raw file):
values []types.Val uset map[uint64]struct{} multiSortOffset int32
Instead of using multiSortOffset, it might make sense to use the UID where we broke off. That way, if the list changes in between, we can still anchor the results based on the UID that was returned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @golangcibot and @pawanrawal)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the tip about the compose directory.
Reviewable status: 1 of 3 files reviewed, 11 unresolved discussions (waiting on @golangcibot, @manishrjain, and @martinmr)
contrib/scripts/install.sh, line 16 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
100 chars.
Done.
worker/sort.go, line 138 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
It would make sense here to have a small func which can help create a sortResult object with
{&emptySortResult, nil, nil, error}
Done.
worker/sort.go, line 507 at r3 (raw file):
Previously, manishrjain (Manish R Jain) wrote…
Instead of using multiSortOffset, it might make sense to use the UID where we broke off. That way, if the list changes in between, we can still anchor the results based on the UID that was returned?
I can't do that as the order of uids and the UID that we broke off at also can change after all the sorts are applied.
worker/sort.go, line 579 at r3 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
s/Incase/In case/
Done.
Some tests seem to be failing which seem to be unrelated to this change. Will wait for them to be fixed in master before merging the PR.
|
This should have fixed those tests: https://github.com/dgraph-io/dgraph/pull/3440/files |
…nts. (#3400) Multisort happens in two stages. In the first stage the uids are sorted by the first predicate and In the second stage the other sorts are executed with the ids that are the output from the first stage. While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset. This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.
…nts. (#3400) Multisort happens in two stages. In the first stage the uids are sorted by the first predicate and In the second stage the other sorts are executed with the ids that are the output from the first stage. While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset. This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.
… into release/v1.0 (#3455) * Fix offset doesn't return correct results with multiple order statements. (#3400) Multisort happens in two stages. In the first stage the uids are sorted by the first predicate and In the second stage the other sorts are executed with the ids that are the output from the first stage. While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset. This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done. * Use processToFastJsonNoErr in query_test.
…nts. (hypermodeinc#3400) Multisort happens in two stages. In the first stage the uids are sorted by the first predicate and In the second stage the other sorts are executed with the ids that are the output from the first stage. While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset. This PR adds multiSortOffsets to sortResult which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.
Multisort happens in two stages.
While considering the uids that are input for the second stage, we must also consider uids with equal values at the boundary where offset/count is applied. This was being done for count but not for offset.
This PR adds
multiSortOffsets
tosortResult
which keeps track of the pending offset (which would be <= offset in the query). The multiSortOffset must be applied to individual uid lists after all the sorts are done.This change is