Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(DQL): optimize query for has function with offset. (#7727) #8431

Merged
merged 3 commits into from
Dec 2, 2022

Conversation

all-seeing-code
Copy link
Contributor

@all-seeing-code all-seeing-code commented Nov 14, 2022

Fixes DGRAPH-574.

This PR reduces the latency of the has function slightly. For example the given query:-

{
    me(func: has(actor.film), first: 10, offset: 300000){
       name@en
    }
}

when run on the 21million dataset takes around 80 milliseconds to execute as compared to earlier around 93 milliseconds.

Machine Specs:-

Model Name:	MacBook Pro
  Model Identifier:	MacBookPro16,1
  Processor Name:	6-Core Intel Core i7
  Processor Speed:	2.6 GHz
  Number of Processors:	1
  Total Number of Cores:	6
  L2 Cache (per Core):	256 KB
  L3 Cache:	12 MB
  Memory:	16 GB

Note:-
This does not optimize for the has function on predicates with the @lang tag.

(cherry picked from commit 9ba15b7)

@CLAassistant
Copy link

CLAassistant commented Nov 14, 2022

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ anurags92
❌ minhaj-shakeel
You have signed the CLA already but the status is still pending? Let us recheck it.

@meghalims meghalims added cherry-pick cherry-pick-22.0.2 Label for all cherry-picks for 22.0.2 Release and removed cherry-pick labels Nov 14, 2022
@MichelDiz
Copy link
Contributor

@anurags92 this needs to clean up the hidden comments.

return
// for `has` function when there is no filtering and ordering, we fetch
// correct paginated results so no need to apply pagination here.
if !(len(sg.Filters) == 0 && sg.SrcFunc != nil && sg.SrcFunc.Name == "has") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case possible where this if evaluates to false but we still need to perform pagination like before?

}
}
return int32(count)
// make offset = 0, if there is need to fetch all the results.
return int32(count), 0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be:
return int32(count), int32(sg.Params.Offset)
There is a case where one might want all the results after skipping the first few (denoted by the Offset).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akon-dey Actually no, I think that in this case worker/task.go needs pb.query.Offset to in fact be 0 in order to function correctly.

@all-seeing-code
Copy link
Contributor Author

This PR has a failing test, which needs to be properly understood. I am marking this as draft for now.

@all-seeing-code all-seeing-code marked this pull request as draft November 29, 2022 22:32
minhaj-shakeel and others added 3 commits December 1, 2022 17:31
Fixes DGRAPH-574.

This PR reduces the latency of the `has` function slightly. For example the given query:-
```
{
    me(func: has(actor.film), first: 10, offset: 300000){
       name@en
    }
}
```
when run on the `21million` dataset takes around `80` milliseconds to execute as compared to earlier around `93` milliseconds.

Machine Specs:-
```
Model Name:	MacBook Pro
  Model Identifier:	MacBookPro16,1
  Processor Name:	6-Core Intel Core i7
  Processor Speed:	2.6 GHz
  Number of Processors:	1
  Total Number of Cores:	6
  L2 Cache (per Core):	256 KB
  L3 Cache:	12 MB
  Memory:	16 GB
```

Note:-
This does not optimize for the `has` function on predicates with the `@lang` tag.
@coveralls
Copy link

coveralls commented Dec 1, 2022

Coverage Status

Coverage decreased (-0.007%) to 37.175% when pulling 5e6f44c on cherry-pick-7727 into a4729b3 on main.

@all-seeing-code
Copy link
Contributor Author

I don't see a significant performance improvement on my end.

main

Dgraph version   : local
Dgraph codename  : dgraph
Dgraph SHA-256   : 8525cf5541e0d9d63398304254cc9638d92a94b4e0dece17a1b0212d509aac81
Commit SHA-1     : a4729b3ca
Commit timestamp : 2022-11-30 17:57:07 -0300
Branch           : main
Go version       : go1.19.2
jemalloc enabled : true

Benchmark results:

goos: linux
goarch: amd64
pkg: github.com/dgraph-io/test
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkQuery100_0-20               403           2635436 ns/op
BenchmarkQuery100_20-20              444           3041838 ns/op
BenchmarkQuery100_200-20             336           3470780 ns/op
BenchmarkQuery100_20000-20           134           7816592 ns/op
PASS
ok      github.com/dgraph-io/test       7.107s

cherry-pick-7727

Dgraph version   : local
Dgraph codename  : dgraph
Dgraph SHA-256   : 23a5ae17dfef9391a7131d1953595e30f5a919f5db61ba9635ac6a1bd7383432
Commit SHA-1     : 5e6f44c03
Commit timestamp : 2022-12-01 17:31:05 +0530
Branch           : cherry-pick-7727
Go version       : go1.19.2
jemalloc enabled : true

Benchmark results:

goos: linux
goarch: amd64
pkg: github.com/dgraph-io/test
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkQuery100_0-20               391           3201947 ns/op
BenchmarkQuery100_20-20              363           3034455 ns/op
BenchmarkQuery100_200-20             481           2211461 ns/op
BenchmarkQuery100_20000-20           229           8433493 ns/op
PASS
ok      github.com/dgraph-io/test       6.790s

Here's the code I used to get the results. It expects a local dgraph running with ldbc-sf.03 dataset inserted.

@all-seeing-code all-seeing-code marked this pull request as ready for review December 1, 2022 17:12
@all-seeing-code
Copy link
Contributor Author

This PR has a failing test, which needs to be properly understood. I am marking this as draft for now.

This was fixed. It had a test which required a different set of changes. They are being brought in via: #8471

@MichelDiz
Copy link
Contributor

@anurags92 the description have some hidden comments. Please, remove it.

@all-seeing-code
Copy link
Contributor Author

all-seeing-code commented Dec 2, 2022

On a larger dataset (sf10).

main

alvis@alvis-alien-ubuntu:~/go/src/github.com/dgraph-io/test$ go test -run=XXX -bench=BenchmarkQuery
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/test
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkQuery100_0-20               502           2370298 ns/op
BenchmarkQuery100_20-20              696           2391215 ns/op
BenchmarkQuery100_200-20             552           2456952 ns/op
BenchmarkQuery100_20000-20           187           7336503 ns/op
PASS
ok      github.com/dgraph-io/test       8.886s

cherry-pick-7727

alvis@alvis-alien-ubuntu:~/go/src/github.com/dgraph-io/test$ go test -run=XXX -bench=BenchmarkQuery
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/test
cpu: 12th Gen Intel(R) Core(TM) i7-12700H
BenchmarkQuery100_0-20               470           2335904 ns/op
BenchmarkQuery100_20-20              489           2371475 ns/op
BenchmarkQuery100_200-20             469           2380480 ns/op
BenchmarkQuery100_20000-20           166           6956870 ns/op
PASS
ok      github.com/dgraph-io/test       5.815s

Code to repro: https://go.dev/play/p/Mv6Y6YOG_BR

@all-seeing-code all-seeing-code merged commit eed8dde into main Dec 2, 2022
@all-seeing-code all-seeing-code deleted the cherry-pick-7727 branch December 2, 2022 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-22.0.2 Label for all cherry-picks for 22.0.2 Release
Development

Successfully merging this pull request may close these issues.

9 participants