Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for first and offset directive in has function. #3970

Merged
merged 12 commits into from
Sep 24, 2019

Conversation

poonai
Copy link
Contributor

@poonai poonai commented Sep 12, 2019

close #3964
Implementation:
first is passed as a parameter in worker query and the iteration is stopped after we get the required
result.

TODO:

  • support for offset
  • test case

Signed-off-by: பாலாஜி ஜின்னா [email protected]


This change is Reviewable

Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ A review job has been created and sent to the PullRequest network.


@balajijinnah you can click here to see the review status or cancel the code review job.

@manishrjain
Copy link
Contributor

Super excited about this!

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified very minor typos. Please see the inline comments.


Reviewed with ❤️ by PullRequest

x/list.go Outdated
"github.com/dgraph-io/dgraph/protos/pb"
)

// IsListFull check wheather the posting list is full or not based on the limit.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// IsListFull check whether the posting List is full based on the limit

protos/pb.proto Outdated
@@ -70,6 +70,7 @@ message Query {

uint64 read_ts = 13;
int32 cache = 14;
int32 count = 15; // used to limit the number of result. Typically, the count is value of first
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First what? Perhaps rewrite the comment to say

// limit result count. Typically equal to value of first

Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
@poonai poonai requested a review from pawanrawal September 13, 2019 10:27
@poonai poonai marked this pull request as ready for review September 13, 2019 10:28
@poonai poonai requested a review from a team as a code owner September 13, 2019 10:28
Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got some comments. Can you please make sure that we already have test cases in query package covering the common use cases for has function?

  1. First
  2. Offset
  3. Offset + first
  4. First + offset + filter
  5. First + offset + order
  6. First + offset + filter + order

Reviewed 4 of 5 files at r1, 1 of 1 files at r2.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @balajijinnah, @manishrjain, and @martinmr)


protos/pb.proto, line 73 at r2 (raw file):

	uint64 read_ts = 13;
	int32 cache = 14;
	int32 count = 15; // used to limit the number of result. Typically, the count is value of first

Please also say its only being used for has query right now.


query/query.go, line 885 at r2 (raw file):

	// {
	//   q(func: has(name), first:1) {
	//     count(name)

So for count(name) SubGraph DoCount would be true but that is different from the has SubGraph. In this case DoCount won't be true for the root subgraph, infact is never is true at root. It is only true when you are requesting for count of a predicate.

Look into this case as well.

{
  q(func: has(name), first: 1) {
    count(uid)
  }
}

In this case uidCount would be true for subgraph at root so you should exclude that case.


query/query.go, line 896 at r2 (raw file):

	// - No facet ordering
	// {
	//   q(func: has(mobile), first:1) {

We can't have facet ordering at root and all the work that we are doing here is for the root has function, so is this case relevant?


query/query.go, line 900 at r2 (raw file):

	//   }
	// }
	// - subgraph should not be a filter, which has connective operations like (and, or , not)

Again this case would be discarded when you check for SrcFunc.Name to be `has.


query/query.go, line 912 at r2 (raw file):

	// }
	// here one case is missing how to differentiate that this not the subgraph of
	// allofterms(name@en, "jurassic park") filter. This will fail the connective filter.

If its SubGraph of allofterms filter, then SrcFunc.Name won't be has. Which case are you talking about?


query/query.go, line 918 at r2 (raw file):

		return sg.SrcFunc != nil && sg.SrcFunc.Name == "has"
	}
	if len(sg.Filters) == 0 && len(sg.Params.Order) == 0 && !sg.Params.DoCount &&

Please check that this should work with after argument as well.


query/query.go, line 919 at r2 (raw file):

	}
	if len(sg.Filters) == 0 && len(sg.Params.Order) == 0 && !sg.Params.DoCount &&
		isSupportedFunction() && len(sg.Params.FacetOrder) == 0 && len(sg.FilterOp) == 0 {

Do we need to check FilterOp here?


query/query.go, line 920 at r2 (raw file):

	if len(sg.Filters) == 0 && len(sg.Params.Order) == 0 && !sg.Params.DoCount &&
		isSupportedFunction() && len(sg.Params.FacetOrder) == 0 && len(sg.FilterOp) == 0 {
		// Offset also added because, we need n results to trim the offset.

Would be nice to do some benchmarks with large values of offsets. At some point, we might want to pass this in as well to the worker and only return the relevant ids.

பாலாஜி ஜின்னா added 3 commits September 16, 2019 03:41
Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Copy link
Contributor Author

@poonai poonai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 6 files reviewed, 10 unresolved discussions (waiting on @balajijinnah, @manishrjain, @martinmr, @pawanrawal, and @pullrequest[bot])


query/query.go, line 896 at r2 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

We can't have facet ordering at root and all the work that we are doing here is for the root has function, so is this case relevant?

Removed it


query/query.go, line 912 at r2 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

If its SubGraph of allofterms filter, then SrcFunc.Name won't be has. Which case are you talking about?

Didn't know that we don't support first and offset in filter. I wrote this case by having has in the mind :(


query/query.go, line 918 at r2 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Please check that this should work with after argument as well.

AfterUID is passed as part for taskQuery. and we starting iteration from the uid onward. So, it should be fine. It won't affect the limit of the result.


query/query.go, line 919 at r2 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Do we need to check FilterOp here?

Didn't know that we don't support first in the filter


x/list.go, line 23 at r1 (raw file):

Previously, pullrequest[bot] wrote…

// IsListFull check whether the posting List is full based on the limit

Done.

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

I have one comment about the result of one of the tests. Let me know what do you find out about it. One more thing I'd recommend is to run all the tests by commenting out the new code and verify that you get some results as you do now. This would make sure results are identical using the new and old way.

Reviewed 1 of 4 files at r3.
Reviewable status: 3 of 6 files reviewed, 7 unresolved discussions (waiting on @balajijinnah, @manishrjain, @martinmr, @pawanrawal, and @pullrequest[bot])


query/query4_test.go, line 358 at r3 (raw file):

			"q": [
			  {
				"name": ""

This is a bit strange that empty name is the first one that we get here. Can you check why this happens since "" < "Badger"

@poonai
Copy link
Contributor Author

poonai commented Sep 17, 2019

@pawanrawal results are taken from master branch and cross verified.

Regarding the sorting, It coming in the wrong order in master as well.

I'll raise a separate issue for that

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 13 unresolved discussions (waiting on @balajijinnah, @martinmr, @pawanrawal, and @pullrequest[bot])


query/query.go, line 886 at r4 (raw file):

		FacetsFilter: sg.facetsFilter,
		ExpandAll:    sg.Params.ExpandAll,
		Count:        int32(count),

Call it First to avoid confusion with docount. You are getting the first N results.


query/query.go, line 896 at r4 (raw file):

// calculateCountResult returns the count of result we need to proceed query further down.
func calculateCountResult(sg *SubGraph) int {

CalculateFirstN


query/query.go, line 897 at r4 (raw file):

// calculateCountResult returns the count of result we need to proceed query further down.
func calculateCountResult(sg *SubGraph) int {
	// by default count is zero. (zero will retrive all the results)

Should be maxint. Not zero by default.


query/query.go, line 919 at r4 (raw file):

	//   }
	// }
	isSupportedFunction := func() bool {

Why is this a function ? Can be a bool.


worker/task.go, line 2033 at r4 (raw file):

			result.Uids = append(result.Uids, pk.Uid)
			// We'll stop fetching if we fetch the required count.
			if x.IsListFull(result, q.Count) {

no Need for a func. Len >= q.Count should be sufficient.


x/list.go, line 25 at r4 (raw file):

// IsListFull check whether the posting List is full based on the limit
// by default it return false, if the limit is set to zero.
func IsListFull(l *pb.List, limit int32) bool {

Don’t need this function . Remove.

பாலாஜி ஜின்னா added 2 commits September 18, 2019 07:00
Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Copy link
Contributor Author

@poonai poonai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 5 files reviewed, 13 unresolved discussions (waiting on @balajijinnah, @manishrjain, @martinmr, @pawanrawal, and @pullrequest[bot])


protos/pb.proto, line 73 at r1 (raw file):

Previously, pullrequest[bot] wrote…

First what? Perhaps rewrite the comment to say

// limit result count. Typically equal to value of first

Done.


protos/pb.proto, line 73 at r2 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Please also say its only being used for has query right now.

Done.


query/query.go, line 885 at r2 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

So for count(name) SubGraph DoCount would be true but that is different from the has SubGraph. In this case DoCount won't be true for the root subgraph, infact is never is true at root. It is only true when you are requesting for count of a predicate.

Look into this case as well.

{
  q(func: has(name), first: 1) {
    count(uid)
  }
}

In this case uidCount would be true for subgraph at root so you should exclude that case.

Done.


query/query.go, line 900 at r2 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Again this case would be discarded when you check for SrcFunc.Name to be `has.

Done.


query/query.go, line 886 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Call it First to avoid confusion with docount. You are getting the first N results.

Done.


query/query.go, line 896 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

CalculateFirstN

Done.


query/query.go, line 897 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Should be maxint. Not zero by default.

Done.


query/query.go, line 919 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Why is this a function ? Can be a bool.

Done.


worker/task.go, line 2033 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

no Need for a func. Len >= q.Count should be sufficient.

Done.


x/list.go, line 25 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don’t need this function . Remove.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 5 files reviewed, 13 unresolved discussions (waiting on @balajijinnah, @manishrjain, @martinmr, @pawanrawal, and @pullrequest[bot])


query/query.go, line 896 at r5 (raw file):

// calculateFirstN returns the count of result we need to proceed query further down.
func calculateFirstN(sg *SubGraph) int {

This can just return int32.


query/query.go, line 898 at r5 (raw file):

func calculateFirstN(sg *SubGraph) int {
	// by default count is zero. (zero will retrive all the results)
	count := math.MaxInt8

MaxInt8 would be 128. Surely, that's not what you want.

Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Copy link
Contributor Author

@poonai poonai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 5 files reviewed, 13 unresolved discussions (waiting on @balajijinnah, @manishrjain, @martinmr, @pawanrawal, and @pullrequest[bot])


query/query.go, line 896 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This can just return int32.

Done.


query/query.go, line 898 at r5 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

MaxInt8 would be 128. Surely, that's not what you want.

Done.

…aji/task_offset

Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 5 files at r5, 3 of 3 files at r6.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @balajijinnah, @manishrjain, @martinmr, @pawanrawal, and @pullrequest[bot])

Signed-off-by: பாலாஜி ஜின்னா <[email protected]>
@mangalaman93 mangalaman93 merged commit 0f9b5ba into master Sep 24, 2019
@mangalaman93 mangalaman93 deleted the balaji/task_offset branch September 24, 2019 09:16
@poonai poonai restored the balaji/task_offset branch September 24, 2019 09:40
@poonai poonai deleted the balaji/task_offset branch September 24, 2019 09:40
danielmai pushed a commit that referenced this pull request Nov 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Has function is painfully slow
4 participants