Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OPTIMIZATION] Various optimizations for facets filter queries #4923

Merged
merged 12 commits into from
Mar 17, 2020

Conversation

ashish-goswami
Copy link
Contributor

@ashish-goswami ashish-goswami commented Mar 12, 2020

This PR tries to optimize queries with facets filter. It has following changes:

  • Improving applyFacetsTree function by avoiding some multiple call to parseFuncTypeHelper and
    conversion calls to ftree.function.val
  • Minor optimizations in types.Convert
  • Minor optimizations in valToBytes

This change is Reviewable

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


query/outputnode.go, line 145 at r1 (raw file):

		return []byte(fmt.Sprintf("%q", v.Value)), nil
	case types.IntID:
		switch v.Value.(type) {

You can store the value in val here and then use that.


query/outputnode.go, line 147 at r1 (raw file):

		switch v.Value.(type) {
		case int64: // In types.Convert(), we always convert to int64 in types.Convert().
			return []byte(strconv.Itoa(int(v.Value.(int64)))), nil

stconv.FormatInt can be used here


types/conversion.go, line 58 at r1 (raw file):

				*res = data
			case StringID, DefaultID:
				*res = *(*string)(unsafe.Pointer(&data))

Add a comment that we never modify the posting.Val in place, hence this should be safe to do.


worker/task.go, line 1906 at r1 (raw file):

			v, ok := ftree.function.typesToValMap[fVal.Tid]
			if !ok {
				// Not found in map and hence convert is here.

convert it here


worker/task.go, line 1998 at r1 (raw file):

	val           types.Val
	fnType        FuncType
	typesToValMap map[types.TypeID]types.Val

Add a comment here that we convert the value of the function arg to common types and store it here to allow fast comparison later.

@mangalaman93 mangalaman93 requested a review from pawanrawal March 13, 2020 11:57
Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pawanrawal)


types/conversion.go, line 58 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Add a comment that we never modify the posting.Val in place, hence this should be safe to do.

How much performance does it really improve? Can we live without it? If the performance improvement is not significant, we should avoid using unsafe.


worker/task.go, line 1929 at r1 (raw file):

	}

	res := make([]bool, 0, 2)

Add a comment as to why size 2


worker/task.go, line 1998 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Add a comment here that we convert the value of the function arg to common types and store it here to allow fast comparison later.

We could use an array here instead? We could define a type at the end of the list of types to figure out the max possible value for types.TypeID and use that as the size of the array.

Could just call it typesToVal.


worker/task.go, line 2029 at r1 (raw file):

		case compareAttrFn:
			ftree.function.val = types.Val{Tid: types.StringID, Value: []byte(tree.Func.Args[0])}
			ftree.function.typesToValMap = make(map[types.TypeID]types.Val)

If we decide to use map, we could preallocate the size.


worker/task.go, line 2030 at r1 (raw file):

			ftree.function.val = types.Val{Tid: types.StringID, Value: []byte(tree.Func.Args[0])}
			ftree.function.typesToValMap = make(map[types.TypeID]types.Val)
			typeIDs := []types.TypeID{types.StringID, types.IntID, types.FloatID,

define an array instead using the syntax [...]{1, 2, 3} and could move it outside.

Copy link
Member

@mangalaman93 mangalaman93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r1.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pawanrawal)

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 4 files reviewed, 8 unresolved discussions (waiting on @mangalaman93, @manishrjain, and @pawanrawal)


query/outputnode.go, line 145 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

You can store the value in val here and then use that.

Done.


query/outputnode.go, line 147 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

stconv.FormatInt can be used here

strconv.Itoa() calls it internally.


types/conversion.go, line 58 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

How much performance does it really improve? Can we live without it? If the performance improvement is not significant, we should avoid using unsafe.

I also wanted to avoid this change. But we call types.Convert very frequently in our queries. Also most of the time conversion are to strings. Without this change profiles were showing close to 1 sec for this line. It was gone after this change.


worker/task.go, line 1906 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

convert it here

Done.


worker/task.go, line 1929 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

Add a comment as to why size 2

Done.


worker/task.go, line 2029 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

If we decide to use map, we could preallocate the size.

Done.


worker/task.go, line 2030 at r1 (raw file):

Previously, mangalaman93 (Aman Mangal) wrote…

define an array instead using the syntax [...]{1, 2, 3} and could move it outside.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I'd run this over a two group cluster, to ensure that we're not breaking any thing when doing tasks over the network. You could move the predicate to the other group, then hit the query to the first group, so it would have to transmit the query over -- that way, we can ensure that the results are sane.

Reviewed 1 of 4 files at r1, 3 of 3 files at r2.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @mangalaman93, and @pawanrawal)


query/outputnode.go, line 147 at r1 (raw file):

Previously, ashish-goswami (Ashish Goswami) wrote…

strconv.Itoa() calls it internally.

I'd say use FormatInt because it takes int64. So, you don't lose precision by converting it to int (on 32 bit machines). Also, might not need the two cases.

[]byte(Format(int64(num), 10)


query/outputnode.go, line 133 at r2 (raw file):

var (
	boolTrueBytes    = []byte("true")

Remove the suffix Bytes.


query/outputnode.go, line 149 at r2 (raw file):

			return []byte(strconv.Itoa(int(num))), nil
		default:
			return []byte(strconv.Itoa(num.(int))), nil

This can be dangerous. If it is uint32, then num.(int) might panic?

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran following mutation on 6 alpha, 3 zero node cluster:

{
  set {
    _:1 <name> "alice" (nick=true) .
    _:1 <age> "27" (real=true) .
    
    _:2 <name> "bob" (nick=false) .
    _:2 <age> "28" (real=false) .
  }
}

Ensured that name belongs to group1 and age belongs to group2.

Ran below query:

{
  q(func: eq(name, "bob")) {
    name @facets(eq(nick,false))
    age @facets(eq(real,true))
  }
}

Also tried changing params values. Getting correct result every time.

Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @mangalaman93, @manishrjain, and @pawanrawal)


query/outputnode.go, line 147 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I'd say use FormatInt because it takes int64. So, you don't lose precision by converting it to int (on 32 bit machines). Also, might not need the two cases.

[]byte(Format(int64(num), 10)

Done. Two cases are still required so that I get exact type inside a case. If I combine both like,

case int64, int:

I get interface type inside.


query/outputnode.go, line 133 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove the suffix Bytes.

Done.


query/outputnode.go, line 149 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This can be dangerous. If it is uint32, then num.(int) might panic?

Yes this panics I pass it uint32. Since int and int64 are most common types for IntID, FormatInt is only called for them. For rest we are still using fmt.Spintf.

@ashish-goswami ashish-goswami merged commit 4e3c141 into master Mar 17, 2020
@ashish-goswami ashish-goswami deleted the ashish/new-ff-optimization branch March 17, 2020 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants