Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix count with facets filter #4751

Merged
merged 11 commits into from
Feb 25, 2020
Merged

Fix count with facets filter #4751

merged 11 commits into from
Feb 25, 2020

Conversation

ashish-goswami
Copy link
Contributor

@ashish-goswami ashish-goswami commented Feb 10, 2020

Fixes: #4659

Currently count with facets filter is not returning correct result. This is because, while calculating count we use pl.Length() function, which returns length of posting list. This will return correct result
if there are no facets filter. But if there are facets filter, we need to filter posting from the posting list which satisfy facets filter.

This PR fixes above issue.


This change is Reviewable

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


query/query_facets_test.go, line 2219 at r1 (raw file):

			{
			  "name": "Michonne",
			  "count(friend)": 2

Also fetch cf: count(friend)


query/query_facets_test.go, line 2293 at r1 (raw file):

			name
			alt_name
			count(alt_name) @facets(eq(origin, "french"))

Also retrieve count(alt_name) without filter.


worker/task.go, line 506 at r1 (raw file):

}

func retrieveValuesAndFacetsOrCount(args funcArgs, pl *posting.List, facetsTree *facetsTree,

Add some comments about in which scenarios the function only returns count and in which scenarios all of count, fcs and vals.


worker/task.go, line 596 at r1 (raw file):

}

func (qs *queryState) facetsFilteredUidOrCount(q *pb.Query, pl *posting.List,

This should have a similar structure to the function above, you can return list, fcs and count from here as well.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the way this code is structured is the best way to go about this. I think it's preferable two have different functions for the facet filter and the count (even if some code ends up being repeated) than to have a function that sometimes returns some value and sometimes some other value of a different type. From what I gather about the code, queries will either want the count or the filter so we are not duplicating the work if there's separate methods.

Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


worker/task.go, line 394 at r1 (raw file):

				return err
			}
			// If count is getting asked we don't need to populate value and facets matrix.

If count is being requested, there's no need to populate value and facets matrix


worker/task.go, line 402 at r1 (raw file):

			}

			// Means list was empty or got nothing in filtering.

Reaching here means ...


worker/task.go, line 506 at r1 (raw file):

}

func retrieveValuesAndFacetsOrCount(args funcArgs, pl *posting.List, facetsTree *facetsTree,

I am not sure if this is the right way to do structure the code. Ideally, there's a function to retrieve values and facets and another one for the count and the caller decides which one to use.

Otherwise it's hard to understand what this function is doing.


worker/task.go, line 596 at r1 (raw file):

}

func (qs *queryState) facetsFilteredUidOrCount(q *pb.Query, pl *posting.List,

Same issue here. Now this function is doing two different things.

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 8 unresolved discussions (waiting on @manishrjain, @martinmr, and @pawanrawal)


query/query_facets_test.go, line 2219 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Also fetch cf: count(friend)

Done.


query/query_facets_test.go, line 2293 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Also retrieve count(alt_name) without filter.

Done.


worker/task.go, line 394 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

If count is being requested, there's no need to populate value and facets matrix

Done.


worker/task.go, line 402 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Reaching here means ...

Done.


worker/task.go, line 506 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

I am not sure if this is the right way to do structure the code. Ideally, there's a function to retrieve values and facets and another one for the count and the caller decides which one to use.

Otherwise it's hard to understand what this function is doing.

Had a chat with @pawanrawal as well. Now I have two functions one calculating count and other values/facets.


worker/task.go, line 506 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

Add some comments about in which scenarios the function only returns count and in which scenarios all of count, fcs and vals.

Now have different functions for count and facets. Each is self explanatory.


worker/task.go, line 596 at r1 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

This should have a similar structure to the function above, you can return list, fcs and count from here as well.

All functions have same structure now.


worker/task.go, line 596 at r1 (raw file):

Previously, martinmr (Martin Martinez Rivera) wrote…

Same issue here. Now this function is doing two different things.

Again here, have two functions here, one for count and other for uids/facets.

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @martinmr)


worker/task.go, line 633 at r2 (raw file):

}

func retrieveCountForUidPostings(args funcArgs, pl *posting.List, facetsTree *facetsTree,

This function has some unnecessary parameters, like out.

Copy link
Contributor

@martinmr martinmr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r2.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @ashish-goswami and @manishrjain)

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 2 files reviewed, 1 unresolved discussion (waiting on @manishrjain, @martinmr, and @pawanrawal)


worker/task.go, line 633 at r2 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

This function has some unnecessary parameters, like out.

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 2 files reviewed, 2 unresolved discussions (waiting on @ashish-goswami, @martinmr, and @pawanrawal)


worker/task.go, line 675 at r3 (raw file):

	}

	err := pl.Postings(opts, func(p *pb.Posting) error {

Find a way to consolidate all of this into one function. But, then you can pass a func to run on l.Iterate. So, there could be many funcs, creating this internal func, and passing it to a common func which has the logic.

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 5 unresolved discussions (waiting on @ashish-goswami, @martinmr, and @pawanrawal)


worker/task.go, line 510 at r4 (raw file):

}

func facestFilterValuesPostingList(args funcArgs, pl *posting.List, facetsTree *facetsTree,

facets


worker/task.go, line 520 at r4 (raw file):

	// 2. Attr type is list and len(q.Langs)>0. Just checking len(q.Langs)>0 is not enough, because
	// in case of "*", len(q.Langs)>0 && q.ExpandAll is true.
	if (listType && len(q.Langs) > 0) || !listType || !q.ExpandAll {

if len(q.Langs) > 0 && !q.ExpandAll {

}


worker/task.go, line 550 at r4 (raw file):

			fn(p)
		}

pickAll := q.ExpandAll || (q.Langs == 0 && listType)

// continue iteration
if pickAll {
return nil
}

// We already got the posting that we need.
return posting.ErrStopIteration

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 5 unresolved discussions (waiting on @ashish-goswami, @martinmr, and @pawanrawal)


worker/task.go, line 510 at r4 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

facets

Done.


worker/task.go, line 550 at r4 (raw file):

Previously, pawanrawal (Pawan Rawal) wrote…

pickAll := q.ExpandAll || (q.Langs == 0 && listType)

// continue iteration
if pickAll {
return nil
}

// We already got the posting that we need.
return posting.ErrStopIteration

Done.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 2 files at r4, 1 of 1 files at r6.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ashish-goswami and @pawanrawal)

Copy link
Contributor

@pawanrawal pawanrawal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r4, 1 of 1 files at r6.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @ashish-goswami and @pawanrawal)

@ashish-goswami ashish-goswami merged commit 333d6e7 into master Feb 25, 2020
@ashish-goswami ashish-goswami deleted the ashish/count-facetsfilter branch February 25, 2020 17:40
vardhanapoorv pushed a commit that referenced this pull request Mar 3, 2020
Fixes: #4659

Currently count with facets filter is not returning correct result. This is because, while calculating count we use pl.Length() function, which returns length of posting list. This will return correct result if there are no facets filter. But if there are facets filter, we need to filter postings from the
posting list which satisfy facets filter.

This PR fixes above issue.
@mileung
Copy link

mileung commented Mar 8, 2020

I still get this issue on v1.2.1 and I also get this issue when I docker pull dgraph/dgraph:master. How do I get this to work? Screen Shot 2020-03-08 at 12 46 40 AM

@ashish-goswami
Copy link
Contributor Author

Hey @mileung, can you provide us the sample data on which it is returning incorrect result?

@mileung
Copy link

mileung commented Mar 9, 2020

How do I do that? Can I use dgo?

@ashish-goswami
Copy link
Contributor Author

ashish-goswami commented Mar 9, 2020

Yes. You can use dgo to run mutation with data and run above query to see count.
You can also use ratel to run mutation and query. @mileung

@mileung
Copy link

mileung commented Mar 9, 2020

@ashish-goswami I thought you wanted me to send a exported db, but here's a simple query and response

{
  q(func: uid(0x1d7fa)) {
    totalUnreadPings: count(pings (orderdesc: date) @facets(eq(unread, true)) @normalize)
    pings (orderdesc: date) @facets(unread) @normalize {
      text: text
      author {
        username: username
      }
    }
  }
}
"data": {
    "q": [
      {
        "totalUnreadPings": 2,
        "pings": [
          {
            "text": "sdf",
            "username": "7BA5414DAEEE170180ADE1E27A1BA8A9"
          },
          {
            "text": "sdf",
            "username": "7BA5414DAEEE170180ADE1E27A1BA8A9"
          }
        ],
        "pings|unread": {
          "0": true
        }
      }
    ]
  }

@ashish-goswami
Copy link
Contributor Author

Hey @mileung, please help me reproduce this issue. I ran following mutation:

{
  set {
    _:1 <text> "sdf" .
    _:1 <date> "2019-01-01" .
    _:1 <author> _:3 .

    _:2 <text> "sdf" .
    _:2 <date> "2020-01-01" .
    _:2 <author> _:4 .

    _:3 <username> "7BA5414DAEEE170180ADE1E27A1BA8A9" .
    _:4 <username> "7BA5414DAEEE170180ADE1E27A1BA8A9" .

    _:5 <pings> _:1 .
    _:5 <pings> _:2 (unread=true) .
  }
}

and got the expected response.

{
  "data": {
    "q": [
      {
        "totalUnreadPings": 1,
        "pings": [
          {
            "date": "2020-01-01",
            "text": "sdf",
            "username": "7BA5414DAEEE170180ADE1E27A1BA8A9"
          },
          {
            "date": "2019-01-01",
            "text": "sdf",
            "username": "7BA5414DAEEE170180ADE1E27A1BA8A9"
          }
        ],
        "pings|unread": {
          "0": true
        }
      }
    ]
  },

@mileung
Copy link

mileung commented Mar 12, 2020

I still have the issue. try making the mutation with JSON notation?

@ashish-goswami
Copy link
Contributor Author

Hi @mileung, I tried json mutation:

{
    "set": {
        "uid": "_:5",
        "pings": [
            {
                "uid": "_:1",
                "text": "sdf",
                "date": "2019-01-01",
                "author": {
                    "uid": "_:3",
                    "username": "7BA5414DAEEE170180ADE1E27A1BA8A9"
                }
            },
            {
                "uid": "_:2",
                "text": "sdf",
                "date": "2020-01-01",
                "author": {
                    "uid": "_:4",
                    "username": "7BA5414DAEEE170180ADE1E27A1BA8A9"
                },
                "pings|unread": "true"
            }
        ]
    }
}

Still getting the correct response.

@mileung
Copy link

mileung commented Mar 17, 2020

I'm stumped then
Screen Shot 2020-03-17 at 11 14 14 AM

@ashish-goswami
Copy link
Contributor Author

Hey @mileung, can you share the raw rdf/json data inserted in DB?

@mileung
Copy link

mileung commented Mar 19, 2020

{
  "set": {
    "uid": "0x20120",
    "pings": {
      "uid": "0x20150",
      "pings|unread": "true"
    }
  }
}

For me, setting pings|unread to true or false doesn't matter - it will still be counted in the totalUnreadPings query

@ashish-goswami
Copy link
Contributor Author

Hi @mileung, I tried following mutation and query as suggested by you:
Mutation:

{
  "set": {
    "uid": "_:1",
    "pings": {
      "uid": "_:2",
      "pings|unread": "true"
    }
  }
}

Query:

{
  q(func: uid(0x1)) {
    totalUnreadPings: count(pings @facets(eq(unread, false)) @normalize)
    pings @facets(unread) @normalize {
      uid
    }
  }
}

Got the following result:

{
  "data": {
    "q": [
      {
        "totalUnreadPings": 0,
        "pings": [
          {
            "uid": "0x2"
          }
        ],
        "pings|unread": "true"
      }
    ]
  }

I think, this is the expected result.

@mileung
Copy link

mileung commented Mar 20, 2020

My bad, the unread facet should be set to true, but either way, I am still getting the wrong result. The commented out query in the image below yields the correct result, but is a bit more complicated.

Screen Shot 2020-03-20 at 11 06 40 AM

On http://localhost:8080/health, I get {"version":"v1.2.1","instance":"alpha","uptime":592040}. We are running the same version right?

@MichelDiz
Copy link
Contributor

MichelDiz commented Mar 22, 2020

hey guys,

There is confusion here. This PR is not in version v1.2.1 @mileung . The only version this PR is in is version v20.03.0-beta.20200320.

If you read the changelog you will see that there is no such commit there. And if you analyze the dates, this PR joined the master 25 days ago. And version v1.2.1 was built +24 days (February 6th, 46 days ago). At most this would be in version v1.2.2 which was 2 days ago. But it's not there.

If you need this fix, I suggest you wait for a stable version or use the recently released beta. Also, you could try to ask a cherry-pick for v1.x.x. I'm not sure if we gonna have more releases for v1.x.x. Need to check it.

The second point to note, the use of facet with count is wrong.
It is not like this:
totalUnreadPings: count(pings (orderdesc: date) @facets(eq(unread, true)) @normalize)

The correct form is like this:

totalUnreadPings: count(pings) @facets(eq(unread, true)) @normalize

The facet and filters and any other directive must be outside the parenthesis of the count. And there's no need to use (orderdesc: date). As it just does a count.

Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Bug with count() and facets
6 participants