Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support fetching facets from value edge lists #4081

Closed
robsws opened this issue Sep 27, 2019 · 15 comments · Fixed by #4267
Closed

Support fetching facets from value edge lists #4081

robsws opened this issue Sep 27, 2019 · 15 comments · Fixed by #4267
Labels
area/facets Issues related to face handling, querying, etc. area/querylang Issues related to the query language specification and implementation. kind/enhancement Something could be better. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it.
Milestone

Comments

@robsws
Copy link

robsws commented Sep 27, 2019

It seems that currently, if facets are added on value edge lists (with type [string] for example), it is impossible to retrieve any facets added on edges within the list. Here is a toy example where I have two names recorded for Alice, each with it's own since facet associated:

{
  set {
    _:alice <name> "Alice Smith" (since=1990-01-01) .
    _:alice <name> "Alice Baker" (since=2019-01-01) .
  }
}

The response for this mutation shows a success, so one can assume that the facets were added successfully:

{
  "data": {
    "code": "Success",
    "message": "Done",
    "uids": {
      "alice": "0x1"
    }
  }
}

If you try and query the facet data, only the values of the edges are returned and not their facets:

{
  query(func: has(name)) {
    name @facets
  }
}

Response:

{
  "data": {
    "query": [
      {
        "name": [
          "Alice Smith",
          "Alice Baker"
        ]
      }
    ]
  }
}

Would it be possible to implement the ability to fetch these facets?

Secondly, it would be good to combine this with issue #4034 and allow filtering on these facets as well.

@campoy
Copy link
Contributor

campoy commented Sep 27, 2019

Yes, this is indeed the part that I mentioned would not be necessarily fixed as part of #4034.

The hard part here is basically agreeing on how it should be formatted.

One option would be:

{
  "data": {
    "query": [
      {
        "name": [
          "Alice Smith",
          "Alice Baker"
        ],
        "name|since": [
          "1990-01-01",
          "2019-01-01",
        ]
      }
    ]
  }
}

This approach breaks if some of the edges are missing that facet.

Another option:

{
  "data": {
    "query": [
      {
        "name": [
          {
            "@value": "Alice Smith",
            "since": "1990-01-01"
          },
          {
            "@value": "Alice Baker",
            "since": "2019-01-01"
          }
        ]
      }
    ]
  }
}

This one has the slight issue that when you ask for facets, even if none are returned, strings are reified into objects. Also, we'll need to find a field name for @value that minimizes collisions.

Do you have any opinions on this, @robsws ?

@campoy campoy added area/facets Issues related to face handling, querying, etc. priority/P2 Somehow important but would not block a release. area/querylang Issues related to the query language specification and implementation. kind/enhancement Something could be better. status/accepted We accept to investigate/work on it. labels Sep 27, 2019
@campoy campoy added this to the Dgraph v1.2 milestone Sep 27, 2019
@robsws
Copy link
Author

robsws commented Sep 30, 2019

For our particular use case, I think either approach would work. In my own opinion, the first approach is probably more in line with the current facet response format and I think you could solve the problem of facets missing by adding nulls to the array where the facets don't exist.

@manishrjain
Copy link
Contributor

We could potentially encode the facets in the name itself.

“Name”: “Alice Smith (since = yymmdd, weight = 0.3)”

Might get tricky if the facet value is also a string.

Another approach is we allow data to be returned in RDF format. That would make this trivial.

@campoy campoy added priority/P1 Serious issue that requires eventual attention (can wait a bit) and removed priority/P2 Somehow important but would not block a release. labels Oct 16, 2019
@campoy
Copy link
Contributor

campoy commented Oct 16, 2019

We've decided to fix this issue by unifying the way we support facets in Dgraph both for value and uid predicates.

In the proposed solution, we will handle facets the same way no matter the type of predicate.

Facets are returned in a different field with a name composed of predicate name, followed by pipe character (|), and

Facets on a singular value predicate

This behavior remains unchanged with v1.1.0.
A new field named "predicate_name|facet_name" is added as a sibling of the requested predicate containing the associated value (if any).

Schema:

<name>: string .

Data:

<0x1> <name> "Francesc" (kind="first") .

Query:

{
   q(func: has(name)) {
      name @facets
   }
}

Result:

{
  "data": {
    "q": [
      {
        "name|kind": "first",
        "name": "Francesc"
      }
    ]
  }
}

Facets on lists of value predicates

This behavior was not supported in v1.1.0, follows the same idea as the one above.
A new field named "predicate_name|facet_name" is added as a sibling of the requested predicate containing the associated values.
To avoid losing what facet values correspond to what predicate, we could use null for missing values so each position in the predicate field and the facets are all related.
Instead of doing this, we decided to provide a map from the index of the corresponding predicate value to the corresponding facet value.

Schema:

<nickname>: [string] .

Data:

<0x1> <nickname> "Francesc" (kind="official") .
<0x1> <nickname> "Tete" .
<0x1> <nickname> "Cesc" (kind="friends") .

Query:

{
   q(func: has(nickname)) {
      nickname @facets
   }
}

Result:

{
  "data": {
    "q": [
      {
        "nickname": [
          "Cesc",
          "Francesc",
          "Tete"
        ],
        "nickname|kind": {
          "0": "friends",
          "1": "official"
        }
      }
    ]
  }
}

This solution makes it so parsing nickname is exactly the same no matter whether one is asking for facets or not.

Facets on a singular object predicate

This behavior will change to become more similar to the way we handle value predicates.

Schema:

<name>: string .
<state>: uid .

Data:

<0x1> <name> "San Francisco" .
<0x1> <state> <0x2> (capital=false) .
<0x2> <name> "California" .

Query:

{
   q(func: has(state)) {
      name
      state @facets {
         name
      }
   }
}

The current behavior moves the value of the facet into the object pointed by the predicate.

{
  "data": {
    "q": [
      {
        "name": "San Francisco",
        "state": [
          {
            "name": "California",
            "state|capital": false
          }
        ]
      }
    ]
  }
}

Instead, we propose to keep the facet value at the same level as the predicate as we do for value predicates.

{
  "data": {
    "q": [
      {
        "name": "San Francisco",
        "state": [
          {
            "name": "California"
          },
          "state|capital": false
        ]
      }
    ]
  }
}

Facets on a list of object predicates

This behavior will also change to become more similar to the way we handle value predicates.

Schema:

<name>: string .
<speaks>: [uid] .

Data:

<0x1> <name> "Francesc" .
<0x1> <speaks> <0x2> (fluent=true) .
<0x1> <speaks> <0x3> (fluent=false) .
<0x2> <name> "Spanish" .
<0x3> <name> "Chinese" .

Query:

{
  q(func: uid(0x1)) {
    name
    speaks @facets {
      name
    }
  }
}

The current behavior moves the value of the facet into the object pointed by the predicate.

{
  "data": {
    "q": [
      {
        "name": "Francesc",
        "speaks": [
          {
            "name": "Spanish",
            "speaks|fluent": true
          },
          {
            "name": "Chinese",
            "speaks|fluent": false
          }
        ]
      }
    ]
  }
}

Instead, we propose to keep the facet value at the same level as the predicate as we do for value predicates.

{
  "data": {
    "q": [
      {
        "name": "Francesc",
        "speaks": [
          {
            "name": "Spanish"
          },
          {
            "name": "Chinese"
          }
        ],
        "speaks|fluent": {
           "0": true,
           "1": false
        }
      }
    ]
  }
}

@MichelDiz
Copy link
Contributor

MichelDiz commented Oct 16, 2019

The result you’ve shared feels not ideal. The mutation you have a “order” (the kind is linked to the value)

<0x1> <nickname> "Francesc" (kind="official") .
<0x1> <nickname> "Tete" .
<0x1> <nickname> "Cesc" (kind="friends") .

but the response isn’t ordered.
I would expect the response order to be in sync as we intend to be using indexed positioning:

{
  "data": {
    "q": [
      {
        "nickname": [
          "Cesc",
          "Francesc",
          "Tete"
        ],
        "nickname|kind": {
          "0": "friends",
          "1": "official",
          "2": "null"
        }
      }
    ]
  }
}

We need a way to know “who owns the facet”. Because the answer needs to be predictable.

We could also apply index positioning to the values of the list.

{
  "data": {
    "q": [
      {
        "nickname": {
          "0": "Cesc",
          "1": "Francesc",
          "2": "Tete"
       },
        "nickname|kind": {
          "0": "friends",
          "1": "official",
          "2": "null"
        }
      }
    ]
  }
}

Or

{
  "data": {
    "q": [
      {
        "nickname": {
          "0": "Cesc",
          "0|kind": "friends",
          "1": "Francesc",
          "1|kind": "official",
          "2": "Tete"
        },
      }
    ]
  }
}

@campoy
Copy link
Contributor

campoy commented Oct 17, 2019

There is no order in the data, but there's an order in the result we send back since it's an array.
I do not care whether the values appear in one order or the other, as long as the keys in the facet map correspond to what we encode.

That is exactly why we use a map, not an array, for the facet results. It allows us to keep it in sync while avoiding having potentially many nulls.

@MichelDiz
Copy link
Contributor

MichelDiz commented Oct 17, 2019

{
  "data": {
    "q": [
      {
        "nickname": [
          "Cesc",
          "Francesc",
          "Tete"
        ],
        "nickname": {
          "0": "official",
          "2": "friends"
        }
      }
    ]
  }
}

So, by the response, you've shared. How do we can tell what belongs to what? There's no relation with the name and its facet in that response. I would assume that 0 belongs to "Cesc" (which is wrong, cuz "official" belongs to "Francesc") and 2 belongs to "Tete" (which is also wrong, cuz "friends" belongs to "Cesc").

If facets come unsorted, there's no point in using it. But I can assume now after your response that the response example is slightly wrong. And everything should be alright. Only the example took my attention.

BTW, I think we should use "nickname|kind" as we do in object to avoid duplicate keys. If we have more facets like "since, points, tag" it would be a mess. So"nickname|since", "nickname|points", "nickname|tag"

@campoy
Copy link
Contributor

campoy commented Oct 19, 2019

I think the misunderstanding was based on my typos, sorry about that!

The actual response should be:

{
  "data": {
    "q": [
      {
        "nickname": [
          "Cesc",
          "Francesc",
          "Tete"
        ],
        "nickname|kind": {
          "0": "friends",
          "1": "official"
        }
      }
    ]
  }
}

Note that indeed the field name with the facets is nickname|kind and the key in the map points to the corresponding field in the predicates.

@animesh2049
Copy link
Contributor

In case of Facets on lists of value predicates we are putting all the facets for all the values into the same facetList. Later there is no way to find out the mapping between values and facets. Here is a sample test case for the same.
schema

 <nickname>: [string] .

mutatoin

{
    set {
      _:0x1 <nickname> "Francesc" .
      _:0x1 <nickname> "Tete" (kind="official") .
      _:0x1 <nickname> "Cesc" (kind="friends", real=true) .
    }
}

query

{
  me(func: has(nickname)) {
    nickname @facets
  }
}

Right now on master the query doesn't return any facet, but we can look into facetmatrix to verify this.

@animesh2049
Copy link
Contributor

In case of Facets on a list of object predicates , we return the facet even in case of empty node.
Consider the sample query

{
    me(func: uid(1)) {
        friend @facets(orderasc:since) {
	    name
	}
    }
} 

sample response according to current behaviour

{
  "data": {
    "me": [
      {
        "friend": [
          {
            "name": "Glenn Rhee",
            "friend|since": "2004-05-02T15:04:05Z"
          },
          {
            "friend|since": "2005-05-02T15:04:05Z" # Facet shows up even if the node is empty
          },
          {
            "name": "Rick Grimes",
            "friend|since": "2006-01-02T15:04:05Z"
          },
          {
            "name": "Andrea",
            "friend|since": "2006-01-02T15:04:05Z"
          },
          {
            "name": "Daryl Dixon",
            "friend|since": "2007-05-02T15:04:05Z"
          }
        ]
      }
    ]
  }
}

but if we try to index facets with values we will have the following response

{
  "data": {
    "me": [
      {
        "friend|since": {
          "1": "2004-05-02T15:04:05Z",
          "2": "2005-05-02T15:04:05Z",
          "3": "2006-01-02T15:04:05Z",
          "4": "2006-01-02T15:04:05Z",
          "5": "2007-05-02T15:04:05Z"
        },
        "friend": [
          {
            "name": "Glenn Rhee"
          },
          {
            "name": "Rick Grimes"
          },
          {
            "name": "Andrea"
          },
          {
            "name": "Daryl Dixon"
          }
        ]
      }
    ]
  }
}

The response above is incorrect because the empty node now has been discarded in values. I think we should eliminate all the empty nodes and their facets in response. A facet without any value anyway doesn't make sense.
cc: @campoy

@campoy
Copy link
Contributor

campoy commented Nov 12, 2019

Yes, agreed. Only the facets for the nodes that appear in the response should be encoded.

@pepoospina
Copy link

Hi @campoy, this actually breaks our whole strategy 😟. Without access to the facets of a list of predicates, for us, it's not possible to reconstruct the persisted data.

Is there a way to get around this issue and somehow get the facet values? I mean, in the meanwhile, until this new feature is released?

@campoy
Copy link
Contributor

campoy commented Dec 11, 2019

Wait, so this will help you with your problem. Right, @pepoospina?

I just want to make sure I understand correctly.

@pepoospina
Copy link

pepoospina commented Dec 11, 2019

Yes @campoy. The solution in this PR should fix our issue.

ashish-goswami added a commit that referenced this issue Jan 13, 2020
Fixes #4081

This PR covers following things:
* Support for fetching facets from value edge list.
* Response format change in case of facets to have uniform response format
  across uid/value/list edges.

Co-authored-by: Animesh Chandra Pathak <[email protected]>
@sleto-it
Copy link
Contributor

Hi all,
This ticket was fixed by PR #4267, which has now been merged

The fix will be delivered in v1.2

Please comment if more help is needed, and we will reopen

Many thanks,

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/facets Issues related to face handling, querying, etc. area/querylang Issues related to the query language specification and implementation. kind/enhancement Something could be better. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate/work on it.
Development

Successfully merging a pull request may close this issue.

7 participants