
Support for pull query #30

Open
metasoarous opened this issue Dec 13, 2018 · 2 comments

Comments

@metasoarous

I see that support for pull queries is on the README query feature map, but I didn't see an issue for it, so I thought I'd start one. Of all the pending query features, this one feels like the biggest hole in my day-to-day usage of Datomic and DataScript.

There's potentially some discussion to be had here about interface and scope. There seem to be a few clear targets:

  • individual pull queries
  • pull queries on a fixed collection of ids (basically pull-many)
  • pull queries within a datalog query like [:find (pull ?e [...]) :where ...]

In all of these cases, it would seem reasonable to send diffs corresponding to the relevant [e a v] triples. What this misses compared to conventional pull is, obviously, where in the nested structure each triple belongs. It would probably be fine to ignore this, but it is interesting to consider that with some kind of Reagent-like API, you could return reactions that resolve to maps, which themselves might point to nested reactions.
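To make the triple-diff idea more concrete, here is a minimal sketch of how a client might fold such diffs into per-entity maps. It assumes a hypothetical diff format of (e, a, v, multiplicity) tuples, with +1 for an added fact and -1 for a retraction, in the spirit of Differential Dataflow's signed updates; the function names are illustrative and not part of the project.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical diff format: (entity, attribute, value, multiplicity).
Diff = Tuple[Hashable, str, Hashable, int]
# Client-side view: {e: {a: {v: count}}}; counts let retractions cancel additions.
State = Dict[Hashable, Dict[str, Dict[Hashable, int]]]


def apply_diffs(state: State, diffs: Iterable[Diff]) -> State:
    """Fold a batch of [e a v] diffs into the client-side view, in place."""
    for e, a, v, mult in diffs:
        counts = state.setdefault(e, {}).setdefault(a, {})
        counts[v] = counts.get(v, 0) + mult
        if counts[v] == 0:
            # Fact fully retracted: prune empty values, attributes and entities.
            del counts[v]
            if not counts:
                del state[e][a]
                if not state[e]:
                    del state[e]
    return state


def as_entity_maps(state: State) -> Dict[Hashable, Dict[str, Set[Hashable]]]:
    """Flatten the counted view into {e: {a: set of current values}}."""
    return {e: {a: set(vs) for a, vs in attrs.items()} for e, attrs in state.items()}
```

For example, applying `(1, ":person/age", 30, -1)` and `(1, ":person/age", 31, 1)` after an initial `(1, ":person/age", 30, 1)` leaves only `31` as the current age. Note that this flat view deliberately ignores nesting, which is exactly the limitation described above.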

The problem I see is what happens with a query like [:find (pull ?e [...]) (pull ?d [...]) :where ...], which is effectively a relationship between pull structures. This is legal in both Datomic and DataScript, but I'm not sure how you would interpret it here, because you don't just have a collection of facts, you have a relation between collections of facts. So maybe this just isn't supported. Or maybe you can come up with some clever indexing scheme that pairs the pull diffs with their position in the outer relation. Interestingly, if we again consider the Reagent model, this fits quite nicely into the idea of returning a reaction of nested reactions.

Again, thanks for the great work!

@comnik
Member

comnik commented Dec 14, 2018

No decisions have been made regarding pull yet, but I have a few thoughts, and I'm very happy for input on this, as my mind is currently occupied with Datalog-related stuff.

Scope
At work, pull suffices for many of the inter-service data dependencies we have. Our version of pull differs from Datomic's in that we optimize for pulling across all entities, and we offer the ability to include simple constraints. We use this style of pull query at every layer of the stack, down to the individual components in a web frontend. In a large-scale setting it probably makes sense to enforce at least one constraint on root entities and to forbid the use of attribute wildcards. Wildcards for interactive documentation purposes would be handled separately (we can talk more about this).
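The large-scale policy above (at least one root constraint, no attribute wildcards) could be enforced by a simple check before a query is registered. The sketch below is only an illustration: it assumes a hypothetical representation in which a pull pattern is a Python list mirroring the EDN vector (attribute strings, `"*"` for a wildcard, and dicts mapping a reference attribute to a nested pattern) and root constraints are a list of (attribute, value) pairs.

```python
from typing import Any, List, Tuple

# Hypothetical representations, mirroring EDN pull syntax:
#   pattern:          [":parent/name", {":parent/child": [":child/name"]}]
#   root_constraints: [(":parent/name", "Alice")]
Pattern = List[Any]
Constraint = Tuple[str, Any]


def validate_pull(pattern: Pattern, root_constraints: List[Constraint]) -> None:
    """Raise ValueError if a pull request violates the large-scale policy."""
    if not root_constraints:
        raise ValueError("at least one constraint on root entities is required")
    # Walk the (possibly nested) pattern iteratively, looking for wildcards.
    stack = [pattern]
    while stack:
        for item in stack.pop():
            if item == "*":
                raise ValueError("attribute wildcards are not allowed")
            if isinstance(item, dict):
                # Nested patterns hang off reference attributes.
                stack.extend(item.values())
```

Rejecting wildcards up front keeps the set of attributes each dataflow depends on explicit, which is presumably what makes the policy attractive when many services share the same system.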

Implementation
What seems most interesting from an implementation perspective is that relations constrain only the codomain, whereas a Datalog clause like [?parent :parent/child ?child] would constrain both. Given a pull expression

[:parent/name 
 {:parent/child [:child/name]}]

we'd need to create separate output streams for each path, each with different constraints applied to them. E.g.:

'(?parent) <- [:parent/name] | _ ;; no constraints
'(?parent :parent/child ?child) <- [:child/name] | [?parent :parent/child ?child]

The joins involved in the nested path would then produce [?parent ?child a v] tuples, whereas the top-level path produces [?parent a v] tuples. Such outputs could then be merged into nested maps on the client.
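As a rough illustration of that client-side merge, the sketch below combines the two per-path outputs for the example pull expression into nested maps keyed by root entity. It assumes (hypothetically) that the root path yields (parent, a, v) tuples, the nested path yields (parent, child, a, v) tuples, all attributes are cardinality-one, and entity ids are orderable so children come out in a deterministic order.

```python
from typing import Any, Dict, Hashable, Iterable, Tuple


def merge_pull_paths(
    top: Iterable[Tuple[Hashable, str, Any]],
    nested: Iterable[Tuple[Hashable, Hashable, str, Any]],
    ref_attr: str,
) -> Dict[Hashable, Dict[str, Any]]:
    """Merge per-path pull outputs into nested maps keyed by root entity.

    top:    (parent, attribute, value) tuples from the root path
    nested: (parent, child, attribute, value) tuples from the path that
            follows ref_attr (e.g. ":parent/child")
    """
    result: Dict[Hashable, Dict[str, Any]] = {}
    for parent, a, v in top:
        # Assumes cardinality-one attributes: a later value overwrites.
        result.setdefault(parent, {})[a] = v

    # Group nested tuples by parent, then by child entity.
    children: Dict[Hashable, Dict[Hashable, Dict[str, Any]]] = {}
    for parent, child, a, v in nested:
        children.setdefault(parent, {}).setdefault(child, {})[a] = v

    for parent, by_child in children.items():
        # Sort by child id (assumed orderable) for deterministic output.
        result.setdefault(parent, {})[ref_attr] = [by_child[c] for c in sorted(by_child)]
    return result
```

For instance, root tuples `(1, ":parent/name", "Alice")` and nested tuples `(1, 10, ":child/name", "Carol")`, `(1, 11, ":child/name", "Dan")` merge into `{1: {":parent/name": "Alice", ":parent/child": [{":child/name": "Carol"}, {":child/name": "Dan"}]}}`. Deeper pull expressions would add one more path, and one more grouping level, per nesting step.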

I did not quite follow your comment that

it is interesting to consider that with some kind of Reagent like api, you could return reactions which resolve to maps, which themselves might point to nested reactions.
Could you explain?

The same goes for your last example: if the root constraints don't share symbols, I don't see how there would be a constraining relation between the two pull queries. Implemented as separate dataflows, Differential would still update them consistently. One could also join the outputs back together afterwards; we do something similar for multiple aggregations within a :find.
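Joining separately computed pull outputs back onto the relation from the :where clause could look roughly like the sketch below, for a query shaped like [:find (pull ?e [...]) (pull ?d [...]) :where ...]. It assumes, hypothetically, that the :where clause yields (e, d) pairs and that each pull dataflow has been materialized into a dict from entity id to pulled map; an entity with no pulled attributes is paired with an empty map, which is an arbitrary choice for this sketch.

```python
from typing import Any, Dict, Hashable, Iterable, List, Tuple

PullResult = Dict[Hashable, Dict[str, Any]]


def join_pull_results(
    relation: Iterable[Tuple[Hashable, Hashable]],
    pulled_e: PullResult,
    pulled_d: PullResult,
) -> List[Tuple[Dict[str, Any], Dict[str, Any]]]:
    """Pair each (e, d) row of the :where relation with its pulled maps."""
    rows = []
    for e, d in relation:
        # Missing entities get an empty map (assumption for this sketch).
        rows.append((pulled_e.get(e, {}), pulled_d.get(d, {})))
    return rows
```

The point of this shape is that the two pull dataflows stay independent; only the final join reintroduces the relation between them, much like combining several aggregations within one :find.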

But once these things are cleared up it shouldn't be that much effort to come up with a proof-of-concept (I hope?).

@comnik
Member

comnik commented Jan 5, 2019

A first step is done in comnik/declarative-dataflow@4a2c8df. That commit introduces a PullLevel operator which can be used to implement each of the individual streams mentioned above.
