Hi!
First of all, thank you for designing and implementing Differential Dataflow. It looks incredibly powerful, and seems like the perfect tool for one of my projects.
Onto my question: I have a query that runs in "stages", where each stage depends on the previous stage to materialize itself, and each stage outputs a `Collection`. Based on the result of the previous stage, a new `Collection` is built using DD operators; more precisely, one stage produces a set of elements that is updated over time, and a subsequent stage fetches data from multiple sources corresponding to each element, producing a `Collection` for each element. This is somewhat similar to Differential Datalog, except that some rules may dynamically create new rules (which must then be applied).
Ideally, a stage would instead produce a `Collection` of distinct elements, then each element would be mapped into a new `Collection`, and then these collections would be `concatenate`d (it's actually slightly harder than this, as these elements are grouped together to produce fewer collections overall).
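To make the shape of a stage concrete, here is a minimal sketch in plain Rust. It deliberately does not use the Differential Dataflow API and is not incremental; it only shows "distinct elements, each mapped to a collection, then concatenated". The names `fetch`, `run_stage`, and the `sources` map are hypothetical and stand in for the real data sources.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for "fetch the data corresponding to one element".
/// Returns the per-element collection as (element, value) records.
fn fetch(element: &str, sources: &BTreeMap<&str, Vec<u32>>) -> Vec<(String, u32)> {
    sources
        .get(element)
        .into_iter() // zero or one Vec, depending on whether the element is known
        .flatten()
        .map(|value| (element.to_string(), *value))
        .collect()
}

/// One stage: take the distinct elements, map each one to its own
/// collection, and concatenate the results into a single collection.
fn run_stage(
    elements: &BTreeSet<&str>,
    sources: &BTreeMap<&str, Vec<u32>>,
) -> Vec<(String, u32)> {
    elements.iter().flat_map(|e| fetch(e, sources)).collect()
}
```

In the real query the output of `run_stage` would itself feed the next stage, and the set of elements would change over time, which is where the difficulty lies.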
A few naive alternatives I thought about:
1. Create one dataflow per stage, and re-create all subsequent dataflows any time a dependency changes. IIRC this makes it very hard to reuse computations efficiently (my guess is that it's possible using traces and memoization, but complexity increases because compaction then has to be handled manually).
2. Use a single dataflow, and discard no-longer-needed collections when, e.g., a dependency is removed from the set. However, my understanding is that collections cannot be discarded: as earlier stages keep updating, new collections will be created and old ones abandoned, yet the abandoned ones will still consume resources and be computed. I'm not sure how `.leave()` interacts with this assumption.
3. Use a `Variable` for the parts derived from previous collections, and periodically update it with `.set(concatenate(current_collections))`, but again I'm worried about discarded collections remaining active.
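For reference, this is roughly the behaviour I would like any of these approaches to achieve when the set of elements changes, again sketched in plain Rust rather than with DD operators. The `Update` type, `updates_for_change`, and the `sources` map are hypothetical; the `+1`/`-1` diffs only mimic the way DD expresses additions and retractions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical update record: (element, value, diff), with diff = +1 or -1.
type Update = (String, u32, i64);

/// Given the old and new sets of elements, produce only the changes to the
/// concatenated output, instead of rebuilding every per-element collection.
fn updates_for_change(
    old: &BTreeSet<&str>,
    new: &BTreeSet<&str>,
    sources: &BTreeMap<&str, Vec<u32>>,
) -> Vec<Update> {
    let mut out = Vec::new();
    // Elements that disappeared: retract every record derived from them.
    for e in old.difference(new) {
        for v in sources.get(*e).into_iter().flatten() {
            out.push((e.to_string(), *v, -1));
        }
    }
    // Elements that appeared: add every record derived from them.
    for e in new.difference(old) {
        for v in sources.get(*e).into_iter().flatten() {
            out.push((e.to_string(), *v, 1));
        }
    }
    out
}
```

Records belonging to elements present in both sets produce no updates at all, which is the reuse I am worried about losing in alternative 1 and the cleanup I am worried about not getting in alternatives 2 and 3.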