This PR introduces the `explode` operator, whose role in life is to move parts of the data that can be aggregated into the `diff` component, which differential dataflow will aggregate in-place. For example, if we had a collection of `(name, salary)` pairs and wanted the total salaries by name, the only way to do this within differential dataflow was

[code example missing from source]

This is horrible for several reasons. Other than being verbose, differential dataflow is obliged to maintain the collection of salaries as distinct elements, because we could be computing the median or something horrible like that. We would prefer a way to explain to differential dataflow that it can accumulate the `salary` components.

The `explode` operator does this, mapping each input datum into a sequence of `(data, diff)` pairs, and producing the collection that is the accumulation of all of the diffs for each `data` value produced. The example above would become

[code example missing from source]

In addition to being more efficient, this is meant to be more idiomatic, in that our program does not need to understand the underlying differences, etc. It also means that we can introduce other operators (`filter`, `map`, `join`, etc.) before doing the final `count` accumulation.

It may be that we need a more idiomatic name, but I'm not entirely sure what to use (`accumulate`, because the result is an accumulation?). Any thoughts here would be welcome.
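
To make the accumulation semantics described above concrete, here is a rough model in plain Rust using only the standard library. It is not the differential dataflow API and not the code from this PR: `Updates`, `explode_model`, and `salary_totals` are hypothetical names, and the sample salaries are made up. The sketch assumes that each produced `diff` is multiplied by the input update's own diff before being summed per `data` value.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical stand-in for a collection: a list of (datum, diff) updates.
type Updates<D> = Vec<(D, i64)>;

/// Models the described behavior: `logic` maps each input datum to a sequence
/// of (data, diff) pairs, and the diffs are accumulated per `data` value.
/// Assumption: the input update's diff scales each produced diff.
fn explode_model<D, D2, I, F>(input: &Updates<D>, logic: F) -> HashMap<D2, i64>
where
    D: Clone,
    D2: Eq + Hash,
    I: IntoIterator<Item = (D2, i64)>,
    F: Fn(D) -> I,
{
    let mut accumulated: HashMap<D2, i64> = HashMap::new();
    for (datum, diff) in input {
        for (data, weight) in logic(datum.clone()) {
            *accumulated.entry(data).or_insert(0) += *diff * weight;
        }
    }
    // Entries whose diffs cancel to zero are absent from the result.
    accumulated.retain(|_, total| *total != 0);
    accumulated
}

/// Total salary by name, with the salary moved into the diff position.
fn salary_totals() -> HashMap<&'static str, i64> {
    let salaries: Updates<(&'static str, i64)> = vec![
        (("alice", 100), 1),
        (("alice", 150), 1),
        (("bob", 70), 1),
        (("bob", 70), -1), // retraction of the earlier record
        (("bob", 80), 1),
        (("carol", 50), 1),
        (("carol", 50), -1), // carol's only record is retracted
    ];
    // Each datum yields one (name, salary) pair; `Option` serves as the sequence.
    explode_model(&salaries, |(name, salary)| Some((name, salary)))
}
```

In this model the map holds one entry per name rather than one per distinct salary, which is the saving the description is after: the individual salaries no longer need to be kept as separate elements.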