Pass data from batcher to builder by chunk #491
I think this is good. We discussed `reduce.rs` a bit, and how to avoid potentially exotic regressions there with large keys and multiple updates. That feels like something we want to think carefully about. The other discussion point was around ergonomics for the `BuilderInput` trait; we guessed a bit there, and maybe there is a better thing to do and maybe not.
src/operators/reduce.rs
@@ -448,10 +450,10 @@ where
// TODO: It would be better if all updates went into one batch, but timely dataflow prevents
// this as long as it requires that there is only one capability for each message.
let mut buffers = Vec::<(G::Timestamp, Vec<(V, G::Timestamp, T2::Diff)>)>::new();
let mut builders = Vec::new();
let mut chains = Vec::new();
Worth noting that this is a potential regression, because it stages all the data. You mentioned that another pattern is to just flush the buffers whenever one likes. I figure we should either jot that down in a comment, or implement it if we are worried about it.
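A minimal sketch of the "flush the buffers whenever one likes" alternative, assuming a builder that accepts sorted chunks: updates are handed over each time a fixed-size buffer fills, so at most `capacity` updates are staged at once rather than all of them. `ChunkBuilder`, `build_in_chunks`, and `capacity` are hypothetical names for illustration, not code from `reduce.rs`.

```rust
/// Hypothetical builder that accepts sorted chunks of updates; it only
/// records what it receives so the chunking is visible.
struct ChunkBuilder<T> {
    items: Vec<T>,
    chunk_sizes: Vec<usize>,
}

impl<T> ChunkBuilder<T> {
    fn new() -> Self {
        ChunkBuilder { items: Vec::new(), chunk_sizes: Vec::new() }
    }

    /// Takes the chunk's contents, leaving `chunk` empty so it can be reused.
    fn push(&mut self, chunk: &mut Vec<T>) {
        self.chunk_sizes.push(chunk.len());
        self.items.extend(chunk.drain(..));
    }
}

/// Feeds already-sorted updates to the builder in chunks of at most `capacity`,
/// instead of staging every update and handing them over at the end.
fn build_in_chunks<T>(
    updates: impl IntoIterator<Item = T>,
    capacity: usize,
    builder: &mut ChunkBuilder<T>,
) {
    assert!(capacity > 0, "capacity must be positive");
    let mut buffer = Vec::with_capacity(capacity);
    for update in updates {
        buffer.push(update);
        if buffer.len() == capacity {
            builder.push(&mut buffer);
        }
    }
    // Flush whatever remains after the last full chunk.
    if !buffer.is_empty() {
        builder.push(&mut buffer);
    }
}

fn main() {
    let mut builder = ChunkBuilder::new();
    build_in_chunks(1..=5, 2, &mut builder);
    println!("chunk sizes: {:?}", builder.chunk_sizes); // [2, 2, 1]
    println!("items: {:?}", builder.items);
}
```

The trade-off is that the builder sees more, smaller calls; because each chunk continues the previous one's order, the builder must be able to extend a key that spans a chunk boundary.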
I looked into whether it's possible to populate a builder from `((&K, V), T, R)` instead of cloning `K`. The short answer is no; the long answer is maybe. At the moment, we have two blockers:
- Reductions write their output into an arrangement, but bypass the merge batcher. We still need to provide a merge batcher type, even though it will never be used. We can test removing it by providing a dummy `Batcher` trait implementation that is mostly `unimplemented!()`.
- The input chunks to the builder need to implement `Container`, which implies `'static`. No way to store references! timely-dataflow#540, "Remove Container: Clone + 'static", might help here.
I think this is something we should keep as a to-do, but not prioritize until we have evidence it causes performance issues.
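The dummy-batcher experiment from the first bullet could look roughly like this. The `Batcher` trait below is a hypothetical, simplified stand-in (differential dataflow's real trait has a different shape), and `DummyBatcher` is an illustrative name; the point is only that construction works while every data-handling method is `unimplemented!()`.

```rust
use std::marker::PhantomData;

/// Hypothetical, simplified batcher interface for illustration only.
trait Batcher {
    type Input;
    type Output;
    type Time;
    fn new() -> Self;
    fn push_container(&mut self, batch: &mut Self::Input);
    fn seal(&mut self, upper: Self::Time) -> Self::Output;
    fn frontier(&mut self) -> Self::Time;
}

/// Placeholder for operators (like reductions) that build arrangements directly
/// and never route data through a batcher, but must still name a batcher type.
struct DummyBatcher<I, O, T> {
    _marker: PhantomData<(I, O, T)>,
}

impl<I, O, T> Batcher for DummyBatcher<I, O, T> {
    type Input = I;
    type Output = O;
    type Time = T;

    // Construction is the only operation that has to work.
    fn new() -> Self {
        DummyBatcher { _marker: PhantomData }
    }
    fn push_container(&mut self, _batch: &mut I) {
        unimplemented!("DummyBatcher should never receive data")
    }
    fn seal(&mut self, _upper: T) -> O {
        unimplemented!("DummyBatcher should never be sealed")
    }
    fn frontier(&mut self) -> T {
        unimplemented!("DummyBatcher has no frontier")
    }
}

fn main() {
    // Naming and constructing the type satisfies a "must provide a batcher" requirement.
    let _batcher: DummyBatcher<Vec<u8>, (), u64> = Batcher::new();
}
```

If an operator ever did route data through such a batcher, it would panic immediately, which makes accidental use easy to detect.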
src/trace/implementations/mod.rs
@@ -320,6 +363,109 @@ impl BatchContainer for OffsetList {
}
}

/// Behavior to split an update into principal components.
pub trait BuilderInput<L: Layout> {
We discussed whether this might work with a `BuilderInput<'a, L>`. We didn't conclude that it could or couldn't, but if it ended up being `C::Item<'a>: BuilderInput<'a, L>`, that might be clearer than the alternative.
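A sketch of what the lifetime-parameterized variant could look like, assuming hypothetical, simplified `Layout` and `BuilderInput<'a, L>` traits (not the PR's actual definitions): the trait is implemented by the item type itself, here `&'a ((String, String), u64, i64)`, so borrowed keys are tied to the chunk's lifetime and generic code only needs a bound of the form `Item: BuilderInput<'a, L>`. `collect_keys` is an illustrative helper.

```rust
/// Hypothetical, simplified layout: the owned types a builder stores.
trait Layout {
    type Key;
    type Val;
    type Time;
    type Diff;
}

struct StringLayout;

impl Layout for StringLayout {
    type Key = String;
    type Val = String;
    type Time = u64;
    type Diff = i64;
}

/// Hypothetical lifetime-parameterized variant, implemented by the item type.
trait BuilderInput<'a, L: Layout> {
    type Key: Into<L::Key>;
    type Val: Into<L::Val>;
    fn into_parts(self) -> (Self::Key, Self::Val, L::Time, L::Diff);
}

type Update = ((String, String), u64, i64);

// An item is a reference into the chunk; nothing is cloned while splitting it.
impl<'a> BuilderInput<'a, StringLayout> for &'a Update {
    type Key = &'a str;
    type Val = &'a str;
    fn into_parts(self) -> (Self::Key, Self::Val, u64, i64) {
        let ((key, val), time, diff) = self;
        (key.as_str(), val.as_str(), *time, *diff)
    }
}

/// Generic code only names the bound on the container's item type.
fn collect_keys<'a, L, I, It>(items: It) -> Vec<L::Key>
where
    L: Layout,
    I: BuilderInput<'a, L>,
    It: IntoIterator<Item = I>,
{
    let mut keys: Vec<L::Key> = Vec::new();
    for item in items {
        let (key, _val, _time, _diff) = item.into_parts();
        // The borrowed key becomes owned only when pushed into storage.
        keys.push(key.into());
    }
    keys
}

fn main() {
    let chunk: Vec<Update> = vec![
        (("a".to_string(), "x".to_string()), 0, 1),
        (("b".to_string(), "y".to_string()), 0, -1),
    ];
    let keys = collect_keys::<StringLayout, _, _>(chunk.iter());
    println!("{:?}", keys); // ["a", "b"]
}
```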
Another thought we had: perhaps `BuilderInput<L>: Container`, and rather than introduce an `Item<'a>` here and equate it to `C::Item<'a>`, the signatures could just use `C::Item<'a>`.
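A sketch of the supertrait alternative under the same caveats: `Container` here is a hypothetical simplification rather than timely's actual trait, and `into_parts` takes the supertrait's `Self::Item<'a>` directly, so there is no second `Item<'a>` to keep equal to it. `distinct_keys` is an illustrative helper.

```rust
/// Hypothetical, simplified container trait (not timely's actual `Container`).
trait Container {
    type Item<'a> where Self: 'a;
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Self::Item<'_>;
}

trait Layout {
    type Key;
    type Val;
    type Time;
    type Diff;
}

/// `BuilderInput<L>: Container`: signatures reuse the supertrait's `Self::Item<'a>`.
trait BuilderInput<L: Layout>: Container {
    type Key<'a>: Into<L::Key> where Self: 'a;
    type Val<'a>: Into<L::Val> where Self: 'a;
    fn into_parts<'a>(item: Self::Item<'a>) -> (Self::Key<'a>, Self::Val<'a>, L::Time, L::Diff)
    where
        Self: 'a;
    fn key_eq(this: &Self::Key<'_>, other: &L::Key) -> bool;
}

struct StringLayout;

impl Layout for StringLayout {
    type Key = String;
    type Val = String;
    type Time = u64;
    type Diff = i64;
}

type Update = ((String, String), u64, i64);

impl Container for Vec<Update> {
    type Item<'a> = &'a Update where Self: 'a;
    fn len(&self) -> usize { self.as_slice().len() }
    fn get(&self, index: usize) -> Self::Item<'_> { &self[index] }
}

// No `Item<'a>` declared here: `Self::Item<'a>` comes from `Container`.
impl BuilderInput<StringLayout> for Vec<Update> {
    type Key<'a> = &'a str where Self: 'a;
    type Val<'a> = &'a str where Self: 'a;
    fn into_parts<'a>(item: Self::Item<'a>) -> (Self::Key<'a>, Self::Val<'a>, u64, i64)
    where
        Self: 'a,
    {
        let ((key, val), time, diff) = item;
        (key.as_str(), val.as_str(), *time, *diff)
    }
    fn key_eq(this: &Self::Key<'_>, other: &String) -> bool { *this == other.as_str() }
}

/// Generic code names only `C: BuilderInput<L>`; items come from `Container::get`.
fn distinct_keys<L: Layout, C: BuilderInput<L>>(chunk: &C) -> Vec<L::Key> {
    let mut keys: Vec<L::Key> = Vec::new();
    for index in 0..chunk.len() {
        let (key, _val, _time, _diff) = C::into_parts(chunk.get(index));
        // Input is sorted, so a repeated key can only match the last stored key.
        if !keys.last().map_or(false, |last| C::key_eq(&key, last)) {
            keys.push(key.into());
        }
    }
    keys
}

fn main() {
    let chunk: Vec<Update> = vec![
        (("a".to_string(), "x".to_string()), 0, 1),
        (("a".to_string(), "y".to_string()), 1, 1),
        (("b".to_string(), "x".to_string()), 0, -1),
    ];
    println!("{:?}", distinct_keys::<StringLayout, _>(&chunk)); // ["a", "b"]
}
```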
f62ef78 to 44bdd05
This is looking good, modulo the conflicts to resolve!
Currently, the batcher passes data to the builder as individual tuples, either moved or by reference. This limits what kind of data can be provided to a builder: it has to be in the form of tuples, either owned or as a reference to a fully-formed tuple. This works fine for vector-like structures, but will not work for containers that arrange their data differently.

This change alters the contract between the batcher and the builder to provide chunks instead of individual items (it does not require _chains_). The data in the chunks must be sorted, and subsequent calls must maintain that order, too. The input containers need to implement `BuilderInput`, a trait that describes how a container's items can be broken into key, value, time, and diff, where key and value can be references or owned data, as long as they can be pushed into the underlying key and value containers.

The change has some quirks around comparing incoming keys to keys already in the builder. The two types can differ, and the best solution I could come up with was to add two explicit comparison functions to `BuilderInput`, one for keys and one for values. While it is not elegant, it allows us to move forward with this change without adding nightmare-inducing trait bounds all over.

Signed-off-by: Moritz Hoffmann
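A minimal sketch of the chunk-based contract, under stated assumptions: `Layout`, `BuilderInput`, and `Builder` below are hypothetical, simplified stand-ins rather than the PR's actual types. Items split into borrowed keys and values that become owned only when pushed, and `key_eq`/`val_eq` compare a borrowed `&str` against a stored `String`, which is the kind of type mismatch the explicit comparison functions address.

```rust
// Hypothetical, simplified stand-ins; not the PR's actual `Layout` or `BuilderInput`.
trait Layout {
    type Key;
    type Val;
    type Time;
    type Diff;
}

/// Describes how an item of a sorted input chunk splits into key, value, time, and diff.
trait BuilderInput<L: Layout> {
    type Item<'a> where Self: 'a;
    // Keys and values may be borrowed, as long as they convert into owned storage.
    type Key<'a>: Into<L::Key> where Self: 'a;
    type Val<'a>: Into<L::Val> where Self: 'a;
    fn len(&self) -> usize;
    fn item(&self, index: usize) -> Self::Item<'_>;
    fn into_parts<'a>(item: Self::Item<'a>) -> (Self::Key<'a>, Self::Val<'a>, L::Time, L::Diff)
    where
        Self: 'a;
    // Explicit comparisons, because `Self::Key<'_>` and `L::Key` may be different types.
    fn key_eq(this: &Self::Key<'_>, other: &L::Key) -> bool;
    fn val_eq(this: &Self::Val<'_>, other: &L::Val) -> bool;
}

struct StringLayout;
impl Layout for StringLayout {
    type Key = String;
    type Val = String;
    type Time = u64;
    type Diff = i64;
}

type Update = ((String, String), u64, i64);

impl BuilderInput<StringLayout> for Vec<Update> {
    type Item<'a> = &'a Update where Self: 'a;
    type Key<'a> = &'a str where Self: 'a;
    type Val<'a> = &'a str where Self: 'a;
    fn len(&self) -> usize { self.as_slice().len() }
    fn item(&self, index: usize) -> Self::Item<'_> { &self[index] }
    fn into_parts<'a>(item: Self::Item<'a>) -> (Self::Key<'a>, Self::Val<'a>, u64, i64)
    where
        Self: 'a,
    {
        let ((key, val), time, diff) = item;
        (key.as_str(), val.as_str(), *time, *diff)
    }
    fn key_eq(this: &Self::Key<'_>, other: &String) -> bool { *this == other.as_str() }
    fn val_eq(this: &Self::Val<'_>, other: &String) -> bool { *this == other.as_str() }
}

/// Stores each distinct key once and each distinct (key, value) pair once.
struct Builder<L: Layout> {
    keys: Vec<L::Key>,
    vals: Vec<(usize, L::Val)>,              // (index into `keys`, value)
    updates: Vec<(usize, L::Time, L::Diff)>, // (index into `vals`, time, diff)
}

impl<L: Layout> Builder<L> {
    fn new() -> Self {
        Builder { keys: Vec::new(), vals: Vec::new(), updates: Vec::new() }
    }
    /// Consumes one sorted chunk; successive chunks must continue the same order.
    fn push<C: BuilderInput<L>>(&mut self, chunk: &C) {
        for index in 0..chunk.len() {
            let (key, val, time, diff) = C::into_parts(chunk.item(index));
            // Compare the (possibly borrowed) key with the last owned key.
            let same_key = self.keys.last().map_or(false, |last| C::key_eq(&key, last));
            if !same_key {
                self.keys.push(key.into());
            }
            // A new key always starts a new value run.
            let same_val =
                same_key && self.vals.last().map_or(false, |(_, last)| C::val_eq(&val, last));
            if !same_val {
                self.vals.push((self.keys.len() - 1, val.into()));
            }
            self.updates.push((self.vals.len() - 1, time, diff));
        }
    }
}

fn update(key: &str, val: &str, time: u64, diff: i64) -> Update {
    ((key.to_string(), val.to_string()), time, diff)
}

fn main() {
    let mut builder = Builder::<StringLayout>::new();
    // Two sorted chunks; key "a" continues across the chunk boundary.
    builder.push(&vec![update("a", "x", 0, 1), update("a", "x", 1, 1)]);
    builder.push(&vec![update("a", "y", 0, 1), update("b", "x", 0, -1)]);
    println!("keys: {:?}", builder.keys); // ["a", "b"]
    println!("vals: {:?}", builder.vals);
    println!("updates: {:?}", builder.updates);
}
```

Because the second chunk starts with key `"a"`, `key_eq` compares it against the last key stored from the first chunk; this only deduplicates correctly because subsequent calls keep the sort order.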
44bdd05 to 2834002