-
Notifications
You must be signed in to change notification settings - Fork 271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid temporary allocations by improved APIs #580
Avoid temporary allocations by improved APIs #580
Conversation
@@ -64,7 +64,7 @@ impl<G: Scope> OperatorBuilder<G> { | |||
where | |||
P: ParallelizationContract<G::Timestamp, C> { | |||
|
|||
let connection = vec![Antichain::from_elem(Default::default()); self.builder.shape().outputs()]; | |||
let connection = (0..self.builder.shape().outputs()).map(|_| Antichain::from_elem(Default::default())).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for future selves, this is a common enough pattern (antichain with default summary) that it probably wants to be represented more compactly, perhaps as an enum variant. It seems that such enums around vectors are surprisingly cheap now (more than just the one additional variant allowed by e.g. Option<Vec<_>>
).
timely/src/dataflow/scopes/child.rs
Outdated
@@ -97,6 +97,15 @@ where | |||
{ | |||
fn name(&self) -> String { self.subgraph.borrow().name.clone() } | |||
fn addr(&self) -> Vec<usize> { self.subgraph.borrow().path.clone() } | |||
|
|||
fn addr_for(&self, index: usize) -> Vec<usize> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should eventually add a comment here, or enrich the name to addr_for_child
to be clear. Or addr_plus
or .. idk something to be clear what is going on here. It's not a great mystery once you read the code, ofc. Don't worry about this if you don't want, and I can tidy in post.
timely/src/dataflow/scopes/mod.rs
Outdated
/// A sequence of scope identifiers describing the path from the worker root to the index in | ||
/// this scope. | ||
fn addr_for(&self, index: usize) -> Vec<usize>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy. You did have the docs and I read the files out of order. I'm sorry! But maybe "to the index" -> "to the child indicated by index"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I left some notes, but they are all minor.
Another thing we discussed offline, and I wanted to record here, is that perhaps we should be using Rc<[usize]>
for the addresses, rather than Vec<usize>
. We don't need to mutate anything, and there are moments that we have multiple copies of the same thing. Another potentially useful idea is to have the worker support a per-dataflow allocation for storing addresses, as they all have the same shared lifetime (the dataflow), and they never change. So, e.g. a map from operator id (integer) back to the path (integer sequence).
Anyhow, this seems great independently, and these ideas are potentially good follow-ons!
Signed-off-by: Moritz Hoffmann <[email protected]>
Signed-off-by: Moritz Hoffmann <[email protected]>
5874de0
to
c11e2e3
Compare
Signed-off-by: Moritz Hoffmann <[email protected]>
This PR aims at removing temporary allocations by:
addr
function, i.e,addr_for
that allocates the correct size.Vec<usize>
instead of&[usize]
whenever the receiver needs to take ownership. In this case, the caller can determine whether they can pass ownership or need to clone, instead of unconditionally forcing the receiver to allocating a new vector to take ownership.