Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace weights with split length control #260

Merged
merged 17 commits into from
Mar 21, 2017

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Mar 12, 2017

This adds with_min_len and with_max_len on IndexedParallelIterator to control the length of the splits that will be used in each thread. Rayon will try to split iterators to lengths within this range, without making guarantees about where in the range it will fall. If the min and max are at odds, the min wins.
(e.g. with any 2 * min > max, say min=10 and max=15, we can't split length=16 further.)

Fixes #111?

TODO:

  • Add targeted testing
  • Add "representative" benchmarks
  • Leave deprecated no-op placeholders for the old weight APIs?

min: cmp::max(min, 1),
};

let max_splits = len / cmp::max(max, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this logic merits a comment =)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, with a few examples that show how different min/max/len combinations work out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by the max_splits computation here, actually. So in particular if the default is that max is usize::MAX, then wouldn't max_splits almost always be 0? I guess tomorrow I'll have to trace this out in more detail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's poorly named - it represents the number of splits derived from the max, and just below it's only used if it's more than the normal splitter would have used.

I'll fill in a comment later. Thanks for looking it over!

@cuviper
Copy link
Member Author

cuviper commented Mar 15, 2017

OK, added some comments, and flipped the name of that variable. Hope it makes sense now! :)

I still need to address the TODOs, but I threw in some quick additions to the join benchmarks.

test join_microbench::increment_all                          ... bench:      35,475 ns/iter (+/- 2,704)
test join_microbench::increment_all_atomized                 ... bench:   1,460,769 ns/iter (+/- 21,970)
test join_microbench::increment_all_max                      ... bench:      67,500 ns/iter (+/- 14,403)
test join_microbench::increment_all_min                      ... bench:      20,404 ns/iter (+/- 1,771)
test join_microbench::increment_all_serialized               ... bench:      34,150 ns/iter (+/- 1,349)

At least set_min_len() shows some value here. I'll have to think about how to use max better, but my thought was that some workloads may want to be kept within the cache size, for instance.

@@ -133,7 +133,14 @@ pub trait UnindexedProducer: Send + Sized {
/// job is actually stolen into a different thread.
#[derive(Clone, Copy)]
struct Splitter {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra newline? Is this what rustfmt produces?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, that's my doing. I go back and forth on preferring a blank line before doc comments, but I'll remove it if you find this nit-worthy. :)

let max_splits = len / cmp::max(max, 1);
if max_splits > splitter.inner.splits {
splitter.inner.splits = max_splits;
// Divide the given length by the max working lenght to get the minimum
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "lenght"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

@nikomatsakis
Copy link
Member

This all seems quite good. I do have one bikeshed-y question: the name set_min_len suggests to me an &mut self "setting" operation, but in fact it produces a wrapper. Maybe just min_len(N)? I guess it's not a verb, which is odd. I also thought of with_min_len(N)?

I can't really find a comparable thing in the existing iterator API to guide the naming here.

@nikomatsakis
Copy link
Member

With respect to this question:

Leave deprecated no-op placeholders for the old weight APIs?

Probably that would be the nice thing to do, but we should be sure to file an issue to remove them before we release the real 1.0.

@cuviper
Copy link
Member Author

cuviper commented Mar 16, 2017

I added some tests, and renamed to with_min_len / with_max_len.

Incidentally, I tried --jobs 1 on the demo tests to see if that would avoid the stack overflows, but it didn't help, so I reverted it. It sure would be nice to know what's causing that...

@nikomatsakis
Copy link
Member

I'm inclined to merge this at this point. Not sure if you had plans to add a "representative benchmark"?

@cuviper
Copy link
Member Author

cuviper commented Mar 16, 2017

The benchmark todo comes from your emphasized requirement in #111. If you're satisfied as-is, so am I. 😀

@nikomatsakis
Copy link
Member

@cuviper heh, fair. Well, I'm not sure what such a benchmark would be just now, but at least we have an unrepresentative one. =)

Actually, I think that the "do a lot of tiny tiny work" isn't such a bad thing to measure, I guess.

@cuviper
Copy link
Member Author

cuviper commented Mar 18, 2017

I struck out the benchmark TODO, as I think we agreed on gitter that we don't really need to worry about that. Ready to merge?

@nikomatsakis
Copy link
Member

@cuviper yep -- my only concern was the "merge from master" commit, did you want to rebase instead?

@cuviper
Copy link
Member Author

cuviper commented Mar 19, 2017

Merge commits don't bother me, but I can rebase later today if you prefer.

@cuviper
Copy link
Member Author

cuviper commented Mar 20, 2017

Rebased.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants