A legendary tale of why we should make pmap default to using CachingPool #33892
I now feel like every PR should be mandated to start with a tale or poem 👍 |
To be clear, the main argument against this is that it introduces some additional overhead at the start of `pmap`. But it saves so much that I think it should be the default, and one should opt out. |
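For readers wondering what opting out could look like: `pmap` already accepts an explicit pool as its second positional argument, so a minimal sketch, assuming the default becomes `CachingPool(workers())` as this PR proposes, is to pass a plain `WorkerPool` yourself. The variable names below are only illustrative.

```julia
using Distributed

# Default call: with this PR merged, this is assumed to use CachingPool(workers()).
squares_default = pmap(i -> i^2, 1:5)

# Opting out: pass a plain WorkerPool explicitly so nothing is cached on the workers.
squares_plain = pmap(i -> i^2, WorkerPool(workers()), 1:5)
```

Both calls return the same values; only the pool that ships the function to the workers differs.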
@ararslan can the Needs News label be removed? |
bump? |
@tanmaykm can you review this PR? |
Sure, will take a look. |
While the changes look fine to me, this seems to be catering to a very specific situation, which is probably an exception rather than the norm. Using a |
Though, maybe a pointer to |
What is the use case you say is niche, and what is typical? In my experience closures are typical. For non-closures, it does indeed give a slowdown (a constant overhead). Following are my timings, on a very small closure.
Closure, no pool:
Closure, pool:
Non-closure, no pool:
Non-closure, pool:
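using Distributed
# Closure over `x`, default pool
@time let
    x = ones(1_000_000)
    pmap(i->sum(i.*x), 1:1000);
end;
# Closure over `x`, explicit CachingPool
@time let
    x = ones(1_000_000)
    pmap(i->sum(i.*x), CachingPool(workers()), 1:1000);
end;
# Non-closure (nothing captured), default pool
@time let
    pmap(i->sum(i.*ones(1_000_000)), 1:1000);
end;
# Non-closure (nothing captured), explicit CachingPool
@time let
    pmap(i->sum(i.*ones(1_000_000)), CachingPool(workers()), 1:1000);
end; |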
Thanks @oxinabox, that clarifies some of my doubts. The downside does seem small compared to the benefits. Would you squash and rebase the PR? I think we can merge this. |
We may also be able to reduce the overhead of
With this change:
And without, it looked like:
|
We could make it smart about this. |
@tanmaykm cool, done. |
I believe this would close #21946 also, if merged. It would also make it possible to close pull req #22843, so then it'd also be possible to close JuliaLang/Distributed.jl#46. EDIT: I suppose you could say that this would bring closure to many things. |
Triage is ok with this. We also like the idea of conditioning it based on whether the function has fields, but that can be done in the future. |
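The idea of conditioning on whether the function has fields could be sketched roughly as below. This is not code from the PR: `has_captured_fields` and `choose_pool` are hypothetical helper names, and the sketch assumes that a closure's captured variables show up as fields of its type, which `fieldcount` can count.

```julia
using Distributed

# Hypothetical helper (not part of Distributed): a closure stores its captured
# variables as fields of its type, so a non-zero field count means there is
# data that would be serialized along with the function.
has_captured_fields(f) = fieldcount(typeof(f)) > 0

# Hypothetical helper: use a CachingPool only when there is something to cache.
choose_pool(f, ws = workers()) =
    has_captured_fields(f) ? CachingPool(ws) : WorkerPool(ws)

# `let` makes `x` a local, so the anonymous function really captures it.
# At global scope `x` would be a global reference and nothing would be captured.
closure = let x = ones(10)
    i -> sum(i .* x)
end

pool_for_closure = choose_pool(closure)   # CachingPool
pool_for_plain = choose_pool(sin)         # WorkerPool
```

Callable structs with fields would also be routed to the `CachingPool` under this check, which seems to be the desired behaviour, but that is an assumption.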
So how about we merge this then? |
Update stdlib/Distributed/src/pmap.jl
Update NEWS.md
use some workers
Once upon a time, there was a young Julia user first getting started with parallelism.
And she found it fearsomely slow.
And so she did investigate, and she did illuminate upon her issue.
Her closures, they were being reserialized again and again.
And so this young woman, she opened an issue, #16345.
Lo and behold, a noble soul did come and resolve it,
by making the glorious `CachingPool()` in #16808.
3 long years later, this Julia user did bravely return to the world of parallelism, with many battle-worn scars.
And once more she did face the demon that is `pmap` over closures.
But to her folly, she felt no fear, for she believed the demon to be crippled and chained by the glorious `CachingPool`.
Fearlessly, she threw her closure over 2GB of data into the maw of the demon `pmap`.
But alas, alas indeed, she was wrong.
The demon remained unbound, and it slew her, and slew her again.
100 times did it slay her, for 101 items was the user iterating upon.
For the glorious chains of the `CachingPool()` remained unused, left aside in the user's tool chest, forgotten.
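For a user in the same position today, a minimal sketch of taking the chains out of the tool chest by hand looks like this. The array here is a small stand-in for the 2GB of data in the tale, and the names `data`, `pool` and `results` are illustrative only.

```julia
using Distributed

results = let data = ones(1_000)   # stand-in for the large captured data
    # The pool caches the closure on each worker after the first call,
    # so `data` is not reserialized for every item.
    pool = CachingPool(workers())
    out = pmap(i -> i * sum(data), pool, 1:3)
    # Release the cached closure (and the data it holds) on the workers.
    clear!(pool)
    out
end
```

Keeping the pool in a variable also lets several `pmap` calls with the same closure share the cache before `clear!` is called.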