
A legendary tale of why we should make pmap default to using CachingPool #33892

Merged
oscardssmith merged 1 commit into JuliaLang:master from oxinabox:patch-25 on Jul 28, 2023

Conversation

oxinabox
Contributor

@oxinabox oxinabox commented Nov 19, 2019

Once upon a time, there was a young Julia user first getting started with parallelism.
And she found it fearsomely slow.
And so she did investigate, and she did illuminate upon her issue.
Her closures, they were being reserialized again and again.
And so this young woman, she opened an issue #16345.
Lo and behold, a noble soul did come and resolve it,
by making the glorious CachingPool() in #16808.

Three long years later this Julia user did bravely return to the world of parallelism, with many battle-worn scars.
And once more she did face the demon that is pmap over closures.
But to her folly, she felt no fear, for she believed the demon to be crippled and chained by the glorious CachingPool.
Fearlessly, she threw her closure over 2GB of data into the maw of the demon pmap.
But alas, alas indeed, she was wrong.
The demon remained unbound, and it slew her, and slew her again.
100 times did it slay her, for 101 items was the user iterating upon.
For the glorious chains of the CachingPool() remained unused, left aside in the user's tool chest, forgotten.

@ararslan added the parallelism (Parallel or distributed computation) and needs news (A NEWS entry is required for this change) labels on Nov 19, 2019
@musm
Contributor

musm commented Nov 19, 2019

I now feel like every PR should be mandated to start with a tale or poem 👍

@oxinabox
Contributor Author

To be clear, the main argument against this is that it introduces some additional overhead at the start of pmap. But it saves so much that I think it should be the default, and one should have to opt out.
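For reference, a rough sketch of the kind of change being proposed (hypothetical helper name, not the actual diff in this PR): when no pool is given, wrap the workers in a CachingPool so the function is shipped to each worker only once, and clear the cache afterwards.

using Distributed

# Sketch only, not the PR's diff: wrap the workers in a CachingPool when no
# pool is supplied, so f is serialized to each worker once instead of per item,
# and drop the cached copies when done.
function pmap_with_caching(f, c...; kwargs...)
    p = CachingPool(workers())
    try
        pmap(f, p, c...; kwargs...)
    finally
        clear!(p)   # free the remotely cached copies of f
    end
end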

@oxinabox
Contributor Author

@ararslan can the Needs News label be removed?

@KristofferC removed the needs news (A NEWS entry is required for this change) label on Nov 28, 2019
@nickrobinson251
Contributor

bump?

@ViralBShah
Member

@tanmaykm can you review this PR?

@fredrikekre added the triage (This should be discussed on a triage call) label on Feb 7, 2020
@tanmaykm
Member

tanmaykm commented Feb 8, 2020

Sure, will take a look.

@tanmaykm
Member

tanmaykm commented Feb 8, 2020

While the changes look fine to me, this seems to be catering to a very specific situation, which is probably an exception rather than the norm. Passing a CachingPool explicitly in the API call seems simple enough. Since #16808 mentions a penalty of 10-20% for regular workloads if CachingPool is used, I am not sure this should be the default.

@tanmaykm
Member

tanmaykm commented Feb 8, 2020

Though, maybe a pointer to CachingPool and its usefulness in the documentation of pmap would help address the cause of this PR.
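For instance, the pmap docstring could include something along these lines (just a sketch of such an example, not proposed wording):

using Distributed
addprocs(4)

let data = rand(10_000)                  # captured by the closure below
    pool = CachingPool(workers())        # ship the closure (and its data) to each worker once
    results = pmap(i -> sum(data) * i, pool, 1:100)
    clear!(pool)                         # free the cached closures on the workers
    println(length(results))
end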

@oxinabox
Contributor Author

oxinabox commented Feb 8, 2020

While the changes look fine to me, this seems to be catering to a very specific situation, which is probably an exception rather than the norm.

What is the use case you say is niche, and what is typical?
Why do you say that?

In my experience closures are typical.
I don't think I have ever used pmap on anything other than a closure,
and for closures, CachingPool gives a big speed-up
(a speedup in complexity, not just a constant factor).

For non-closures it does indeed give a slow-down, but only a constant overhead.

Following are my timings, on a very small closure.
And again, in my experience one is mostly using big closures, like closing over an ML model and/or its training data.
As such I would argue the CachingPool should be the default, and one should have to opt out of it.

Closure,

no pool

  • first: 3.559014 seconds (793.17 k allocations: 47.080 MiB, 0.22% gc time)
  • second: 2.025315 seconds (299.26 k allocations: 21.447 MiB)

pool

  • first: 1.987681 seconds (1.42 M allocations: 77.463 MiB, 1.82% gc time)
  • second: 1.652644 seconds (760.81 k allocations: 44.492 MiB, 2.99% gc time)

Nonclosure

no pool:

  • first: 1.765842 seconds (302.51 k allocations: 14.290 MiB, 0.53% gc time)
  • second: 1.664156 seconds (302.54 k allocations: 14.102 MiB)

pool

  • first: 1.929067 seconds (563.57 k allocations: 26.727 MiB, 2.19% gc time)
  • second: 2.007292 seconds (555.20 k allocations: 26.463 MiB)

using Distributed
# Note: run with worker processes added (e.g. via addprocs) for these timings to be meaningful.

# Closure over x, no pool: the closure (and the array it captures) is re-serialized for every item.
@time let
    x = ones(1_000_000)
    pmap(i->sum(i.*x), 1:1000);
end;

# Closure over x, CachingPool: the closure is sent to each worker once and cached there.
@time let
    x = ones(1_000_000)
    pmap(i->sum(i.*x), CachingPool(workers()), 1:1000);
end;

# Non-closure (captures nothing), no pool.
@time let
    pmap(i->sum(i.*ones(1_000_000)), 1:1000);
end;

# Non-closure, CachingPool: only the pool's constant overhead remains.
@time let
    pmap(i->sum(i.*ones(1_000_000)), CachingPool(workers()), 1:1000);
end;

@tanmaykm
Member

Thanks @oxinabox, that clarifies some of my doubts. The downside does seem small compared to the benefits.

Would you squash and rebase the PR? I think we can merge this.

@tanmaykm requested review from tanmaykm and removed request for tanmaykm on February 10, 2020 06:03
@tanmaykm
Member

We may also be able to reduce the overhead of CachingPool somewhat with this change:

diff --git a/stdlib/Distributed/src/workerpool.jl b/stdlib/Distributed/src/workerpool.jl
index 628876334c..3830a420cb 100644
--- a/stdlib/Distributed/src/workerpool.jl
+++ b/stdlib/Distributed/src/workerpool.jl
@@ -338,15 +338,20 @@ function clear!(pool::CachingPool)
 end
 
 exec_from_cache(rr::RemoteChannel, args...; kwargs...) = fetch(rr)(args...; kwargs...)
-function exec_from_cache(f_ref::Tuple{Function, RemoteChannel}, args...; kwargs...)
-    put!(f_ref[2], f_ref[1])        # Cache locally
-    f_ref[1](args...; kwargs...)
-end
 
 function remotecall_pool(rc_f, f, pool::CachingPool, args...; kwargs...)
     worker = take!(pool)
-    f_ref = get(pool.map_obj2ref, (worker, f), (f, RemoteChannel(worker)))
-    isa(f_ref, Tuple) && (pool.map_obj2ref[(worker, f)] = f_ref[2])   # Add to tracker
+    f_ref = get!(pool.map_obj2ref, (worker, f)) do
+        chan = RemoteChannel(worker)
+        put!(chan, f)
+        chan
+    end
 
     try
         rc_f(exec_from_cache, worker, f_ref, args...; kwargs...)

With this change:

julia> @time let
           pmap(i->sum(i.*ones(1_000_000)), 1:1000);
       end;
  2.110482 seconds (329.84 k allocations: 15.280 MiB)

julia> @time let
           pmap(i->sum(i.*ones(1_000_000)), pool, 1:1000);
       end;
  2.142915 seconds (491.90 k allocations: 23.318 MiB)

And without, it looked like:

julia> @time let
           pmap(i->sum(i.*ones(1_000_000)), 1:1000);
       end;
  2.092032 seconds (329.69 k allocations: 15.264 MiB)

julia> @time let
           pmap(i->sum(i.*ones(1_000_000)), pool, 1:1000);
       end;
  2.181026 seconds (570.54 k allocations: 27.101 MiB)

@oxinabox
Contributor Author

We could make it smart about this.
I don't know if it's too magic, but we can detect closures using fieldcount,
and then decide whether or not to use a CachingPool.
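A minimal sketch of that idea (hypothetical helper names, not part of this PR): a closure stores its captured variables as fields of its type, so fieldcount(typeof(f)) > 0 distinguishes it from plain functions and capture-free anonymous functions.

using Distributed

# Hypothetical sketch, not part of this PR: pick a pool based on whether f
# captures anything. Closures keep their captured variables as fields of
# their type, so fieldcount(typeof(f)) > 0 detects them.
choose_pool(f) = fieldcount(typeof(f)) > 0 ? CachingPool(workers()) : default_worker_pool()

smart_pmap(f, c...; kwargs...) = pmap(f, choose_pool(f), c...; kwargs...)

# smart_pmap(i -> i^2, 1:10)                                   # no captures: default pool
# smart_pmap(let w = rand(1000); i -> sum(w) * i end, 1:10)    # closure: CachingPool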

@oxinabox
Contributor Author

Would you squash and rebase the PR? I think we can merge this.

@tanmaykm cool, done.

@quildtide
Contributor

quildtide commented Sep 29, 2021

I believe this would close #21946 also, if merged.

It would also make it possible to close pull req #22843, so then it'd also be possible to close JuliaLang/Distributed.jl#46.

EDIT: I suppose you could say that this would bring closure to many things.

@JeffBezanson
Member

Triage is ok with this. We also like the idea of conditioning it based on whether the function has fields, but that can be done in the future.

@JeffBezanson removed the triage (This should be discussed on a triage call) label on Sep 30, 2021
@oxinabox
Contributor Author

So how about we merge this then?

@oscardssmith added the merge me (PR is reviewed. Merge when all tests are passing) label on Jul 28, 2023
Update stdlib/Distributed/src/pmap.jl
Update NEWS.md
use some workers
@oscardssmith merged commit 4825a0c into JuliaLang:master on Jul 28, 2023
@oscardssmith removed the merge me (PR is reviewed. Merge when all tests are passing) label on Jul 28, 2023
vchuravy pushed a commit to JuliaLang/Distributed.jl that referenced this pull request Oct 6, 2023
@oxinabox deleted the patch-25 branch on May 4, 2024 01:00