
A legendary tale of why we should make pmap default to using CachingPool #33892

Merged
oscardssmith merged 1 commit into JuliaLang:master from oxinabox:patch-25 on Jul 28, 2023

Conversation

oxinabox
Contributor

@oxinabox oxinabox commented Nov 19, 2019

Once upon a time, there was a young Julia user first getting started with parallelism.
And she found it fearsomely slow.
And so she did investigate, and she did illuminate upon her issue.
Her closures, they were being reserialized again and again.
And so this young woman, she opened an issue #16345.
Lo and behold, a noble soul did come and resolve it,
by making the glorious CachingPool() in #16808.

Three long years later this Julia user did bravely return to the world of parallelism, with many battle-worn scars.
And once more she did face the demon that is pmap over closures.
But to her folly, she felt no fear, for she believed the demon to be crippled and chained by the glorious CachingPool.
Fearlessly, she threw her closure over 2GB of data into the maw of the demon pmap.
But alas, alas indeed, she was wrong.
The demon remained unbound, and it slew her, and slew her again.
100 times did it slay her, for 101 items was the user iterating upon.
For the glorious chains of the CachingPool() remained unused, left aside in the user's tool chest, forgotten.

@ararslan added the parallelism (Parallel or distributed computation) and needs news (A NEWS entry is required for this change) labels on Nov 19, 2019
@musm
Contributor

musm commented Nov 19, 2019

I now feel like every PR should be mandated to start with a tale or poem 👍

@oxinabox
Contributor Author

To be clear, the main argument against this is that it introduces some additional overhead at the start of pmap. But it saves so much that I think it should be the default, and one should have to opt out.
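For reference, a rough sketch of the kind of change being proposed (hypothetical helper name, not the actual diff in this PR): when no pool is given, wrap the workers in a CachingPool so the function is shipped to each worker only once, and clear the cache afterwards.

using Distributed

# Sketch only, not the PR's diff: wrap the workers in a CachingPool when no
# pool is supplied, so f is serialized to each worker once instead of per item,
# and drop the cached copies when done.
function pmap_with_caching(f, c...; kwargs...)
    p = CachingPool(workers())
    try
        pmap(f, p, c...; kwargs...)
    finally
        clear!(p)   # free the remotely cached copies of f
    end
end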

@oxinabox
Contributor Author

@ararslan can the Needs News label be removed?

@KristofferC removed the needs news (A NEWS entry is required for this change) label on Nov 28, 2019
@nickrobinson251
Contributor

bump?

@ViralBShah
Member

@tanmaykm can you review this PR?

@fredrikekre added the triage (This should be discussed on a triage call) label on Feb 7, 2020
@tanmaykm
Member

tanmaykm commented Feb 8, 2020

Sure, will take a look.

@tanmaykm
Member

tanmaykm commented Feb 8, 2020

While the changes look fine to me, this seems to be catering to a very specific situation, which is probably an exception rather than the norm. Passing a CachingPool explicitly in the API call seems simple enough. Since #16808 mentions a penalty of 10-20% for regular workloads if CachingPool is used, I am not sure this should be the default.

@tanmaykm
Member

tanmaykm commented Feb 8, 2020

Though, maybe a pointer to CachingPool and its usefulness in the documentation of pmap would help address the cause of this PR.
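For instance, the pmap docstring could include something along these lines (just a sketch of such an example, not proposed wording):

using Distributed
addprocs(4)

let data = rand(10_000)                  # captured by the closure below
    pool = CachingPool(workers())        # ship the closure (and its data) to each worker once
    results = pmap(i -> sum(data) * i, pool, 1:100)
    clear!(pool)                         # free the cached closures on the workers
    println(length(results))
end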

@oxinabox
Contributor Author

oxinabox commented Feb 8, 2020

While the changes look fine to me, this seems to be catering to a very specific situation, which is probably an exception rather than the norm.

What is the use case you say is niche, and what is typical?
Why do you say that?

In my experience closures are typical.
I don't think I have ever used pmap on anything other than a closure,
and for closures, CachingPool gives a big speed-up
(a speedup in complexity, not just a constant factor).

For non-closures it does indeed give a slow-down, but only a constant overhead.

Following are my timings, on a very small closure.
And again, in my experience one is mostly using big closures, like closing over an ML model and/or its training data.
As such I would argue the CachingPool should be the default, and one should have to opt out of it.

Closure,

no pool

  • first: 3.559014 seconds (793.17 k allocations: 47.080 MiB, 0.22% gc time)
  • second: 2.025315 seconds (299.26 k allocations: 21.447 MiB)

pool

  • first: 1.987681 seconds (1.42 M allocations: 77.463 MiB, 1.82% gc time)
  • second: 1.652644 seconds (760.81 k allocations: 44.492 MiB, 2.99% gc time)

Nonclosure

no pool:

  • first: 1.765842 seconds (302.51 k allocations: 14.290 MiB, 0.53% gc time)
  • second: 1.664156 seconds (302.54 k allocations: 14.102 MiB)

pool

  • first: 1.929067 seconds (563.57 k allocations: 26.727 MiB, 2.19% gc time)
  • second: 2.007292 seconds (555.20 k allocations: 26.463 MiB)

using Distributed
# Note: run with worker processes added (e.g. via addprocs) for these timings to be meaningful.

# Closure over x, no pool: the closure (and the array it captures) is re-serialized for every item.
@time let
    x = ones(1_000_000)
    pmap(i->sum(i.*x), 1:1000);
end;

# Closure over x, CachingPool: the closure is sent to each worker once and cached there.
@time let
    x = ones(1_000_000)
    pmap(i->sum(i.*x), CachingPool(workers()), 1:1000);
end;

# Non-closure (captures nothing), no pool.
@time let
    pmap(i->sum(i.*ones(1_000_000)), 1:1000);
end;

# Non-closure, CachingPool: only the pool's constant overhead remains.
@time let
    pmap(i->sum(i.*ones(1_000_000)), CachingPool(workers()), 1:1000);
end;

@tanmaykm
Member

Thanks @oxinabox, that clarifies some of my doubts. The downside does seem small compared to the benefits.

Would you squash and rebase the PR? I think we can merge this.

@tanmaykm requested review from tanmaykm and removed request for tanmaykm on February 10, 2020 06:03
@tanmaykm
Member

We may also be able to reduce the overhead of CachingPool somewhat with this change:

diff --git a/stdlib/Distributed/src/workerpool.jl b/stdlib/Distributed/src/workerpool.jl
index 628876334c..3830a420cb 100644
--- a/stdlib/Distributed/src/workerpool.jl
+++ b/stdlib/Distributed/src/workerpool.jl
@@ -338,15 +338,20 @@ function clear!(pool::CachingPool)
 end
 
 exec_from_cache(rr::RemoteChannel, args...; kwargs...) = fetch(rr)(args...; kwargs...)
-function exec_from_cache(f_ref::Tuple{Function, RemoteChannel}, args...; kwargs...)
-    put!(f_ref[2], f_ref[1])        # Cache locally
-    f_ref[1](args...; kwargs...)
-end
 
 function remotecall_pool(rc_f, f, pool::CachingPool, args...; kwargs...)
     worker = take!(pool)
-    f_ref = get(pool.map_obj2ref, (worker, f), (f, RemoteChannel(worker)))
-    isa(f_ref, Tuple) && (pool.map_obj2ref[(worker, f)] = f_ref[2])   # Add to tracker
+    f_ref = get!(pool.map_obj2ref, (worker, f)) do
+        chan = RemoteChannel(worker)
+        put!(chan, f)
+        chan
+    end
 
     try
         rc_f(exec_from_cache, worker, f_ref, args...; kwargs...)

With this change:

julia> @time let
           pmap(i->sum(i.*ones(1_000_000)), 1:1000);
       end;
  2.110482 seconds (329.84 k allocations: 15.280 MiB)

julia> @time let
           pmap(i->sum(i.*ones(1_000_000)), pool, 1:1000);
       end;
  2.142915 seconds (491.90 k allocations: 23.318 MiB)

And without, it looked like:

julia> @time let
           pmap(i->sum(i.*ones(1_000_000)), 1:1000);
       end;
  2.092032 seconds (329.69 k allocations: 15.264 MiB)

julia> @time let
           pmap(i->sum(i.*ones(1_000_000)), pool, 1:1000);
       end;
  2.181026 seconds (570.54 k allocations: 27.101 MiB)

@oxinabox
Contributor Author

We could make it smart about this.
I don't know if it's too magic, but we can detect closures using fieldcount,
and then decide whether or not to use a CachingPool.
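A minimal sketch of that idea (hypothetical helper names, not part of this PR): a closure stores its captured variables as fields of its type, so fieldcount(typeof(f)) > 0 distinguishes it from plain functions and capture-free anonymous functions.

using Distributed

# Hypothetical sketch, not part of this PR: pick a pool based on whether f
# captures anything. Closures keep their captured variables as fields of
# their type, so fieldcount(typeof(f)) > 0 detects them.
choose_pool(f) = fieldcount(typeof(f)) > 0 ? CachingPool(workers()) : default_worker_pool()

smart_pmap(f, c...; kwargs...) = pmap(f, choose_pool(f), c...; kwargs...)

# smart_pmap(i -> i^2, 1:10)                                   # no captures: default pool
# smart_pmap(let w = rand(1000); i -> sum(w) * i end, 1:10)    # closure: CachingPool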

@oxinabox
Contributor Author

Would you squash and rebase the PR? I think we can merge this.

@tanmaykm cool, done.

@quildtide
Contributor

quildtide commented Sep 29, 2021

I believe this would close #21946 also, if merged.

It would also make it possible to close pull req #22843, so then it'd also be possible to close JuliaLang/Distributed.jl#46.

EDIT: I suppose you could say that this would bring closure to many things.

@JeffBezanson
Member

Triage is ok with this. We also like the idea of conditioning it based on whether the function has fields, but that can be done in the future.

@JeffBezanson removed the triage (This should be discussed on a triage call) label on Sep 30, 2021
@oxinabox
Contributor Author

So how about we merge this then?

@oscardssmith added the merge me (PR is reviewed. Merge when all tests are passing) label on Jul 28, 2023
Update stdlib/Distributed/src/pmap.jl
Update NEWS.md
use some workers
@oscardssmith merged commit 4825a0c into JuliaLang:master on Jul 28, 2023
@oscardssmith removed the merge me (PR is reviewed. Merge when all tests are passing) label on Jul 28, 2023
vchuravy pushed a commit to JuliaLang/Distributed.jl that referenced this pull request Oct 6, 2023
@oxinabox deleted the patch-25 branch on May 4, 2024 01:00