
RFC/WIP : cache mapped function remotely for pmap. [ci skip] #16695

Closed
wants to merge 1 commit

Conversation

amitmurthy
Contributor

This is a WIP to address #16345 and the initial results are encouraging.

It caches the mapping function remotely for the duration of the pmap call.
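The mechanism can be sketched roughly as follows (the names here are illustrative, not the PR's actual API): the closure is shipped to each worker once and stored in a worker-local Dict keyed by a call id, so subsequent calls only send the small key plus the arguments instead of re-serializing the closure and its captured data.

```julia
# Illustrative sketch of per-worker function caching (not the PR's
# actual implementation). The closure is stored once in a worker-local
# Dict; later calls look it up by id instead of re-serializing it.

const _fcache = Dict{UInt,Function}()

# Runs on the worker: install the function under `id`.
function cache_f(id::UInt, f::Function)
    _fcache[id] = f
    nothing
end

# Runs on the worker: call the previously cached function.
call_cached(id::UInt, args...) = _fcache[id](args...)
```

In a pmap-style path, the driver would install the function once per worker (e.g. via remotecall) and then issue per-element calls that refer to it only by id.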

Some timings with julia -p4:

function foo(c, n, cache)
    a = ones(n)
    f = x->sum(a) * x
    t = @elapsed results = pmap(f, 1:c; cache=cache)
    @assert results == map(f, 1:c)
    t
end

for coll_sz in [10, 10^2, 10^4]
    for data_sz in [10, 10^4, 10^6]
        println("coll_sz:", coll_sz, ", data_sz:", data_sz)
        tt = foo(coll_sz, data_sz, true)
        tf = foo(coll_sz, data_sz, false)

        println("cache=true ",tt)
        println("cache=false ",tf)
        println()
    end
end


coll_sz   data_sz    cache=true    cache=false
10        10         0.010037915   0.005980962
10        10000      0.007717142   0.006861273
10        1000000    0.01567612    0.029287787
100       10         0.03850731    0.061579714
100       10000      0.036386185   0.07370864
100       1000000    0.072936067   0.260751843
10000     10         4.049040494   7.348653169
10000     10000      4.24085953    7.581796319
10000     1000000    6.063055093   27.442259194

I'll hold off working on this till #16508 is addressed, as both the code and interface may change. For now, feedback on the caching method used here would be appreciated.

end

function exec_from_cache(f, rr::RemoteChannel, args...)
if (f==nothing)
Contributor

@tkelman tkelman Jun 1, 2016
No parens are needed around the if condition when it's this short, and it's usually a bit better to compare against nothing using ===.
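For context, a small self-contained example of why === is the safer test in current Julia: === is object identity, cannot be overloaded, and always returns a Bool, whereas == can be overloaded to return a non-Bool (the use of missing below, which postdates this PR, just illustrates that):

```julia
# `===` is identity comparison: it cannot be overloaded and always
# returns a Bool, so `x === nothing` is the reliable test.
f = nothing
@assert f === nothing

# `==` can be overloaded; comparing `missing` with `==` yields
# `missing`, not a Bool, so `if x == nothing` would throw here.
x = missing
@assert (x == nothing) === missing
@assert (x === nothing) == false
```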

@kshyatt kshyatt added the parallelism Parallel or distributed computation label Jun 1, 2016
@JeffBezanson
Member

I think we need a more comprehensive cache. The slow part is sending a TypeName. We should remember which TypeNames we have sent to which workers, and avoid sending them more than once. Granted, it's a bit ugly to do this in the serialization layer, but I think we should just do it because it will fix the problem for all forms of remote calls.
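A per-connection sent-object table along the lines described above might look roughly like this (purely illustrative: the actual fix would live inside Julia's Serializer, and a real implementation would key by object identity, e.g. an ObjectIdDict, rather than value equality):

```julia
# Illustrative sketch of remembering which objects have already been
# sent on a given connection: send the full payload once, then only a
# small integer tag on repeats. Not Julia's actual Serializer API.

struct ConnCache
    sent::Dict{Any,Int}   # object => small integer tag
end
ConnCache() = ConnCache(Dict{Any,Int}())

# Returns (:full, tag) the first time an object is sent on this
# connection, and (:ref, tag) on every later send of the same object.
function encode!(c::ConnCache, obj)
    tag = get(c.sent, obj, 0)
    tag != 0 && return (:ref, tag)
    tag = length(c.sent) + 1
    c.sent[obj] = tag
    return (:full, tag)
end
```

The receiving side would keep the mirror-image table, mapping tags back to the objects it has already deserialized.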

end
end

cached_remote(cwp::CachedWorkerPool, f) = (args...) -> remotecall_fetch(f, cwp, args...)
Contributor
Why not instead use multiple dispatch and have

remote(cwp::CachedWorkerPool, f) = (args...) -> remotecall_fetch(f, cwp, args...)

If CachedWorkerPool<:AbstractWorkerPool, and WorkerPool<:AbstractWorkerPool,
then in general there could just be

remote(p::AbstractWorkerPool, f) = (args...; kwargs...)->remotecall_fetch(f, p, args...; kwargs...)

(That would also mean adding support for kwargs, but I think that would be a good thing for making things transparent.)

More generally, it might be good if CachedWorkerPool acted just like a WorkerPool but cached all suitable data passed through the remote_* functions with it, i.e. a CachingWorkerPool.
(I'm not entirely satisfied with that name either.)

Contributor Author

Yeah, I have thought about this, i.e., defining an AbstractWorkerPool and the expected methods for any implementation of it.

I think it should be done irrespective of how this PR unfolds; it will be useful for allowing users to extend the WorkerPool concept as per their specific needs.
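As a rough illustration of that idea (hypothetical names, not the AbstractWorkerPool interface that was eventually added to Base), an extensible pool hierarchy only needs each subtype to implement worker selection; generic helpers like a remote(pool, f) wrapper could then dispatch on the abstract type:

```julia
# Hypothetical sketch of an extensible worker-pool hierarchy
# (illustrative names; not the real AbstractWorkerPool interface).
abstract type AbstractPool end

# A custom pool only implements `take_pid!`; generic code such as a
# `remote(pool, f)` wrapper would dispatch on AbstractPool.
struct RoundRobinPool <: AbstractPool
    pids::Vector{Int}
    next::Base.RefValue{Int}
end
RoundRobinPool(pids::Vector{Int}) = RoundRobinPool(pids, Ref(1))

# Hand out worker pids in round-robin order.
function take_pid!(p::RoundRobinPool)
    pid = p.pids[p.next[]]
    p.next[] = p.next[] % length(p.pids) + 1
    return pid
end
```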

@amitmurthy
Contributor Author

Will open another PR with a generic version of this functionality.

@amitmurthy amitmurthy closed this Jun 7, 2016
@amitmurthy amitmurthy deleted the amitm/pmap_cache branch June 7, 2016 08:57