This repository has been archived by the owner on May 4, 2019. It is now read-only.

map revisions #128

Merged 4 commits on Jul 12, 2016

Conversation

davidagold
Contributor

@davidagold davidagold commented Jul 6, 2016

This is an attempt to transition away from the metaprogramming-heavy approach to map we took for Julia v0.4. Admittedly, this does introduce the use of generated functions in n-arg map/map!. Does anybody have suggestions as to how best to maintain compatibility with 0.4? What I have here seems kind of crude.

There's a lot of code repetition due to the 1-arg, 2-arg and n-arg methods for map. The Base map(f, A) regime calls collect_similar(A, Generator(f, A)), but I haven't found a way to collect over a Generator{NullableArray, F} without creating a lot of unnecessary Nullable objects -- so hooking into that interface isn't really an option. I'm not sure of the best way to reduce the code footprint here.
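For readers unfamiliar with the lifting terminology used below, here is a minimal sketch of what a lifted 1-arg map over a NullableArray does, assuming Julia 0.5-era syntax and the `values`/`isnull` fields of NullableArray. `lifted_map` is a hypothetical name for illustration, not this PR's implementation:

```julia
# Illustrative sketch only (Julia 0.5 era). `lifted_map` is hypothetical;
# it shows the lifted semantics, not this PR's actual implementation.
using NullableArrays

function lifted_map{T}(f, X::NullableArray{T})
    n = length(X)
    S = Base.promote_op(f, T)      # crude element-type guess for the output
    dest = NullableArray(S, n)
    for i in 1:n
        if X.isnull[i]
            dest.isnull[i] = true             # propagate null; f is never called
        else
            dest.values[i] = f(X.values[i])   # f sees the unwrapped value
            dest.isnull[i] = false
        end
    end
    dest
end
```

The point of lifting is that f only ever receives values of type T, never Nullable{T}, and null entries short-circuit past f entirely.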

Here are the significant points for this PR:

  • Fix #130 (map is broken on v0.0.5)
  • remove as much dependence on metaprogramming as possible
  • Settle on well-defined behavior for the two "empty" cases: (i) map over an empty `NullableArray` and (ii) map over a non-empty but all-null `NullableArray`.

For both cases (i) and (ii) above, the call to map can be lifted or not. If map is not lifted, it falls back on the implementation in Base, which differs between 0.4 and 0.5 (and also between the 1-arg, 2-arg and n-arg implementations). On 0.5, we have the following four combinations of behaviors:

  1. map(f, X; lift=false), where X is empty. On 0.5, now that JuliaLang/julia#16622 (0.5-style comprehensions) has been merged, this uses type inference (Base.return_types) to select the type parameter for the returned empty NullableArray{T}. Note that, since f is not lifted, return_types is called on f, (eltype(X),). If somewhere down the line a method specialized on Nullable arguments is missing, this will return an empty NullableArray{Union{}}:
julia> X = NullableArray(Int[])
0-element NullableArrays.NullableArray{Int64,1}

julia> f(x) = 5*x
f (generic function with 1 method)

julia> map(f, X)
0-element NullableArrays.NullableArray{Union{},1}

julia> h(x::Nullable) = Nullable(5*x.value, x.isnull)
h (generic function with 1 method)

julia> map(h, X)
0-element NullableArrays.NullableArray{Int64,1}
  2. map(f, X; lift=true), where X is empty. This uses type inference (Base.return_types) to select the type parameter for the returned empty NullableArray. Since f is lifted, return_types is called on f, (inner_eltype(X),), where inner_eltype(X::NullableArray{T}) = T:
julia> map(f, X; lift=true)
0-element NullableArrays.NullableArray{Int64,1}
  3. map(f, Y; lift=false), where Y is non-empty but all null. The type parameter of the returned NullableArray{T} does not depend on type inference; it depends entirely on the behavior of f applied to null Nullable{T} objects. (Here Y is a 3-element, all-null NullableArray{Int64}.)
julia> map(f, Y)
ERROR: MethodError: no method matching *(::Int64, ::Nullable{Int64})
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...)
  *{S}(::Nullable{Union{}}, ::Nullable{S})
  *{T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}}(::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8}, ::T<:Union{Int128,Int16,Int32,Int64,Int8,UInt128,UInt16,UInt32,UInt64,UInt8})
  ...
 in _collect(::NullableArrays.NullableArray{Int64,1}, ::Base.Generator{NullableArrays.NullableArray{Int64,1},#f}, ::Base.EltypeUnknown, ::Base.HasShape) at ./array.jl:283
 in #map#11 at /Users/David/.julia/v0.5/NullableArrays/src/map.jl:108 [inlined]
 in map(::Function, ::NullableArrays.NullableArray{Int64,1}) at /Users/David/.julia/v0.5/NullableArrays/src/map.jl:108
 in eval(::Module, ::Any) at ./boot.jl:234
 in macro expansion at ./REPL.jl:92 [inlined]
 in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

julia> map(h, Y)
3-element NullableArrays.NullableArray{Int64,1}:
 #NULL
 #NULL
 #NULL
  4. map(f, Y; lift=true), where Y is non-empty but all null. This does use type inference to determine the type parameter of the returned NullableArray.
julia> map(f, Y; lift=true)
3-element NullableArrays.NullableArray{Int64,1}:
 #NULL
 #NULL
 #NULL

Brief performance summary (on 0.5) (NOTE: revised as of 7/12/16):

using NullableArrays
using BenchmarkTools
n = 1_000_000
A = rand(n)
M = rand(Bool, n)
X = NullableArray(A)
Y = NullableArray(A, M)
As = [ rand(n) for i in 1:5 ]
Ms = [ rand(Bool, n) for i in 1:5 ]
Xs = [ NullableArray(A) for A in As ]
Ys = [ NullableArray(A, M) for (A, M) in zip(As, Ms)]
f(x) = 5*x
f(x::Nullable) = Nullable(5)*x
f(xs...) = prod(xs)

julia> @benchmark map(f, A)
BenchmarkTools.Trial: 
  samples:          1094
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  4
  minimum time:     2.79 ms (0.00% GC)
  median time:      3.38 ms (0.00% GC)
  mean time:        4.57 ms (21.49% GC)
  maximum time:     21.65 ms (0.00% GC)

julia> @benchmark map(f, X)
BenchmarkTools.Trial: 
  samples:          400
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  8.58 mb
  allocs estimate:  12
  minimum time:     9.47 ms (0.00% GC)
  median time:      12.01 ms (0.00% GC)
  mean time:        12.51 ms (8.02% GC)
  maximum time:     36.04 ms (0.00% GC)

julia> @benchmark map(f, X; lift=true)
BenchmarkTools.Trial: 
  samples:          895
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  8.58 mb
  allocs estimate:  9
  minimum time:     3.83 ms (0.00% GC)
  median time:      4.59 ms (0.00% GC)
  mean time:        5.58 ms (16.47% GC)
  maximum time:     65.67 ms (0.00% GC)

julia> @benchmark map(f, Y)
BenchmarkTools.Trial: 
  samples:          298
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  8.58 mb
  allocs estimate:  12
  minimum time:     14.41 ms (0.00% GC)
  median time:      16.95 ms (0.00% GC)
  mean time:        16.82 ms (5.38% GC)
  maximum time:     21.59 ms (18.36% GC)

julia> @benchmark map(f, Y; lift=true)
BenchmarkTools.Trial: 
  samples:          582
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  8.58 mb
  allocs estimate:  9
  minimum time:     6.05 ms (0.00% GC)
  median time:      7.84 ms (0.00% GC)
  mean time:        8.60 ms (12.49% GC)
  maximum time:     28.21 ms (44.18% GC)

julia> @benchmark map(f, As...)
BenchmarkTools.Trial: 
  samples:          222
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  9
  minimum time:     19.83 ms (0.00% GC)
  median time:      22.67 ms (0.00% GC)
  mean time:        22.59 ms (4.04% GC)
  maximum time:     30.93 ms (8.61% GC)

julia> @benchmark map(f, Xs...)
BenchmarkTools.Trial: 
  samples:          2
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  2.10 gb
  allocs estimate:  61995904
  minimum time:     4.87 s (9.77% GC)
  median time:      4.97 s (9.63% GC)
  mean time:        4.97 s (9.63% GC)
  maximum time:     5.07 s (9.49% GC)

julia> @benchmark map(f, Xs...; lift=true)
BenchmarkTools.Trial: 
  samples:          291
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  8.58 mb
  allocs estimate:  15
  minimum time:     12.57 ms (0.00% GC)
  median time:      15.94 ms (0.00% GC)
  mean time:        17.18 ms (5.67% GC)
  maximum time:     80.23 ms (0.00% GC)

julia> @benchmark map(f, Ys...)
BenchmarkTools.Trial: 
  samples:          1
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  2.10 gb
  allocs estimate:  61995904
  minimum time:     5.30 s (9.30% GC)
  median time:      5.30 s (9.30% GC)
  mean time:        5.30 s (9.30% GC)
  maximum time:     5.30 s (9.30% GC)

julia> @benchmark map(f, Ys...; lift=true)
BenchmarkTools.Trial: 
  samples:          350
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  8.58 mb
  allocs estimate:  15
  minimum time:     12.21 ms (0.00% GC)
  median time:      14.08 ms (0.00% GC)
  mean time:        14.31 ms (6.22% GC)
  maximum time:     22.85 ms (11.79% GC)

It seems pretty clear that, as far as performance goes, we ought to transition towards lifting by default. I think this is reasonable for most statistics/data-science use cases. To do so, would we need to deprecate the current behavior for a release cycle? [Edit 7/12/16]: Lifting will probably yield the best performance, but clearly something is sub-optimal in the non-lifted map implementation (especially the n-arg version).

@nalimilan @johnmyleswhite thoughts?

-Specifically include tests for 2 arg map/map!
-Reorganize module along 1 arg, 2 arg, n arg 
methods
-Correct mistake (n args tests tested lifted map!
twice, failed to test lifted map)
@davidagold force-pushed the lmap branch 3 times, most recently from 6478caa to 3bc6fa2 on July 6, 2016 at 18:25
Remove metaprogramming from previous regime
(though introduce use of generated functions for n
arg map)

include compatibility for versions <=
JuliaLang/julia@a2ce230
@nalimilan
Member

I wouldn't worry too much about 0.4 compatibility, keeping the existing code in a separate file is fine.

Regarding lifting or not, that's a tough call. In practice, I guess most people will use the lifting behavior, but doing this by default would mean we break the AbstractArray API by not applying the function to elements of type eltype(x). Performance-wise, the benchmarks show a lot of allocations which I wouldn't expect: are you sure there isn't a problem somewhere? (I'm not familiar with that code at all.)

Finally, I wonder about the difference between these two benchmarks:

julia> @benchmark map(f, As...)
BenchmarkTools.Trial: 
  samples:          158
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  10
  minimum time:     27.83 ms (0.00% GC)
  median time:      30.86 ms (0.00% GC)
  mean time:        31.65 ms (2.66% GC)
  maximum time:     40.85 ms (6.13% GC)

julia> @benchmark map(f, Xs...; lift=true)
BenchmarkTools.Trial: 
  samples:          354
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  8.58 mb
  allocs estimate:  32
  minimum time:     12.27 ms (0.00% GC)
  median time:      13.62 ms (0.00% GC)
  mean time:        14.18 ms (6.01% GC)
  maximum time:     23.19 ms (10.96% GC)

How is it that the NullableArray version is twice as fast? :-)

@davidagold
Contributor Author

@nalimilan

w/r/t the allocations on non-lifted map: I thought these were due to all the Nullable objects created by indexing. Or should they not show up? There may be a problem.

w/r/t beating map(f, As...): Haha, well... there are two primary differences (in my eyes) between the implementation of map(f, As...) here and in Base. The first is that the Base version collects over Generator(f, As...), which zips all of the As together in a Zip object. I don't know if this introduces overhead, but it might be less efficient than working with the As directly, as we do here. The second difference has to do with indexing. The Base implementation uses ith_all(i, As)... to access the ith element of each argument Array, which involves a fair amount of splatting. What we do here is use a macro @fcall to splice f(Xs[1][i], Xs[2][i], Xs[3][i], ...) into the _liftedmap_to! loop body. This does require that _liftedmap_to! be a generated function.
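The splicing technique described above can be sketched as follows. `_map_to_unrolled!` is an illustrative stand-in, not the PR's `_liftedmap_to!` (the lifting logic is omitted); it shows only how a generated function splices `f(Xs[1][i], ..., Xs[N][i])` into the loop body without runtime splatting:

```julia
# Hedged sketch of the generated-function splicing technique. Not the PR's
# `_liftedmap_to!` -- only the expression-unrolling idea is demonstrated.
@generated function _map_to_unrolled!(f, dest, Xs...)
    N = length(Xs)                        # number of argument arrays
    args = [ :(Xs[$j][i]) for j in 1:N ]  # Xs[1][i], Xs[2][i], ..., Xs[N][i]
    quote
        for i in 1:length(dest)
            dest[i] = f($(args...))       # spliced call, no runtime splatting
        end
        dest
    end
end
```

For plain Arrays, `_map_to_unrolled!(+, zeros(Int, 3), [1, 2, 3], [4, 5, 6])` fills dest with `[5, 7, 9]`.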

Whatever the case, if you write map for regular Arrays in the style I've written them for NullableArrays, you do see a speed up over the current Base implementation: https://gist.github.com/davidagold/d7088aae22f23d383e5bf1f26aa1a045

using BenchmarkTools
n = 1_000_000
As = [ rand(n) for i in 1:5 ]
f(xs...) = prod(xs)

julia> @benchmark mymap(f, As...)
BenchmarkTools.Trial: 
  samples:          786
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  7.63 mb
  allocs estimate:  8
  minimum time:     4.59 ms (0.00% GC)
  median time:      5.35 ms (0.00% GC)
  mean time:        6.36 ms (14.65% GC)
  maximum time:     34.15 ms (0.00% GC)

julia> mymap(f, As...) == map(f, As...)
true

This suggests that the current Base n-arg map implementation leaves some performance on the table --- around 5x in this case.

@nalimilan
Member

w/r/t the allocations on non-lifted map: I thought these are due to all the Nullable objects being created by indexing. Or should they not show up? There may be a problem.

Not sure. But since Nullable is an immutable, it should be possible to avoid the allocation in at least some scenarios -- for example, by inlining the function which creates the Nullable (the rest of the work could be wrapped in a @noinline function). You'd need to ask people with more expertise about that.
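The inlining idea above could look something like this sketch, assuming Julia 0.5's Base Nullable; all helper names here are hypothetical. A small @inline getter gives the compiler a chance to keep the immutable Nullable on the stack rather than heap-allocating it in the hot loop:

```julia
# Hypothetical sketch of the @inline / @noinline split suggested above
# (Julia 0.5-era Base Nullable; helper names are invented for illustration).
@inline nullable_get(values, isnull, i) =
    isnull[i] ? Nullable{eltype(values)}() : Nullable(values[i])

@noinline function sum_nonnull(values, isnull)
    s = zero(eltype(values))
    for i in 1:length(values)
        x = nullable_get(values, isnull, i)  # ideally never heap-allocated
        if !Base.isnull(x)
            s += get(x)
        end
    end
    s
end
```

Whether the allocation actually disappears would need to be checked with @allocated or --track-allocation, as suggested.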

As regards being faster than Base's map: could you open an issue to discuss this, or ask for feedback on Slack?

Removes metaprogramming-heavy 0.4 map regime,
which seems to be responsible for strange
intermittent Travis failures.
@codecov-io

Current coverage is 80.24%

Merging #128 into master will decrease coverage by 1.37%

@@             master       #128   diff @@
==========================================
  Files            13         14     +1   
  Lines           729        825    +96   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            595        662    +67   
- Misses          134        163    +29   
  Partials          0          0          

Powered by Codecov. Last updated by 5ed6f7c...3a49e22

Specifications described in detail in #128
@davidagold
Contributor Author

I'm going to merge this, since it passes on 0.4 and the map tests pass on nightly; the remaining test failures on nightly are due to the introduction of the Base.OneTo range object, not to this PR. The design choices above and their implementation don't need to be the final word, but I think they're a step in the right direction. And I feel it's important to fix #130.

@davidagold davidagold changed the title WIP: map for 0.5 map for 0.5 Jul 12, 2016
@davidagold davidagold changed the title map for 0.5 map revisions Jul 12, 2016
@davidagold davidagold merged commit e350e40 into master Jul 12, 2016
@ararslan ararslan deleted the lmap branch July 12, 2016 21:31
@ararslan
Member

or ask for feedback on Slack

There's a Slack group?

@johnmyleswhite
Member

A few of us chat on a private Slack channel.

@ararslan
Member

Ah, okay. Org owners only, I presume?
