
Making sum(skipmissing(x)) faster #27679

Closed
nalimilan opened this issue Jun 20, 2018 · 2 comments
Labels: missing data (Base.missing and related functionality), performance (Must go faster)

Comments

@nalimilan (Member)

sum(skipmissing(x)) is reasonably fast, but since #27651 a naive implementation of sum is even faster (up to 5×). This is particularly clear when the input array actually contains missing values, probably because the naive sum uses masked SIMD instructions while skipmissing relies on branching (and the presence of missing values kills branch prediction).

I haven't investigated this deeply, but IIUC sum(skipmissing(x)) dispatches to mapfoldl, which uses the iteration protocol to walk over the input. This means we lose the benefit of @inbounds and @simd. Maybe something like #27384 would help with @inbounds at least?

function mysum(X)
    s = zero(eltype(X))
    @inbounds @simd for x in X
        if x !== missing  # skip missing values inline
            s += x
        end
    end
    s
end

# With Vector{Int}
X1 = rand(Int, 10_000_000);
X2 = Vector{Union{Missing,Int}}(X1);
X3 = ifelse.(rand(length(X2)) .< 0.9, X2, missing);

julia> @btime mysum(X1);
 5.594 ms (1 allocation: 16 bytes)

julia> @btime mysum(X2);
 5.791 ms (1 allocation: 16 bytes)

julia> @btime mysum(X3);
 5.681 ms (1 allocation: 16 bytes)

julia> @btime sum(skipmissing(X1));
 7.316 ms (2 allocations: 32 bytes)

julia> @btime sum(skipmissing(X2));
 18.175 ms (2 allocations: 32 bytes)

julia> @btime sum(skipmissing(X3));
 28.240 ms (2 allocations: 32 bytes)

# With Vector{Float64}
Y1 = rand(Float64, 10_000_000);
Y2 = Vector{Union{Missing,Float64}}(Y1);
Y3 = ifelse.(rand(length(Y2)) .< 0.9, Y2, missing);

julia> @btime mysum(Y1);
 5.860 ms (1 allocation: 16 bytes)

julia> @btime mysum(Y2);
 13.849 ms (1 allocation: 16 bytes)

julia> @btime mysum(Y3);
 17.428 ms (1 allocation: 16 bytes)

julia> @btime sum(skipmissing(Y1));
 13.816 ms (2 allocations: 32 bytes)

julia> @btime sum(skipmissing(Y2));
 18.254 ms (2 allocations: 32 bytes)

julia> @btime sum(skipmissing(Y3));
 28.430 ms (2 allocations: 32 bytes)
nalimilan added the performance (Must go faster) and missing data (Base.missing and related functionality) labels on Jun 20, 2018
@nalimilan (Member, Author)

After some experimentation, it turns out @simd is really the annotation that makes a difference for performance, while @inbounds alone has only a very small effect. This is particularly the case for Vector{Int}, which is surprising, since I would have expected @simd to give the compiler more room with Vector{Float64} (where it licenses reassociating floating-point additions). This is worth investigating, as it doesn't seem we should need an explicit @simd to get good performance for Int (another question is why SIMD instructions aren't used for Float64, but that's somewhat orthogonal).
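For reference, the experiment above can be reproduced by benchmarking a variant of mysum with only @inbounds against the full @inbounds @simd version under @btime (mysum_inbounds_only is an illustrative name, not anything from Base):

```julia
# Same kernel as mysum but without @simd, to isolate the effect of
# @inbounds alone (illustrative helper, not part of Base):
function mysum_inbounds_only(X)
    s = zero(eltype(X))
    @inbounds for x in X
        if x !== missing
            s += x
        end
    end
    s
end
```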

The more general problem is that we cannot use @simd the way sum(::Array) does, since the methods involved here use while loops rather than for loops. The pattern used by iterate on SkipMissing objects is probably too hard for LLVM to optimize: it's a while loop returning the first non-missing value. Unless/until we can make LLVM detect that this pattern can be translated to masked SIMD instructions, it looks like we should implement special reduction methods for SkipMissing arguments which call reduce on the underlying array with a helper function that does nothing when the input is missing.
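To make the problematic pattern concrete, here is a minimal sketch of what iterate on a skip-missing wrapper looks like, using a simplified stand-in type (MySkipMissing is illustrative; Base's actual SkipMissing iterate differs in details and supports generic indexing):

```julia
# Simplified stand-in for Base.SkipMissing, assuming a 1-based vector.
struct MySkipMissing{A}
    x::A
end

function Base.iterate(itr::MySkipMissing, i=1)
    A = itr.x
    # This data-dependent while loop is what defeats vectorization:
    # the distance to the next non-missing element is unknown, so LLVM
    # cannot turn it into a fixed-stride masked SIMD loop.
    while i <= length(A)
        v = A[i]
        v === missing || return (v, i + 1)
        i += 1
    end
    return nothing
end

# The number of non-missing elements is unknown up front.
Base.IteratorSize(::Type{<:MySkipMissing}) = Base.SizeUnknown()
```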

@nalimilan (Member, Author)

Actually it doesn't seem possible to implement an efficient mapreduce method for SkipMissing by just calling mapreduce on the wrapped array with a custom function wrapping op:

  • x === missing is exploited by inference only when it appears in the same function (see ismissing(x) much slower than x === missing #27681 (comment)), so calling it inside a wrapper around op that is passed to mapreduce isn't efficient (even if the wrapper is inlined).
  • Even if that were fast, it's not clear how the wrapper around op should behave when passed two missing values, since it's always supposed to return a non-missing value. That would work if a valid v0 were always passed, but that's not the case, and we only know a neutral v0 for standard operations like sum, not in general. Maybe it could work with a pseudo-missing value which would disappear instead of propagating, but that would require redefining all the methods that missing overrides.

So I had to write a full mapreduce implementation for SkipMissing{<:AbstractArray}. It's too bad we need to duplicate/adapt methods, but at least the performance results are good. See #27743.
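The shape of such a specialized reduction can be sketched as follows (illustrative names and a deliberately minimal signature; this is not the actual code from #27743):

```julia
# Hedged sketch: a reduction specialized on the wrapped array, keeping the
# missing check inside an @inbounds @simd loop so it sits at the reduction
# site where inference can exploit it. Note that @simd allows the compiler
# to reassociate, so `op` should be associative (as + and * are).
# `mapreduce_skipmissing` and `v0` are illustrative names.
function mapreduce_skipmissing(f, op, A::AbstractVector, v0)
    v = v0
    @inbounds @simd for i in eachindex(A)
        x = A[i]
        if x !== missing
            v = op(v, f(x))
        end
    end
    return v
end
```

For example, mapreduce_skipmissing(identity, +, X2, 0) should match sum(skipmissing(X2)) while leaving the loop in a form the vectorizer can handle.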
