Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix getindex inlining for Array{Union{T,Missing}} #27651

Merged
merged 3 commits into from
Jun 19, 2018
Merged

Fix getindex inlining for Array{Union{T,Missing}} #27651

merged 3 commits into from
Jun 19, 2018

Conversation

nalimilan
Copy link
Member

@nalimilan nalimilan commented Jun 19, 2018

Fixes #27403.

With this PR, a naive sum over non-missing values in a Vector{Union{Int,Missing}} is (probably) as fast as it could theoretically be. Masked SIMD instructions are used, which is almost too good to be true.

function mysum(X)
    s = zero(eltype(X))
    @inbounds @simd for x in X
        if x !== missing
            s += x
        end
    end
    s
end

X1 = rand(Int, 10_000_000);
X2 = Vector{Union{Missing,Int}}(X1);
X3 = ifelse.(rand(length(X2)) .< 0.9, X2, missing);

julia> @btime mysum(X1);
  5.787 ms (1 allocation: 16 bytes)

julia> @btime mysum(X2);
  5.726 ms (1 allocation: 16 bytes)

julia> @btime mysum(X3);
  5.844 ms (1 allocation: 16 bytes)

I've added a test to check that getindex is inlined, but I'm really not sure that's the best way to do it. There are also benchmarks for this in BaseBenchmarks, and I'll add another one (EDIT: see JuliaCI/BaseBenchmarks.jl#207).

vtjnash added 2 commits June 18, 2018 16:07
Reference the wrong copy of `deprecate` relative to the `isdefined` check.
The `isknowntype` test should apply to the container (array) type, not to its eltype.

Fix #27403.
@nalimilan nalimilan added performance Must go faster arrays [a, r, r, a, y, s] missing data Base.missing and related functionality labels Jun 19, 2018
@nalimilan nalimilan requested a review from vtjnash June 19, 2018 08:26
@vtjnash vtjnash merged commit 267dc59 into master Jun 19, 2018
@vtjnash vtjnash deleted the jn/27403 branch June 19, 2018 18:02
@davidanthoff
Copy link
Contributor

I get slightly different numbers, in particular the X1 case is faster than the other two (which kind of makes sense?):

julia> @btime mysum(X1);
  4.812 ms (1 allocation: 16 bytes)

julia> @btime mysum(X2);
  5.757 ms (1 allocation: 16 bytes)

julia> @btime mysum(X3);
  5.753 ms (1 allocation: 16 bytes)

@nalimilan
Copy link
Member Author

Yes, the particular run I posted was a bit extreme. The one I added to the blog post is more similar to yours. I have no idea why it changed.

@vtjnash
Copy link
Member

vtjnash commented Jun 19, 2018

@nanosoldier runbenchmarks(ALL, vs = "@b331c70642e1f792577f2b591a4e5016e3c2d53f")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@nalimilan
Copy link
Member Author

Some probably spurious regressions, lots of Union-related improvements (about 2×-3×), and some very dramatic speedups. This one is particularly nice:

ID time ratio memory ratio
["union", "array", "("perf_sum", Int8, true)"] 0.04 (15%) 1.00 (1%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] missing data Base.missing and related functionality performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants